* Re: [PATCH v9 01/23] x86/virt/tdx: Consolidate TDX global initialization states
From: Dave Hansen @ 2026-05-18 18:09 UTC (permalink / raw)
To: Edgecombe, Rick P, Gao, Chao
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
Huang, Kai, kvm@vger.kernel.org, Li, Xiaoyao, Zhao, Yan Y,
dave.hansen@linux.intel.com, tony.lindgren@linux.intel.com,
Chatre, Reinette, seanjc@google.com, pbonzini@redhat.com,
binbin.wu@linux.intel.com, Weiny, Ira, nik.borisov@suse.com,
mingo@redhat.com, Verma, Vishal L, kas@kernel.org, Shahar, Sagi,
Annapurve, Vishal, djbw@kernel.org, tglx@kernel.org,
paulmck@kernel.org, hpa@zytor.com, bp@alien8.de,
yilun.xu@linux.intel.com, x86@kernel.org
In-Reply-To: <f6e9a736d15ec41d23156c8e4e75533e8debf908.camel@intel.com>
On 5/18/26 11:00, Edgecombe, Rick P wrote:
> This is a great improvement by itself, irrespective of this series.
Agreed. If it came by itself, I'd probably apply it.
^ permalink raw reply
* Re: [PATCH v9 04/23] coco/tdx-host: Introduce a "tdx_host" device
From: Dave Hansen @ 2026-05-18 18:08 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-coco, linux-kernel, binbin.wu, dave.hansen, djbw,
ira.weiny, kai.huang, kas, nik.borisov, paulmck, pbonzini,
reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
Dan Williams, Jonathan Cameron, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, x86, H. Peter Anvin
In-Reply-To: <agr1oB+IZNCxPuLL@intel.com>
On 5/18/26 04:18, Chao Gao wrote:
> On Fri, May 15, 2026 at 09:21:36AM -0700, Dave Hansen wrote:
>> On 5/13/26 08:09, Chao Gao wrote:
>>> Co-developed-by: Xu Yilun <yilun.xu@linux.intel.com>
>>> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
>>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>>
>> This SoB chain at _least_ needs a note. It looks quite bizarre.
>
> I will add a note.
>
> This patch was originally written by me and then substantially revised
> by Yilun, hence his Co-developed-by and Signed-off-by.
>
> Then Dan made additional cleanups on top of Yilun's version and was the first
> to post it at:
>
> https://lore.kernel.org/all/20250919142237.418648-2-dan.j.williams@intel.com/
>
> The current version is based on that posted patch, which is why the SoB
> chain is unusual.
On some level, I just don't care how this bounced around within Intel or
any other company. I don't need to hear the tale of woe.
I'd frankly like to just see simplicity in the SoB chain:
Thanks to Dan and Xu Yilun for all the help on this one.
Signed-off-by: Chao
IMNHO, SoB should start when it leaves your company.
>>> +config TDX_HOST_SERVICES
>>> + tristate "TDX Host Services Driver"
>>> + depends on INTEL_TDX_HOST
>>> + default m
>>> + help
>>> + Enable access to TDX host services like module update and
>>> + extensions (e.g. TDX Connect).
>>> +
>>> + Say y or m if enabling support for confidential virtual machine
>>> + support (CONFIG_INTEL_TDX_HOST). The module is called tdx_host.ko.
>>
>> In what world will anyone ever set INTEL_TDX_HOST=y, but turn this off?
>
> No, I do not think disabling TDX_HOST_SERVICES while INTEL_TDX_HOST=y makes
> sense.
>
>> Is this even worth a Kconfig prompt?
>>
>> I guess we need it for the module or built in choice. But otherwise it
>> seems a bit silly.
>
> Yes. it is for the module vs built-in choice.
Can it just be this, then?
config TDX_HOST_SERVICES
tristate
depends on INTEL_TDX_HOST
default m
That won't prompt people and it will set =m too. I think.
^ permalink raw reply
* Re: [PATCH v9 01/23] x86/virt/tdx: Consolidate TDX global initialization states
From: Edgecombe, Rick P @ 2026-05-18 18:00 UTC (permalink / raw)
To: Hansen, Dave, Gao, Chao
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
Huang, Kai, kvm@vger.kernel.org, Li, Xiaoyao, Zhao, Yan Y,
dave.hansen@linux.intel.com, tony.lindgren@linux.intel.com,
Chatre, Reinette, seanjc@google.com, pbonzini@redhat.com,
binbin.wu@linux.intel.com, Weiny, Ira, nik.borisov@suse.com,
mingo@redhat.com, Verma, Vishal L, kas@kernel.org, Shahar, Sagi,
Annapurve, Vishal, djbw@kernel.org, tglx@kernel.org,
paulmck@kernel.org, hpa@zytor.com, bp@alien8.de,
yilun.xu@linux.intel.com, x86@kernel.org
In-Reply-To: <agrDIa/foXa2PhCh@intel.com>
On Mon, 2026-05-18 at 15:43 +0800, Chao Gao wrote:
> commit 26bb389c5762fd6a496fbed1cc55e4978e99a5cb
> Author: Chao Gao <chao.gao@intel.com>
> Date: Sun May 17 20:03:00 2026 -0700
>
> x86/virt/tdx: Clarify try_init_module_global() result caching
>
> TDX module global initialization is executed only once. The first call
> caches both the result and the "done" state, and later callers reuse the
> saved result. A lock protects that cached state.
>
> The current code is harder to read because sysinit_done is accessed under
^ harder then what? Maybe just "hard"
> the lock, while sysinit_ret is not.
>
> To improve readability, move sysinit_ret accesses within the lock.
>
> Group sysinit_ret/sysinit_done updates right after initialization so
> Caching the result is separate from the initialization itself.
>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
This is a great improvement by itself, irrespective of this series. The original
code made my head hurt when I first saw it:
https://lore.kernel.org/all/726dccd6d46d0bd471ec0b2f6861f8e45bade26c.camel@intel.com/
The handling of things outside the lock is one thing, but also the function
scopes statics stood out to me as strange. So yea, maybe two patches, this one
and another to get rid of the function scoped statics?
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index c0c6281b08a5..ad56f142dd0b 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -115,28 +115,34 @@ static int try_init_module_global(void)
> static DEFINE_RAW_SPINLOCK(sysinit_lock);
> static bool sysinit_done;
> static int sysinit_ret;
> + int ret;
>
> raw_spin_lock(&sysinit_lock);
>
> - if (sysinit_done)
> + /* Return the "cached" return code. */
> + if (sysinit_done) {
> + ret = sysinit_ret;
> goto out;
> + }
>
> /* RCX is module attributes and all bits are reserved */
> args.rcx = 0;
> - sysinit_ret = seamcall_prerr(TDH_SYS_INIT, &args);
> + ret = seamcall_prerr(TDH_SYS_INIT, &args);
>
> /*
> * The first SEAMCALL also detects the TDX module, thus
> * it can fail due to the TDX module is not loaded.
> * Dump message to let the user know.
> */
> - if (sysinit_ret == -ENODEV)
> + if (ret == -ENODEV)
> pr_err("module not loaded\n");
>
> + /* Save the return code for later callers. */
> sysinit_done = true;
> + sysinit_ret = ret;
> out:
> raw_spin_unlock(&sysinit_lock);
> - return sysinit_ret;
> + return ret;
> }
^ permalink raw reply
* Re: [PATCH v9 02/23] x86/virt/tdx: Move TDX_FEATURES0 bits to asm/tdx.h
From: Edgecombe, Rick P @ 2026-05-18 16:57 UTC (permalink / raw)
To: Hansen, Dave, Gao, Chao
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
Huang, Kai, kvm@vger.kernel.org, Li, Xiaoyao, Zhao, Yan Y,
dave.hansen@linux.intel.com, tony.lindgren@linux.intel.com,
Chatre, Reinette, seanjc@google.com, pbonzini@redhat.com,
binbin.wu@linux.intel.com, Weiny, Ira, nik.borisov@suse.com,
mingo@redhat.com, Verma, Vishal L, kas@kernel.org, Shahar, Sagi,
Annapurve, Vishal, djbw@kernel.org, tglx@kernel.org,
paulmck@kernel.org, hpa@zytor.com, bp@alien8.de,
yilun.xu@linux.intel.com, x86@kernel.org
In-Reply-To: <agrFTv8Lr3HeNM6P@intel.com>
On Mon, 2026-05-18 at 15:52 +0800, Chao Gao wrote:
> On Fri, May 15, 2026 at 09:15:47AM -0700, Dave Hansen wrote:
> > On 5/13/26 08:09, Chao Gao wrote:
> > > This prepares for TDX module update [1] and Dynamic PAMT [2] support. Both
> > > add new TDX_FEATURES0 capability bits, and both need those capabilities to
> > > be queried from code outside arch/x86/virt. The corresponding feature-query
> > > helpers therefore need to live in the public asm/tdx.h header, so move the
> > > existing bit definitions there first.
> >
> > Please don't add unnecessary changelog cruft. If you need this move for
> > this series, that's enough.
>
> Sure. Will remove "Dynamic PAMT" stuff from the changelog.
I think it should not link to old versions of this series to explain the
preparation. That is very confusing. We can just explain what will come in the
later patches of *this* series. I'll circle back and propose some verbiage.
^ permalink raw reply
* Re: [RFC PATCH v4 00/14] coco/TSM: Host-side Arm CCA IDE setup via connect/disconnect callbacks
From: Aneesh Kumar K.V @ 2026-05-18 15:53 UTC (permalink / raw)
To: Will Deacon
Cc: linux-coco, kvmarm, linux-arm-kernel, linux-kernel,
Alexey Kardashevskiy, Catalin Marinas, Dan Williams,
Jason Gunthorpe, Jonathan Cameron, Marc Zyngier, Samuel Ortiz,
Steven Price, Suzuki K Poulose, Xu Yilun
In-Reply-To: <agsNO9cc7H-b0H8L@willie-the-truck>
Will Deacon <will@kernel.org> writes:
> On Mon, Apr 27, 2026 at 12:21:07PM +0530, Aneesh Kumar K.V (Arm) wrote:
>> arch/arm64/include/asm/rmi_cmds.h | 85 +++
>> arch/arm64/include/asm/rmi_smc.h | 168 +++++
>
> Curious, but why does this stuff have to live in the arch code? Wouldn't
> it be better off somewhere like drivers/firmware/ or
> include/linux/arm-rmi.h?
>
Those headers are used to collect all RMI-related helpers and #defines.
They were introduced by the Realm KVM/host support patch series, and I
am continuing to use the same headers to add more helpers.
We can consider moving the RMI helpers used by virt/coco/arm-caa-guest/,
virt/coco/arm-cca-host/, and
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-realm.c into a more generic
header such as include/linux/arm-rmi.h. However, that would either
require moving all the helpers currently used by KVM there as well,
otherwise we would end up with two separate headers carrying RMI
helpers.
Additionally, there are also arch/arm64/include/asm/rsi_cmds.h and
arch/arm64/include/asm/rsi_smc.h to consider.
-aneesh
^ permalink raw reply
* Re: [PATCH v9 12/23] x86/virt/seamldr: Introduce skeleton for TDX module updates
From: Dave Hansen @ 2026-05-18 15:36 UTC (permalink / raw)
To: Chao Gao, kvm, linux-coco, linux-kernel
Cc: binbin.wu, dave.hansen, djbw, ira.weiny, kai.huang, kas,
nik.borisov, paulmck, pbonzini, reinette.chatre, rick.p.edgecombe,
sagis, seanjc, tony.lindgren, vannapurve, vishal.l.verma,
yilun.xu, xiaoyao.li, yan.y.zhao, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, x86, H. Peter Anvin
In-Reply-To: <20260513151045.1420990-13-chao.gao@intel.com>
On 5/13/26 08:09, Chao Gao wrote:
> b) omit touch_nmi_watchdog() and rcu_momentary_eqs(), which exist
> there for debugging and are not strictly needed for this update flow
Could you possibly start a thread with a suggested refactoring of this
code? The use of those helpers is really subtle and it would be great to
put them in one place and document the subtlety so that future users can
leverage the helper.
^ permalink raw reply
* Re: [PATCH v9 09/23] coco/tdx-host: Don't expose P-SEAMLDR information on CPUs with erratum
From: Dave Hansen @ 2026-05-18 15:29 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-coco, linux-kernel, binbin.wu, dave.hansen, djbw,
ira.weiny, kai.huang, kas, nik.borisov, paulmck, pbonzini,
reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
H. Peter Anvin
In-Reply-To: <agsJpjZU0kbsY3oy@intel.com>
On 5/18/26 05:44, Chao Gao wrote:
> On Fri, May 15, 2026 at 10:26:19AM -0700, Dave Hansen wrote:
>> On 5/13/26 08:09, Chao Gao wrote:
>>> Some TDX-capable CPUs have an erratum, as documented in Intel® Trust
>>> Domain CPU Architectural Extensions (May 2021 edition) Chapter 2.3:
>> 2021, eh?
> The TDX ISA document has not been updated since then; the May 2021
> edition is still the latest revision. See:
>
> https://www.intel.com/content/www/us/en/developer/tools/trust-domain-
> extensions/documentation.html
I think you are saying that the CPUs have an erratum.
That erratum diverges their implementation from the spec: "Intel® Trust
Domain CPU Architectural Extensions (May 2021 edition) Chapter 2.3".
But when you combine those two things in one sentence, it's incredibly
confusing.
The erratum you are talking about is brand new. I just asked for it to
be created in the last month or two. Thus, my confusion when you say
there: "an erratum, as documented in ... May 2021".
Thus, I'm questioning the 2021 date. You probably also want to mention
that the erratum is, as of today, not publicly documented.
Can you rephrase this all and make it clearer, please?
^ permalink raw reply
* Re: [PATCH v9 11/23] x86/virt/seamldr: Allocate and populate a module update request
From: Dave Hansen @ 2026-05-18 15:12 UTC (permalink / raw)
To: Chao Gao
Cc: kvm, linux-coco, linux-kernel, binbin.wu, dave.hansen, djbw,
ira.weiny, kai.huang, kas, nik.borisov, paulmck, pbonzini,
reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
H. Peter Anvin
In-Reply-To: <agsfCsvcILh3Vg7a@intel.com>
On 5/18/26 07:15, Chao Gao wrote:
>>> +#define TDX_IMAGE_VERSION_2 0x200
>>> +
>>> +struct tdx_image_header {
>>> + u16 version; // This ABI is always 0x200
>>
>> That comment reads strangely in here. Did I ask you to write that?
>
> I copied that from the example you suggested for this structure in v8. But
> yes, it does read awkwardly here, so I will drop it.
The point is that the code or comments needs to mention *somewhere* that
"This (header->version) ABI is always 0x200". I didn't mean for you to
literally put that ^ in there.
It's fine if it is in the code to. Just mention it somewhere.
>>> +static int init_seamldr_params(struct seamldr_params *params, const u8 *data, u32 size)
>>> +{
>>> + const struct tdx_image *image = (const void *)data;
>>> + const struct tdx_image_header *header = &image->header;
>>> +
>>> + u32 sigstruct_len = header->sigstruct_nr_pages * PAGE_SIZE;
>>> + u32 module_len = header->module_nr_pages * PAGE_SIZE;
>>> +
>>> + u8 *header_start = (u8 *)header;
>>> + u8 *header_end = header_start + HEADER_SIZE;
>>> +
>>> + u8 *sigstruct_start = header_end;
>>> + u8 *sigstruct_end = sigstruct_start + sigstruct_len;
>>> +
>>> + u8 *module_start = sigstruct_end;
>>> +
>>> + /* Check the calculated payload size against the data size. */
>>> + if (HEADER_SIZE + sigstruct_len + module_len != size)
>>> + return -EINVAL;
>>> +
>>> + /*
>>> + * Don't care about user passing the wrong file, but protect
>>> + * kernel ABI by preventing accepting garbage.
>>> + */
>>
>> How does this "protect kernel ABI"?
>
> "Protect kernel ABI" was imprecise here. The intent is to reject
> obviously malformed headers.
>
> If the kernel accepts garbage in header fields today, userspace can come
> to rely on that behavior. Later, the kernel may start validating those
> fields more strictly or assign meaning to fields that were previously
> reserved. Rejecting the same image then could be seen as a kernel
> regression.
>
> I'll simplify the comment to:
>
> /* Reject obviously malformed image headers. */
I'm still not following this at all.
I suspect that someone along the way said something in reviewing this
about ensuring that fields that are "reserved" are treated as "reserved
+ must be zero".
Somehow that recommendation got conflated with the version checking.
But the deeper point is that neither this patch nor its contributor is
quite able to articulate the reason for this line of code being here.
Let me try.
The "tdx_image" ABI is versioned. However, there has only ever
been one public versions of the structure: ->version==0x200. The
kernel can only parse that version. Future versions of the
module might be able to use the same ABIs (user/kernel and
kernel/SEAMLDR) but they will not be able to use this kernel
code.
Reject module images without that specific version. This ensures
that the kernel is able to understand the passed-in format.
^ permalink raw reply
* Re: [PATCH v9 11/23] x86/virt/seamldr: Allocate and populate a module update request
From: Chao Gao @ 2026-05-18 14:15 UTC (permalink / raw)
To: Dave Hansen
Cc: kvm, linux-coco, linux-kernel, binbin.wu, dave.hansen, djbw,
ira.weiny, kai.huang, kas, nik.borisov, paulmck, pbonzini,
reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
H. Peter Anvin
In-Reply-To: <7d7fff5a-53a5-439e-9ff8-dcbd97f473cd@intel.com>
>> +#define TDX_IMAGE_VERSION_2 0x200
>> +
>> +struct tdx_image_header {
>> + u16 version; // This ABI is always 0x200
>
>That comment reads strangely in here. Did I ask you to write that?
I copied that from the example you suggested for this structure in v8. But
yes, it does read awkwardly here, so I will drop it.
>
>> + u16 checksum;
>> + u8 signature[8];
>> + u32 sigstruct_nr_pages;
>> + u32 module_nr_pages;
>> + u8 reserved[4076];
>> +} __packed;
>> +
>> +#define HEADER_SIZE sizeof(struct tdx_image_header)
>> +static_assert(HEADER_SIZE == 4096);
>> +
>> +/* Intel TDX module update ABI structure. aka. "TDX module blob". */
>> +struct tdx_image {
>> + struct tdx_image_header header;
>> + u8 payload[]; // Contains sigstruct pages followed by module pages
>> +};
>> +
>> +static void populate_pa_list(u64 *pa_list, u32 max_entries, const u8 *start, u32 nr_pages)
>
>The naming in there is painful. How about:
>
>populate_pa_list(u64 *pa_list, u32 pa_list_len,
> const u8 *vmalloc_addr, u32 vmalloc_len_pages)
Sure.
>
>> +{
>> + int i;
>> +
>> + nr_pages = MIN(nr_pages, max_entries);
>
>This seems wonky. Should it really be silently suppressing things if
>either the allocation or source is too small? I get not wanting to
>overflow, but this seems strange.
Ok. I'll add explicit bounds checks and drop the MIN().
>
>> + for (i = 0; i < nr_pages; i++) {
>> + pa_list[i] = vmalloc_to_pfn(start) << PAGE_SHIFT;
>> + start += PAGE_SIZE;
>> + }
>
>At the point that you modify 'start', it's not 'start' any more. Use
>another variable. This would do, for instance:
>
> for (i = 0; i < nr_pages; i++) {
> unsigned long offset = i * PAGE_SIZE;
>
> pa_list[i] = vmalloc_to_pfn(&start[offset]);
> }
Good point.
>
>
>> +static void populate_seamldr_params(struct seamldr_params *params,
>> + const u8 *sig, u32 sig_nr_pages,
>> + const u8 *mod, u32 mod_nr_pages)
>> +{
>> + params->version = 0;
>> + params->scenario = SEAMLDR_SCENARIO_UPDATE;
>> + params->module_nr_pages = mod_nr_pages;
>> +
>> + populate_pa_list(params->sigstruct_pages_pa_list, SEAMLDR_MAX_NR_SIG_PAGES,
>> + sig, sig_nr_pages);
>> + populate_pa_list(params->module_pages_pa_list, SEAMLDR_MAX_NR_MODULE_PAGES,
>> + mod, mod_nr_pages);
>> +}
>
>Yes, this is starting to look OK. Nit: vertically align the "*_PAGES" args:
>
>
> populate_pa_list(params->sigstruct_pages_pa_list, SEAMLDR_...,
> sig, sig_nr_pages);
> populate_pa_list(params->module_pages_pa_list, SEAMLDR_...,
> mod, mod_nr_pages);
>
>
>> +static int init_seamldr_params(struct seamldr_params *params, const u8 *data, u32 size)
>> +{
>> + const struct tdx_image *image = (const void *)data;
>> + const struct tdx_image_header *header = &image->header;
>> +
>> + u32 sigstruct_len = header->sigstruct_nr_pages * PAGE_SIZE;
>> + u32 module_len = header->module_nr_pages * PAGE_SIZE;
>> +
>> + u8 *header_start = (u8 *)header;
>> + u8 *header_end = header_start + HEADER_SIZE;
>> +
>> + u8 *sigstruct_start = header_end;
>> + u8 *sigstruct_end = sigstruct_start + sigstruct_len;
>> +
>> + u8 *module_start = sigstruct_end;
>> +
>> + /* Check the calculated payload size against the data size. */
>> + if (HEADER_SIZE + sigstruct_len + module_len != size)
>> + return -EINVAL;
>> +
>> + /*
>> + * Don't care about user passing the wrong file, but protect
>> + * kernel ABI by preventing accepting garbage.
>> + */
>
>How does this "protect kernel ABI"?
"Protect kernel ABI" was imprecise here. The intent is to reject
obviously malformed headers.
If the kernel accepts garbage in header fields today, userspace can come
to rely on that behavior. Later, the kernel may start validating those
fields more strictly or assign meaning to fields that were previously
reserved. Rejecting the same image then could be seen as a kernel
regression.
I'll simplify the comment to:
/* Reject obviously malformed image headers. */
^ permalink raw reply
* Re: [RFC PATCH v4 00/14] coco/TSM: Host-side Arm CCA IDE setup via connect/disconnect callbacks
From: Will Deacon @ 2026-05-18 12:59 UTC (permalink / raw)
To: Aneesh Kumar K.V (Arm)
Cc: linux-coco, kvmarm, linux-arm-kernel, linux-kernel,
Alexey Kardashevskiy, Catalin Marinas, Dan Williams,
Jason Gunthorpe, Jonathan Cameron, Marc Zyngier, Samuel Ortiz,
Steven Price, Suzuki K Poulose, Xu Yilun
In-Reply-To: <20260427065121.916615-1-aneesh.kumar@kernel.org>
On Mon, Apr 27, 2026 at 12:21:07PM +0530, Aneesh Kumar K.V (Arm) wrote:
> arch/arm64/include/asm/rmi_cmds.h | 85 +++
> arch/arm64/include/asm/rmi_smc.h | 168 +++++
Curious, but why does this stuff have to live in the arch code? Wouldn't
it be better off somewhere like drivers/firmware/ or
include/linux/arm-rmi.h?
Will
^ permalink raw reply
* Re: [PATCH v9 09/23] coco/tdx-host: Don't expose P-SEAMLDR information on CPUs with erratum
From: Chao Gao @ 2026-05-18 12:44 UTC (permalink / raw)
To: Dave Hansen
Cc: kvm, linux-coco, linux-kernel, binbin.wu, dave.hansen, djbw,
ira.weiny, kai.huang, kas, nik.borisov, paulmck, pbonzini,
reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
H. Peter Anvin
In-Reply-To: <f1057429-4918-4b02-ae17-9bd6cc38a3c1@intel.com>
On Fri, May 15, 2026 at 10:26:19AM -0700, Dave Hansen wrote:
>On 5/13/26 08:09, Chao Gao wrote:
>> Some TDX-capable CPUs have an erratum, as documented in Intel® Trust
>> Domain CPU Architectural Extensions (May 2021 edition) Chapter 2.3:
>
>2021, eh?
The TDX ISA document has not been updated since then; the May 2021
edition is still the latest revision. See:
https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/documentation.html
^ permalink raw reply
* Re: [PATCH v9 08/23] coco/tdx-host: Expose P-SEAMLDR information via sysfs
From: Chao Gao @ 2026-05-18 12:35 UTC (permalink / raw)
To: Dave Hansen
Cc: kvm, linux-coco, linux-kernel, binbin.wu, dave.hansen, djbw,
ira.weiny, kai.huang, kas, nik.borisov, paulmck, pbonzini,
reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
H. Peter Anvin
In-Reply-To: <e2ec0efe-2248-4823-ab3f-5d87835db3c5@intel.com>
>> +static umode_t seamldr_group_visible(struct kobject *kobj, struct attribute *attr, int idx)
>> +{
>> + const struct tdx_sys_info *sysinfo = tdx_get_sysinfo();
>> +
>> + if (!sysinfo)
>> + return 0;
>> +
>> + return tdx_supports_runtime_update(sysinfo) ? attr->mode : 0;
>> +}
>> +
>> +static const struct attribute_group seamldr_group = {
>> + .attrs = seamldr_attrs,
>> + .is_visible = seamldr_group_visible,
>> +};
>
>I feel like we need to mention *somewhere* that these are kinda nasty.
>tdx_get_sysinfo() is slow and single-threaded. These very much are and
>need to stay 0400 for good reason.
>
>Talk about the DEVICE_ATTR_ADMIN_RO() choice _somewhere_, please.
I will add a comment to make the DEVICE_ATTR_ADMIN_RO() choice
explicit.
+/*
+ * These attributes are intended for admins managing TDX module updates.
+ * Reading them issues a slow, serialized P-SEAMLDR query, so keep them
+ * admin-only.
+ */
static DEVICE_ATTR_ADMIN_RO(seamldr_version);
static DEVICE_ATTR_ADMIN_RO(num_remaining_updates);
^ permalink raw reply
* Re: [PATCH v9 07/23] x86/virt/seamldr: Add a helper to retrieve P-SEAMLDR information
From: Chao Gao @ 2026-05-18 11:51 UTC (permalink / raw)
To: Dave Hansen
Cc: kvm, linux-coco, linux-kernel, binbin.wu, dave.hansen, djbw,
ira.weiny, kai.huang, kas, nik.borisov, paulmck, pbonzini,
reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
H. Peter Anvin
In-Reply-To: <e54e9647-18ec-4691-a979-c67798a7d85f@intel.com>
>> Note that there are two distinct P-SEAMLDR APIs with similar names:
>>
>> SEAMLDR.INFO: Returns a SEAMLDR_INFO structure containing SEAMLDR
>> information such as version and remaining updates.
>>
>> SEAMLDR.SEAMINFO: Returns a SEAMLDR_SEAMINFO structure containing SEAM
>> and system information such as Convertible Memory
>> Regions (CMRs) and number of CPUs and sockets.
>>
>> The former is used here.
>This doesn't help.
>
>"SEAMLDR.INFO" is metadata about the loader. It's metadata for the
>update process.
>
>"SEAMLDR.SEAMINFO" is metadata about the module. It is for the module
>init process, not for the update process.
Thanks for rewriting this.
One small nuance is that SEAMLDR.SEAMINFO is really about SEAM mode
rather than the loaded module itself. So I think this is a bit more
accurate:
SEAMLDR.SEAMINFO is metadata about SEAM mode. It is used for module
initialization, not for the update process.
>
>Right? Isn't that a billion times more useful and actually helps
>differentiate them?
>
>Also, more nits: I hate former/latter too. It makes me stop reading and
>have to go back *EVER* *TIME*. I hate that particular english construct.
>It's horrid.
I will avoid former/latter wording in future patches.
^ permalink raw reply
* Re: [PATCH v2 08/15] KVM: x86: Add mode-aware versions of kvm_<reg>_{read,write}() helpers
From: Huang, Kai @ 2026-05-18 11:31 UTC (permalink / raw)
To: pbonzini@redhat.com, kas@kernel.org, seanjc@google.com,
vkuznets@redhat.com, dwmw2@infradead.org, paul@xen.org
Cc: Edgecombe, Rick P, x86@kernel.org, binbin.wu@linux.intel.com,
dave.hansen@linux.intel.com, linux-kernel@vger.kernel.org,
yosry@kernel.org, kvm@vger.kernel.org, linux-coco@lists.linux.dev
In-Reply-To: <20260514215355.1648463-9-seanjc@google.com>
> @@ -10413,29 +10413,30 @@ static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
>
> if (!is_64_bit_hypercall(vcpu))
> ret = (u32)ret;
> - kvm_rax_write(vcpu, ret);
> + kvm_rax_write_raw(vcpu, ret);
> return kvm_skip_emulated_instruction(vcpu);
> }
>
Nit: AFAICT if we use kvm_rax_write(vcpu, ret) instead of the "raw" version
here, we can then remove the
if (!is_64_bit_hypercall(vcpu))
ret = (u32)ret;
But I saw in your next patch you are going to remove the non-raw write helpers.
^ permalink raw reply
* Re: [PATCH v9 05/23] coco/tdx-host: Expose TDX module version
From: Chao Gao @ 2026-05-18 11:29 UTC (permalink / raw)
To: Dave Hansen
Cc: kvm, linux-coco, linux-kernel, binbin.wu, dave.hansen, djbw,
ira.weiny, kai.huang, kas, nik.borisov, paulmck, pbonzini,
reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
H. Peter Anvin
In-Reply-To: <5e097cb0-362a-4dee-af68-9ce583312c97@intel.com>
On Fri, May 15, 2026 at 09:53:59AM -0700, Dave Hansen wrote:
>On 5/13/26 08:09, Chao Gao wrote:
>> For TDX module updates, userspace needs to select compatible update
>> versions based on the current module version. This design delegates
>> module selection complexity to userspace because TDX module update
>> policies are complex and version series are platform-specific.
>
>I'm not sure exactly what a "version series" is.
By "version series" I meant release lines such as 1.5.x, 2.0.x, and
3.0.x, but that is not clear from the changelog.
>Do you need to say more
>than that the policy is complex?
I will tighten it up and just say that the update policy is complex.
>
>> For example, the 1.5.x series is for certain platform generations, while
>> the 2.0.x series is intended for others. And TDX module 1.5.x may be
>> updated to 1.5.y but not to 1.5.y+1.
>
>That's not much of an example, IMNHO. How about:
>
> For example, the 1.5.x series runs on Sapphire Rapids but not
> Granite Rapids, which needs 2.0.x. Updates are also constrained
> by version distance, so a 1.5.6 module might permit updates to
> 1.5.7 but not to 1.5.20.
Yes, that is much better than my version.
>
>> Expose the TDX module version to userspace via sysfs to aid module
>> selection. Since the TDX faux device will drive module updates, expose
>> the version as its attribute.
>>
>> One bonus of exposing TDX module version via sysfs is: TDX module
>> version information remains available even after dmesg logs are cleared.
>
>I honestly wouldn't even mention this bit. You don't need a bonus.
Sure, I will drop that part.
>
>> +++ b/Documentation/ABI/testing/sysfs-devices-faux-tdx-host
>> @@ -0,0 +1,6 @@
>> +What: /sys/devices/faux/tdx_host/version
>> +Contact: linux-coco@lists.linux.dev
>> +Description: (RO) Report the version of the loaded TDX module. The TDX module
>> + version is formatted as x.y.z, where "x" is the major version,
>> + "y" is the minor version and "z" is the update version. Versions
>> + are used for bug reporting, TDX module updates etc.
>
>The "etc." is silly. Just zap it.
>
>Description: (RO) Report the version of the loaded TDX module.
> Formatted as "major.minor.update". Used by TDX module
> update tooling. Example: "1.2.03"
>
>That's at least a wee bit of warning to folks about the leading 0 so if
>they are parsing it they are a wee bit careful with it.
Thanks, this wording is much better and more concise.
^ permalink raw reply
* Re: [PATCH v9 04/23] coco/tdx-host: Introduce a "tdx_host" device
From: Chao Gao @ 2026-05-18 11:18 UTC (permalink / raw)
To: Dave Hansen
Cc: kvm, linux-coco, linux-kernel, binbin.wu, dave.hansen, djbw,
ira.weiny, kai.huang, kas, nik.borisov, paulmck, pbonzini,
reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
Dan Williams, Jonathan Cameron, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, x86, H. Peter Anvin
In-Reply-To: <cc35a467-e978-4e3b-a6ba-17a8bde69bd7@intel.com>
On Fri, May 15, 2026 at 09:21:36AM -0700, Dave Hansen wrote:
>On 5/13/26 08:09, Chao Gao wrote:
>> Co-developed-by: Xu Yilun <yilun.xu@linux.intel.com>
>> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>
>This SoB chain at _least_ needs a note. It looks quite bizarre.
I will add a note.
This patch was originally written by me and then substantially revised
by Yilun, hence his Co-developed-by and Signed-off-by.
Then Dan made additional cleanups on top of Yilun's version and was the first
to post it at:
https://lore.kernel.org/all/20250919142237.418648-2-dan.j.williams@intel.com/
The current version is based on that posted patch, which is why the SoB
chain is unusual.
>
>> +config TDX_HOST_SERVICES
>> + tristate "TDX Host Services Driver"
>> + depends on INTEL_TDX_HOST
>> + default m
>> + help
>> + Enable access to TDX host services like module update and
>> + extensions (e.g. TDX Connect).
>> +
>> + Say y or m if enabling support for confidential virtual machine
>> + support (CONFIG_INTEL_TDX_HOST). The module is called tdx_host.ko.
>
>In what world will anyone ever set INTEL_TDX_HOST=y, but turn this off?
No, I do not think disabling TDX_HOST_SERVICES while INTEL_TDX_HOST=y makes
sense.
>Is this even worth a Kconfig prompt?
>
>I guess we need it for the module or built in choice. But otherwise it
>seems a bit silly.
Yes. it is for the module vs built-in choice.
^ permalink raw reply
* Re: [PATCH v4 07/13] dma-direct: make dma_direct_map_phys() honor DMA_ATTR_CC_SHARED
From: Christian Borntraeger @ 2026-05-18 10:04 UTC (permalink / raw)
To: Aneesh Kumar K.V (Arm), iommu, linux-arm-kernel, linux-kernel,
linux-coco
Cc: Robin Murphy, Marek Szyprowski, Will Deacon, Marc Zyngier,
Steven Price, Suzuki K Poulose, Catalin Marinas, Jiri Pirko,
Jason Gunthorpe, Mostafa Saleh, Petr Tesarik,
Alexey Kardashevskiy, Dan Williams, Xu Yilun, linuxppc-dev,
linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik, Sven Schnelle,
x86, Halil Pasic, Matthew Rosato, Jaehoon Kim
In-Reply-To: <20260512090408.794195-8-aneesh.kumar@kernel.org>
cc Halil, Matt, Jaehoon.
Can you have a look what this means for virtio on secure execution?
Am 12.05.26 um 11:04 schrieb Aneesh Kumar K.V (Arm):
> Teach dma_direct_map_phys() to select the DMA address encoding based on
> DMA_ATTR_CC_SHARED.
>
> Use phys_to_dma_unencrypted() for decrypted mappings and
> phys_to_dma_encrypted() otherwise. If a device requires unencrypted DMA
> but the source physical address is still encrypted, force the mapping
> through swiotlb so the DMA address and backing memory attributes remain
> consistent.
>
> Update the arm64, x86, s390 and powerpc secure-guest setup to not use
> swiotlb force option
>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> ---
> Changes from v3:
> * Handle DMA_ATTR_MMIO
> ---
> arch/arm64/mm/init.c | 4 +--
> arch/powerpc/platforms/pseries/svm.c | 2 +-
> arch/s390/mm/init.c | 2 +-
> arch/x86/kernel/pci-dma.c | 4 +--
> kernel/dma/direct.c | 4 ++-
> kernel/dma/direct.h | 38 +++++++++++++---------------
> 6 files changed, 24 insertions(+), 30 deletions(-)
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 97987f850a33..acf67c7064db 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -338,10 +338,8 @@ void __init arch_mm_preinit(void)
> unsigned int flags = SWIOTLB_VERBOSE;
> bool swiotlb = max_pfn > PFN_DOWN(arm64_dma_phys_limit);
>
> - if (is_realm_world()) {
> + if (is_realm_world())
> swiotlb = true;
> - flags |= SWIOTLB_FORCE;
> - }
>
> if (IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) && !swiotlb) {
> /*
> diff --git a/arch/powerpc/platforms/pseries/svm.c b/arch/powerpc/platforms/pseries/svm.c
> index 384c9dc1899a..7a403dbd35ee 100644
> --- a/arch/powerpc/platforms/pseries/svm.c
> +++ b/arch/powerpc/platforms/pseries/svm.c
> @@ -29,7 +29,7 @@ static int __init init_svm(void)
> * need to use the SWIOTLB buffer for DMA even if dma_capable() says
> * otherwise.
> */
> - ppc_swiotlb_flags |= SWIOTLB_ANY | SWIOTLB_FORCE;
> + ppc_swiotlb_flags |= SWIOTLB_ANY;
>
> /* Share the SWIOTLB buffer with the host. */
> swiotlb_update_mem_attributes();
> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index 1f72efc2a579..843dbd445124 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -149,7 +149,7 @@ static void __init pv_init(void)
> virtio_set_mem_acc_cb(virtio_require_restricted_mem_acc);
>
> /* make sure bounce buffers are shared */
> - swiotlb_init(true, SWIOTLB_FORCE | SWIOTLB_VERBOSE);
> + swiotlb_init(true, SWIOTLB_VERBOSE);
> swiotlb_update_mem_attributes();
> }
>
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index 6267363e0189..75cf8f6ae8cd 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -59,10 +59,8 @@ static void __init pci_swiotlb_detect(void)
> * bounce buffers as the hypervisor can't access arbitrary VM memory
> * that is not explicitly shared with it.
> */
> - if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
> + if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
> x86_swiotlb_enable = true;
> - x86_swiotlb_flags |= SWIOTLB_FORCE;
> - }
> }
> #else
> static inline void __init pci_swiotlb_detect(void)
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index ac315dd046c4..5aaa813c5509 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -691,8 +691,10 @@ size_t dma_direct_max_mapping_size(struct device *dev)
> {
> /* If SWIOTLB is active, use its maximum mapping size */
> if (is_swiotlb_active(dev) &&
> - (dma_addressing_limited(dev) || is_swiotlb_force_bounce(dev)))
> + (dma_addressing_limited(dev) || is_swiotlb_force_bounce(dev) ||
> + force_dma_unencrypted(dev)))
> return swiotlb_max_mapping_size(dev);
> +
> return SIZE_MAX;
> }
>
> diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
> index e05dc7649366..4e35264ab6f8 100644
> --- a/kernel/dma/direct.h
> +++ b/kernel/dma/direct.h
> @@ -89,36 +89,32 @@ static inline dma_addr_t dma_direct_map_phys(struct device *dev,
> dma_addr_t dma_addr;
>
> if (is_swiotlb_force_bounce(dev)) {
> - if (!(attrs & DMA_ATTR_CC_SHARED)) {
> - if (attrs & (DMA_ATTR_MMIO | DMA_ATTR_REQUIRE_COHERENT))
> - return DMA_MAPPING_ERROR;
> + if (attrs & (DMA_ATTR_MMIO | DMA_ATTR_REQUIRE_COHERENT))
> + return DMA_MAPPING_ERROR;
>
> - return swiotlb_map(dev, phys, size, dir, attrs);
> - }
> - } else if (attrs & DMA_ATTR_CC_SHARED) {
> - return DMA_MAPPING_ERROR;
> + return swiotlb_map(dev, phys, size, dir, attrs);
> }
>
> - if (attrs & DMA_ATTR_MMIO) {
> - dma_addr = phys;
> - if (unlikely(!dma_capable(dev, dma_addr, size, false, attrs)))
> - goto err_overflow;
> - } else if (attrs & DMA_ATTR_CC_SHARED) {
> + if (attrs & DMA_ATTR_CC_SHARED)
> dma_addr = phys_to_dma_unencrypted(dev, phys);
> + else
> + dma_addr = phys_to_dma_encrypted(dev, phys);
> +
> + if (attrs & DMA_ATTR_MMIO) {
> if (unlikely(!dma_capable(dev, dma_addr, size, false, attrs)))
> goto err_overflow;
> - } else {
> - dma_addr = phys_to_dma(dev, phys);
> - if (unlikely(!dma_capable(dev, dma_addr, size, true, attrs)) ||
> - dma_kmalloc_needs_bounce(dev, size, dir)) {
> - if (is_swiotlb_active(dev) &&
> - !(attrs & DMA_ATTR_REQUIRE_COHERENT))
> - return swiotlb_map(dev, phys, size, dir, attrs);
> + goto dma_mapped;
> + }
>
> - goto err_overflow;
> - }
> + if (unlikely(!dma_capable(dev, dma_addr, size, true, attrs)) ||
> + dma_kmalloc_needs_bounce(dev, size, dir)) {
> + if (is_swiotlb_active(dev) &&
> + !(attrs & DMA_ATTR_REQUIRE_COHERENT))
> + return swiotlb_map(dev, phys, size, dir, attrs);
> + goto err_overflow;
> }
>
> +dma_mapped:
> if (!dev_is_dma_coherent(dev) &&
> !(attrs & (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_MMIO))) {
> arch_sync_dma_for_device(phys, size, dir);
^ permalink raw reply
* Re: [PATCH v2 03/15] KVM: x86/xen: Don't truncate RAX when handling hypercall from protected guest
From: Binbin Wu @ 2026-05-18 9:55 UTC (permalink / raw)
To: David Woodhouse
Cc: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
Kiryl Shutsemau, Paul Durrant, Dave Hansen, Rick Edgecombe, kvm,
x86, linux-coco, linux-kernel, Yosry Ahmed, Kai Huang
In-Reply-To: <d7af5a04a11432ea1878431e8b51061f19c16061.camel@infradead.org>
On 5/18/2026 5:50 PM, David Woodhouse wrote:
> On Mon, 2026-05-18 at 17:43 +0800, Binbin Wu wrote:
>>
>>
>> On 5/18/2026 3:15 PM, David Woodhouse wrote:
>>> On Mon, 2026-05-18 at 10:19 +0800, Binbin Wu wrote:
>>>>
>>>>>>> longmode = is_64_bit_hypercall(vcpu);
>>>>>>
>>>>>> Is the variable name misleading?
>>>>>
>>>>> It most definitely is. However, @longmode is passed around quite a few locations
>>>>> in xen.c, and so I don't want to opportunistically fix this one variable. Though
>>>>> I'm definitely not opposed to a separate patch to rename them all to is_64bit or
>>>>> something.
>>>>
>>>> OK, I can do it.
>>>
>>> This one (as shown above) is clearly indicating whether this particular
>>> vCPU is in 64-bit mode for this particular hypercall. Changing that to
>>> is_64bit makes sense.
>>>
>>> However, there is a separate overall mode for the VM, which is stored
>>> in 'kvm->arch.xen.long_mode' and accessed by userspace using the
>>> KVM_XEN_ATTR_TYPE_LONG_MODE attribute. It affects the datatypes used by
>>> shared memory data structures, and is also latched by the kernel when
>>> the guest writes the MSR for the hypercall page. That one should
>>> probably keep its name.
>>
>> For this one, I think the current KVM code is consistent.
>> The format is determined by EFER.LMA, whether the guest is running in 64 bit or
>> compatible mode doesn't change the ABI.
>
> Agreed. For the hypercall case you're looking at, switching the name to
> is_64bit makes sense.
>
>> struct compat_shared_info is used only when the guest is running natively in a
>> 32-bit build.
>
> The struct compat_shared_info is also used in !kvm->arch.xen.long_mode
> on a 64-bit host, as that's what means the guest is considered to be a
> 32-bit guest.
>
> It's somewhat orthogonal from whether any given vCPU is making any
> given hypercall while in 64-bit mode. The 'long_mode' is *latched* at
> certain specific times which are defined by Xen's historical behaviour.
>
> I'm suggesting that you clean up longmode→is_64bit for the *hypercalls*
> but leave 'long_mode' as is.
>
Yes, will only do it for is_64_bit_hypercall().
>
^ permalink raw reply
* Re: [PATCH v2 03/15] KVM: x86/xen: Don't truncate RAX when handling hypercall from protected guest
From: David Woodhouse @ 2026-05-18 9:50 UTC (permalink / raw)
To: Binbin Wu, Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, Kiryl Shutsemau, Paul Durrant,
Dave Hansen, Rick Edgecombe, kvm, x86, linux-coco, linux-kernel,
Yosry Ahmed, Kai Huang
In-Reply-To: <868898e6-1c57-498d-848d-1df9cbeac23c@linux.intel.com>
[-- Attachment #1: Type: text/plain, Size: 2088 bytes --]
On Mon, 2026-05-18 at 17:43 +0800, Binbin Wu wrote:
>
>
> On 5/18/2026 3:15 PM, David Woodhouse wrote:
> > On Mon, 2026-05-18 at 10:19 +0800, Binbin Wu wrote:
> > >
> > > > > > longmode = is_64_bit_hypercall(vcpu);
> > > > >
> > > > > Is the variable name misleading?
> > > >
> > > > It most definitely is. However, @longmode is passed around quite a few locations
> > > > in xen.c, and so I don't want to opportunistically fix this one variable. Though
> > > > I'm definitely not opposed to a separate patch to rename them all to is_64bit or
> > > > something.
> > >
> > > OK, I can do it.
> >
> > This one (as shown above) is clearly indicating whether this particular
> > vCPU is in 64-bit mode for this particular hypercall. Changing that to
> > is_64bit makes sense.
> >
> > However, there is a separate overall mode for the VM, which is stored
> > in 'kvm->arch.xen.long_mode' and accessed by userspace using the
> > KVM_XEN_ATTR_TYPE_LONG_MODE attribute. It affects the datatypes used by
> > shared memory data structures, and is also latched by the kernel when
> > the guest writes the MSR for the hypercall page. That one should
> > probably keep its name.
>
> For this one, I think the current KVM code is consistent.
> The format is determined by EFER.LMA, whether the guest is running in 64 bit or
> compatible mode doesn't change the ABI.
Agreed. For the hypercall case you're looking at, switching the name to
is_64bit makes sense.
> struct compat_shared_info is used only when the guest is running natively in a
> 32-bit build.
The struct compat_shared_info is also used in !kvm->arch.xen.long_mode
on a 64-bit host, as that's what means the guest is considered to be a
32-bit guest.
It's somewhat orthogonal from whether any given vCPU is making any
given hypercall while in 64-bit mode. The 'long_mode' is *latched* at
certain specific times which are defined by Xen's historical behaviour.
I'm suggesting that you clean up longmode→is_64bit for the *hypercalls*
but leave 'long_mode' as is.
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH v2 03/15] KVM: x86/xen: Don't truncate RAX when handling hypercall from protected guest
From: Binbin Wu @ 2026-05-18 9:43 UTC (permalink / raw)
To: David Woodhouse, Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, Kiryl Shutsemau, Paul Durrant,
Dave Hansen, Rick Edgecombe, kvm, x86, linux-coco, linux-kernel,
Yosry Ahmed, Kai Huang
In-Reply-To: <ad4934f4a663c448924aeabe6418e68ae3ef9813.camel@infradead.org>
On 5/18/2026 3:15 PM, David Woodhouse wrote:
> On Mon, 2026-05-18 at 10:19 +0800, Binbin Wu wrote:
>>
>>>>> longmode = is_64_bit_hypercall(vcpu);
>>>>
>>>> Is the variable name misleading?
>>>
>>> It most definitely is. However, @longmode is passed around quite a few locations
>>> in xen.c, and so I don't want to opportunistically fix this one variable. Though
>>> I'm definitely not opposed to a separate patch to rename them all to is_64bit or
>>> something.
>>
>> OK, I can do it.
>
> This one (as shown above) is clearly indicating whether this particular
> vCPU is in 64-bit mode for this particular hypercall. Changing that to
> is_64bit makes sense.
>
> However, there is a separate overall mode for the VM, which is stored
> in 'kvm->arch.xen.long_mode' and accessed by userspace using the
> KVM_XEN_ATTR_TYPE_LONG_MODE attribute. It affects the datatypes used by
> shared memory data structures, and is also latched by the kernel when
> the guest writes the MSR for the hypercall page. That one should
> probably keep its name.
For this one, I think the current KVM code is consistent.
The format is determined by EFER.LMA, whether the guest is running in 64 bit or
compatible mode doesn't change the ABI.
struct compat_shared_info is used only when the guest is running natively in a
32-bit build.
>
>
^ permalink raw reply
* Re: [PATCH v4 00/13] dma-mapping: Use DMA_ATTR_CC_SHARED through direct, pool and swiotlb paths
From: Jiri Pirko @ 2026-05-18 8:34 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
Suzuki K Poulose, Catalin Marinas, Jason Gunthorpe, Mostafa Saleh,
Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <yq5amrxxxh0h.fsf@kernel.org>
Mon, May 18, 2026 at 10:23:58AM +0200, aneesh.kumar@kernel.org wrote:
>Jiri Pirko <jiri@resnulli.us> writes:
>
>> Tue, May 12, 2026 at 11:03:55AM +0200, aneesh.kumar@kernel.org wrote:
>>>This series propagates DMA_ATTR_CC_SHARED through the dma-direct,
>>>dma-pool, and swiotlb paths so that encrypted and decrypted DMA buffers
>>>are handled consistently.
>>>
>>>Today, the direct DMA path mostly relies on force_dma_unencrypted() for
>>>shared/decrypted buffer handling. This series consolidates the
>>>force_dma_unencrypted() checks in the top-level functions and ensures
>>>that the remaining DMA interfaces use DMA attributes to make the correct
>>>decisions.
>>
>> FWIW, the patchset in general looks good to me. I tested this with my
>> system_cc_shared dmabuf flow, works flawlessly.
>>
>> Thanks!
>>
>Thanks, Can I add
>
>Tested-by: Jiri Pirko <jiri@resnulli.us>
Tested-by: Jiri Pirko <jiri@nvidia.com>
Thanks.
^ permalink raw reply
* Re: [PATCH v4 03/13] dma-pool: track decrypted atomic pools and select them via attrs
From: Aneesh Kumar K.V @ 2026-05-18 8:32 UTC (permalink / raw)
To: Alexey Kardashevskiy, iommu, linux-arm-kernel, linux-kernel,
linux-coco
Cc: Robin Murphy, Marek Szyprowski, Will Deacon, Marc Zyngier,
Steven Price, Suzuki K Poulose, Catalin Marinas, Jiri Pirko,
Jason Gunthorpe, Mostafa Saleh, Petr Tesarik, Dan Williams,
Xu Yilun, linuxppc-dev, linux-s390, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <df4b78cf-6192-4fe3-8ad5-df9e6fdda8e6@amd.com>
Alexey Kardashevskiy <aik@amd.com> writes:
> On 16/5/26 22:53, Alexey Kardashevskiy wrote:
>> On 12/5/26 19:03, Aneesh Kumar K.V (Arm) wrote:
...
>>> -static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
>>> +static int atomic_pool_expand(struct dma_gen_pool *dma_pool, size_t pool_size,
>>> gfp_t gfp)
>>> {
>>> unsigned int order;
>>> @@ -113,12 +119,15 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
>>> * Memory in the atomic DMA pools must be unencrypted, the pools do not
>>> * shrink so no re-encryption occurs in dma_direct_free().
>>> */
>>> - ret = set_memory_decrypted((unsigned long)page_to_virt(page),
>>> + if (dma_pool->unencrypted) {
>>> + ret = set_memory_decrypted((unsigned long)page_to_virt(page),
>>> 1 << order);
>>> - if (ret)
>>> - goto remove_mapping;
>>> - ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page),
>>> - pool_size, NUMA_NO_NODE);
>>> + if (ret)
>>> + goto remove_mapping;
>>> + }
>>> +
>>> + ret = gen_pool_add_virt(dma_pool->pool, (unsigned long)addr,
>>> + page_to_phys(page), pool_size, NUMA_NO_NODE);
>
>
> This clause could go to the else branch.
>
>
Can you clarify this better?
>>> if (ret)
>>> goto encrypt_mapping;
>>> @@ -126,11 +135,15 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
>>> return 0;
>>> encrypt_mapping:
>>> - ret = set_memory_encrypted((unsigned long)page_to_virt(page),
>>> - 1 << order);
>>> - if (WARN_ON_ONCE(ret)) {
>>> - /* Decrypt succeeded but encrypt failed, purposely leak */
>>> - goto out;
>>> + if (dma_pool->unencrypted) {
>>> + int rc;
>>> +
>>> + rc = set_memory_encrypted((unsigned long)page_to_virt(page),
>>> + 1 << order);
>>> + if (WARN_ON_ONCE(rc)) {
>>> + /* Decrypt succeeded but encrypt failed, purposely leak */
>>> + goto out;
>>> + }
>>> }
>>> remove_mapping:
>>> #ifdef CONFIG_DMA_DIRECT_REMAP
>>> @@ -142,46 +155,52 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
>>> return ret;
>>> }
...
>>> bool dma_free_from_pool(struct device *dev, void *start, size_t size)
>>> {
>>> - struct gen_pool *pool = NULL;
>>> + struct dma_gen_pool *dma_pool = NULL;
>>> +
>>> + while ((dma_pool = dma_guess_pool(dma_pool, 0))) {
>>> - while ((pool = dma_guess_pool(pool, 0))) {
>>> - if (!gen_pool_has_addr(pool, (unsigned long)start, size))
>>> + if (!gen_pool_has_addr(dma_pool->pool, (unsigned long)start, size))
>>
>>
>> v3 of this just crashed here with dma_pool!=NULL but dma_pool->pool==NULL. continuing debugging... Thanks,
>
>
> dma_direct_free:
> dma_free_from_pool (loop over pools) -> false
> [here was a crash which I fixed by "if (!dma_pool->pool) continue"]
> swiotlb_find_pool (loop again) -> false
> __dma_direct_free_pages
> swiotlb_free
> swiotlb_find_pool (loop again)
> dma_free_contiguous => done.
>
> so that works but kinda hard to follow and there is some room for
> optimization. I do not normally have swiottlb when I test this and
> there is too many of this swiotlb stuff on the real direct dma mapping
> path imho. Thanks,
>
I will work on this in the next update. I can possibly drop the
swiotlb_find_pool from the swiotlb_free() path.
>>
>>
>>> continue;
>>> - gen_pool_free(pool, (unsigned long)start, size);
>>> +
>>> + gen_pool_free(dma_pool->pool, (unsigned long)start, size);
>>> return true;
>>> }
>>> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
>>> index 1abd3e6146f4..ab4eccbaa076 100644
>>> --- a/kernel/dma/swiotlb.c
>>> +++ b/kernel/dma/swiotlb.c
>>> @@ -612,6 +612,7 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
>>> u64 phys_limit, gfp_t gfp)
>>> {
>>> struct page *page;
>>> + unsigned long attrs = 0;
>>> /*
>>> * Allocate from the atomic pools if memory is encrypted and
>>> @@ -623,8 +624,12 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
>>> if (!IS_ENABLED(CONFIG_DMA_COHERENT_POOL))
>>> return NULL;
>>> + /* swiotlb considered decrypted by default */
>>> + if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
>>> + attrs = DMA_ATTR_CC_SHARED;
>>> +
>>> return dma_alloc_from_pool(dev, bytes, &vaddr, gfp,
>>> - dma_coherent_ok);
>>> + attrs, dma_coherent_ok);
>>> }
>>> gfp &= ~GFP_ZONEMASK;
>>
>
> --
> Alexey
-aneesh
^ permalink raw reply
* Re: [PATCH v4 00/13] dma-mapping: Use DMA_ATTR_CC_SHARED through direct, pool and swiotlb paths
From: Aneesh Kumar K.V @ 2026-05-18 8:23 UTC (permalink / raw)
To: Jiri Pirko
Cc: iommu, linux-arm-kernel, linux-kernel, linux-coco, Robin Murphy,
Marek Szyprowski, Will Deacon, Marc Zyngier, Steven Price,
Suzuki K Poulose, Catalin Marinas, Jason Gunthorpe, Mostafa Saleh,
Petr Tesarik, Alexey Kardashevskiy, Dan Williams, Xu Yilun,
linuxppc-dev, linux-s390, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy (CS GROUP), Alexander Gordeev,
Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <agldl4XQLKAtyLty@FV6GYCPJ69>
Jiri Pirko <jiri@resnulli.us> writes:
> Tue, May 12, 2026 at 11:03:55AM +0200, aneesh.kumar@kernel.org wrote:
>>This series propagates DMA_ATTR_CC_SHARED through the dma-direct,
>>dma-pool, and swiotlb paths so that encrypted and decrypted DMA buffers
>>are handled consistently.
>>
>>Today, the direct DMA path mostly relies on force_dma_unencrypted() for
>>shared/decrypted buffer handling. This series consolidates the
>>force_dma_unencrypted() checks in the top-level functions and ensures
>>that the remaining DMA interfaces use DMA attributes to make the correct
>>decisions.
>
> FWIW, the patchset in general looks good to me. I tested this with my
> system_cc_shared dmabuf flow, works flawlessly.
>
> Thanks!
>
Thanks, Can I add
Tested-by: Jiri Pirko <jiri@resnulli.us>
-aneesh
^ permalink raw reply
* Re: [PATCH v4 03/13] dma-pool: track decrypted atomic pools and select them via attrs
From: Alexey Kardashevskiy @ 2026-05-18 8:19 UTC (permalink / raw)
To: Aneesh Kumar K.V (Arm), iommu, linux-arm-kernel, linux-kernel,
linux-coco
Cc: Robin Murphy, Marek Szyprowski, Will Deacon, Marc Zyngier,
Steven Price, Suzuki K Poulose, Catalin Marinas, Jiri Pirko,
Jason Gunthorpe, Mostafa Saleh, Petr Tesarik, Dan Williams,
Xu Yilun, linuxppc-dev, linux-s390, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <6f01978e-ead9-409b-866d-69231dc44d00@amd.com>
On 16/5/26 22:53, Alexey Kardashevskiy wrote:
> On 12/5/26 19:03, Aneesh Kumar K.V (Arm) wrote:
>> Teach the atomic DMA pool code to distinguish between encrypted and
>> decrypted pools, and make pool allocation select the matching pool based
>> on DMA attributes.
>>
>> Introduce a dma_gen_pool wrapper that records whether a pool is
>> decrypted, initialize that state when the atomic pools are created, and
>> use it when expanding and resizing the pools. Update dma_alloc_from_pool()
>> to take attrs and skip pools whose encrypted/decrypted state does not
>> match DMA_ATTR_CC_SHARED. Update dma_free_from_pool() accordingly.
>>
>> Also pass DMA_ATTR_CC_SHARED from the swiotlb atomic allocation path
>> so decrypted swiotlb allocations are taken from the correct atomic pool.
>>
>> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
>> ---
>> drivers/iommu/dma-iommu.c | 2 +-
>> include/linux/dma-map-ops.h | 2 +-
>> kernel/dma/direct.c | 11 ++-
>> kernel/dma/pool.c | 163 +++++++++++++++++++++++-------------
>> kernel/dma/swiotlb.c | 7 +-
>> 5 files changed, 122 insertions(+), 63 deletions(-)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index 54d96e847f16..c2595bee3d41 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -1673,7 +1673,7 @@ void *iommu_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
>> if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
>> !gfpflags_allow_blocking(gfp) && !coherent)
>> page = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &cpu_addr,
>> - gfp, NULL);
>> + gfp, attrs, NULL);
>> else
>> cpu_addr = iommu_dma_alloc_pages(dev, size, &page, gfp, attrs);
>> if (!cpu_addr)
>> diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
>> index 6a1832a73cad..696b2c3a2305 100644
>> --- a/include/linux/dma-map-ops.h
>> +++ b/include/linux/dma-map-ops.h
>> @@ -212,7 +212,7 @@ void *dma_common_pages_remap(struct page **pages, size_t size, pgprot_t prot,
>> void dma_common_free_remap(void *cpu_addr, size_t size);
>> struct page *dma_alloc_from_pool(struct device *dev, size_t size,
>> - void **cpu_addr, gfp_t flags,
>> + void **cpu_addr, gfp_t flags, unsigned long attrs,
>> bool (*phys_addr_ok)(struct device *, phys_addr_t, size_t));
>> bool dma_free_from_pool(struct device *dev, void *start, size_t size);
>> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
>> index 0c2e1f8436ce..dc2907439b3d 100644
>> --- a/kernel/dma/direct.c
>> +++ b/kernel/dma/direct.c
>> @@ -162,7 +162,7 @@ static bool dma_direct_use_pool(struct device *dev, gfp_t gfp)
>> }
>> static void *dma_direct_alloc_from_pool(struct device *dev, size_t size,
>> - dma_addr_t *dma_handle, gfp_t gfp)
>> + dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
>> {
>> struct page *page;
>> u64 phys_limit;
>> @@ -172,7 +172,8 @@ static void *dma_direct_alloc_from_pool(struct device *dev, size_t size,
>> return NULL;
>> gfp |= dma_direct_optimal_gfp_mask(dev, &phys_limit);
>> - page = dma_alloc_from_pool(dev, size, &ret, gfp, dma_coherent_ok);
>> + page = dma_alloc_from_pool(dev, size, &ret, gfp, attrs,
>> + dma_coherent_ok);
>> if (!page)
>> return NULL;
>> *dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
>> @@ -261,7 +262,8 @@ void *dma_direct_alloc(struct device *dev, size_t size,
>> */
>> if ((remap || (attrs & DMA_ATTR_CC_SHARED)) &&
>> dma_direct_use_pool(dev, gfp))
>> - return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
>> + return dma_direct_alloc_from_pool(dev, size, dma_handle,
>> + gfp, attrs);
>> if (is_swiotlb_for_alloc(dev)) {
>> page = dma_direct_alloc_swiotlb(dev, size);
>> @@ -397,7 +399,8 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
>> attrs |= DMA_ATTR_CC_SHARED;
>> if ((attrs & DMA_ATTR_CC_SHARED) && dma_direct_use_pool(dev, gfp))
>> - return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
>> + return dma_direct_alloc_from_pool(dev, size, dma_handle,
>> + gfp, attrs);
>> if (is_swiotlb_for_alloc(dev)) {
>> page = dma_direct_alloc_swiotlb(dev, size);
>> diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
>> index 2b2fbb709242..75f0eba48a23 100644
>> --- a/kernel/dma/pool.c
>> +++ b/kernel/dma/pool.c
>> @@ -12,12 +12,18 @@
>> #include <linux/set_memory.h>
>> #include <linux/slab.h>
>> #include <linux/workqueue.h>
>> +#include <linux/cc_platform.h>
>> -static struct gen_pool *atomic_pool_dma __ro_after_init;
>> +struct dma_gen_pool {
>> + bool unencrypted;
>> + struct gen_pool *pool;
>> +};
>> +
>> +static struct dma_gen_pool atomic_pool_dma __ro_after_init;
>> static unsigned long pool_size_dma;
>> -static struct gen_pool *atomic_pool_dma32 __ro_after_init;
>> +static struct dma_gen_pool atomic_pool_dma32 __ro_after_init;
>> static unsigned long pool_size_dma32;
>> -static struct gen_pool *atomic_pool_kernel __ro_after_init;
>> +static struct dma_gen_pool atomic_pool_kernel __ro_after_init;
>> static unsigned long pool_size_kernel;
>> /* Size can be defined by the coherent_pool command line */
>> @@ -76,7 +82,7 @@ static bool cma_in_zone(gfp_t gfp)
>> return true;
>> }
>> -static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
>> +static int atomic_pool_expand(struct dma_gen_pool *dma_pool, size_t pool_size,
>> gfp_t gfp)
>> {
>> unsigned int order;
>> @@ -113,12 +119,15 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
>> * Memory in the atomic DMA pools must be unencrypted, the pools do not
>> * shrink so no re-encryption occurs in dma_direct_free().
>> */
>> - ret = set_memory_decrypted((unsigned long)page_to_virt(page),
>> + if (dma_pool->unencrypted) {
>> + ret = set_memory_decrypted((unsigned long)page_to_virt(page),
>> 1 << order);
>> - if (ret)
>> - goto remove_mapping;
>> - ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page),
>> - pool_size, NUMA_NO_NODE);
>> + if (ret)
>> + goto remove_mapping;
>> + }
>> +
>> + ret = gen_pool_add_virt(dma_pool->pool, (unsigned long)addr,
>> + page_to_phys(page), pool_size, NUMA_NO_NODE);
This clause could go to the else branch.
>> if (ret)
>> goto encrypt_mapping;
>> @@ -126,11 +135,15 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
>> return 0;
>> encrypt_mapping:
>> - ret = set_memory_encrypted((unsigned long)page_to_virt(page),
>> - 1 << order);
>> - if (WARN_ON_ONCE(ret)) {
>> - /* Decrypt succeeded but encrypt failed, purposely leak */
>> - goto out;
>> + if (dma_pool->unencrypted) {
>> + int rc;
>> +
>> + rc = set_memory_encrypted((unsigned long)page_to_virt(page),
>> + 1 << order);
>> + if (WARN_ON_ONCE(rc)) {
>> + /* Decrypt succeeded but encrypt failed, purposely leak */
>> + goto out;
>> + }
>> }
>> remove_mapping:
>> #ifdef CONFIG_DMA_DIRECT_REMAP
>> @@ -142,46 +155,52 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
>> return ret;
>> }
>> -static void atomic_pool_resize(struct gen_pool *pool, gfp_t gfp)
>> +static void atomic_pool_resize(struct dma_gen_pool *dma_pool, gfp_t gfp)
>> {
>> - if (pool && gen_pool_avail(pool) < atomic_pool_size)
>> - atomic_pool_expand(pool, gen_pool_size(pool), gfp);
>> + if (dma_pool->pool && gen_pool_avail(dma_pool->pool) < atomic_pool_size)
>> + atomic_pool_expand(dma_pool, gen_pool_size(dma_pool->pool), gfp);
>> }
>> static void atomic_pool_work_fn(struct work_struct *work)
>> {
>> if (IS_ENABLED(CONFIG_ZONE_DMA))
>> - atomic_pool_resize(atomic_pool_dma,
>> + atomic_pool_resize(&atomic_pool_dma,
>> GFP_KERNEL | GFP_DMA);
>> if (IS_ENABLED(CONFIG_ZONE_DMA32))
>> - atomic_pool_resize(atomic_pool_dma32,
>> + atomic_pool_resize(&atomic_pool_dma32,
>> GFP_KERNEL | GFP_DMA32);
>> - atomic_pool_resize(atomic_pool_kernel, GFP_KERNEL);
>> + atomic_pool_resize(&atomic_pool_kernel, GFP_KERNEL);
>> }
>> -static __init struct gen_pool *__dma_atomic_pool_init(size_t pool_size,
>> - gfp_t gfp)
>> +static __init struct dma_gen_pool *__dma_atomic_pool_init(struct dma_gen_pool *dma_pool,
>> + size_t pool_size, gfp_t gfp)
>> {
>> - struct gen_pool *pool;
>> int ret;
>> - pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE);
>> - if (!pool)
>> + dma_pool->pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE);
>> + if (!dma_pool->pool)
>> return NULL;
>> - gen_pool_set_algo(pool, gen_pool_first_fit_order_align, NULL);
>> + gen_pool_set_algo(dma_pool->pool, gen_pool_first_fit_order_align, NULL);
>> +
>> + /* if platform is using memory encryption atomic pools are by default decrypted. */
>> + if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
>> + dma_pool->unencrypted = true;
>> + else
>> + dma_pool->unencrypted = false;
>> - ret = atomic_pool_expand(pool, pool_size, gfp);
>> + ret = atomic_pool_expand(dma_pool, pool_size, gfp);
>> if (ret) {
>> - gen_pool_destroy(pool);
>> + gen_pool_destroy(dma_pool->pool);
>> + dma_pool->pool = NULL;
>> pr_err("DMA: failed to allocate %zu KiB %pGg pool for atomic allocation\n",
>> pool_size >> 10, &gfp);
>> return NULL;
>> }
>> pr_info("DMA: preallocated %zu KiB %pGg pool for atomic allocations\n",
>> - gen_pool_size(pool) >> 10, &gfp);
>> - return pool;
>> + gen_pool_size(dma_pool->pool) >> 10, &gfp);
>> + return dma_pool;
>> }
>> #ifdef CONFIG_ZONE_DMA32
>> @@ -207,21 +226,22 @@ static int __init dma_atomic_pool_init(void)
>> /* All memory might be in the DMA zone(s) to begin with */
>> if (has_managed_zone(ZONE_NORMAL)) {
>> - atomic_pool_kernel = __dma_atomic_pool_init(atomic_pool_size,
>> - GFP_KERNEL);
>> - if (!atomic_pool_kernel)
>> + __dma_atomic_pool_init(&atomic_pool_kernel, atomic_pool_size, GFP_KERNEL);
>> + if (!atomic_pool_kernel.pool)
>> ret = -ENOMEM;
>> }
>> +
>> if (has_managed_dma()) {
>> - atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
>> - GFP_KERNEL | GFP_DMA);
>> - if (!atomic_pool_dma)
>> + __dma_atomic_pool_init(&atomic_pool_dma, atomic_pool_size,
>> + GFP_KERNEL | GFP_DMA);
>> + if (!atomic_pool_dma.pool)
>> ret = -ENOMEM;
>> }
>> +
>> if (has_managed_dma32) {
>> - atomic_pool_dma32 = __dma_atomic_pool_init(atomic_pool_size,
>> - GFP_KERNEL | GFP_DMA32);
>> - if (!atomic_pool_dma32)
>> + __dma_atomic_pool_init(&atomic_pool_dma32, atomic_pool_size,
>> + GFP_KERNEL | GFP_DMA32);
>> + if (!atomic_pool_dma32.pool)
>> ret = -ENOMEM;
>> }
>> @@ -230,19 +250,44 @@ static int __init dma_atomic_pool_init(void)
>> }
>> postcore_initcall(dma_atomic_pool_init);
>> -static inline struct gen_pool *dma_guess_pool(struct gen_pool *prev, gfp_t gfp)
>> +static inline struct dma_gen_pool *__dma_guess_pool(struct dma_gen_pool *first,
>> + struct dma_gen_pool *second, struct dma_gen_pool *third)
>> +{
>> + if (first->pool)
>> + return first;
>> + if (second && second->pool)
>> + return second;
>> + if (third && third->pool)
>> + return third;
>> + return NULL;
>> +}
>> +
>> +static inline struct dma_gen_pool *dma_guess_pool(struct dma_gen_pool *prev,
>> + gfp_t gfp)
>> {
>> - if (prev == NULL) {
>> + if (!prev) {
>> if (gfp & GFP_DMA)
>> - return atomic_pool_dma ?: atomic_pool_dma32 ?: atomic_pool_kernel;
>> + return __dma_guess_pool(&atomic_pool_dma,
>> + &atomic_pool_dma32,
>> + &atomic_pool_kernel);
>> +
>> if (gfp & GFP_DMA32)
>> - return atomic_pool_dma32 ?: atomic_pool_dma ?: atomic_pool_kernel;
>> - return atomic_pool_kernel ?: atomic_pool_dma32 ?: atomic_pool_dma;
>> + return __dma_guess_pool(&atomic_pool_dma32,
>> + &atomic_pool_dma,
>> + &atomic_pool_kernel);
>> +
>> + return __dma_guess_pool(&atomic_pool_kernel,
>> + &atomic_pool_dma32,
>> + &atomic_pool_dma);
>> }
>> - if (prev == atomic_pool_kernel)
>> - return atomic_pool_dma32 ? atomic_pool_dma32 : atomic_pool_dma;
>> - if (prev == atomic_pool_dma32)
>> - return atomic_pool_dma;
>> +
>> + if (prev == &atomic_pool_kernel)
>> + return __dma_guess_pool(&atomic_pool_dma32,
>> + &atomic_pool_dma, NULL);
>> +
>> + if (prev == &atomic_pool_dma32)
>> + return __dma_guess_pool(&atomic_pool_dma, NULL, NULL);
>> +
>> return NULL;
>> }
>> @@ -272,16 +317,20 @@ static struct page *__dma_alloc_from_pool(struct device *dev, size_t size,
>> }
>> struct page *dma_alloc_from_pool(struct device *dev, size_t size,
>> - void **cpu_addr, gfp_t gfp,
>> + void **cpu_addr, gfp_t gfp, unsigned long attrs,
>> bool (*phys_addr_ok)(struct device *, phys_addr_t, size_t))
>> {
>> - struct gen_pool *pool = NULL;
>> + struct dma_gen_pool *dma_pool = NULL;
>> struct page *page;
>> bool pool_found = false;
>> - while ((pool = dma_guess_pool(pool, gfp))) {
>> + while ((dma_pool = dma_guess_pool(dma_pool, gfp))) {
>> +
>> + if (dma_pool->unencrypted != !!(attrs & DMA_ATTR_CC_SHARED))
>> + continue;
>> +
>> pool_found = true;
>> - page = __dma_alloc_from_pool(dev, size, pool, cpu_addr,
>> + page = __dma_alloc_from_pool(dev, size, dma_pool->pool, cpu_addr,
>> phys_addr_ok);
>> if (page)
>> return page;
>> @@ -296,12 +345,14 @@ struct page *dma_alloc_from_pool(struct device *dev, size_t size,
>> bool dma_free_from_pool(struct device *dev, void *start, size_t size)
>> {
>> - struct gen_pool *pool = NULL;
>> + struct dma_gen_pool *dma_pool = NULL;
>> +
>> + while ((dma_pool = dma_guess_pool(dma_pool, 0))) {
>> - while ((pool = dma_guess_pool(pool, 0))) {
>> - if (!gen_pool_has_addr(pool, (unsigned long)start, size))
>> + if (!gen_pool_has_addr(dma_pool->pool, (unsigned long)start, size))
>
>
> v3 of this just crashed here with dma_pool!=NULL but dma_pool->pool==NULL. continuing debugging... Thanks,
dma_direct_free:
dma_free_from_pool (loop over pools) -> false
[here was a crash which I fixed by "if (!dma_pool->pool) continue"]
swiotlb_find_pool (loop again) -> false
__dma_direct_free_pages
swiotlb_free
swiotlb_find_pool (loop again)
dma_free_contiguous => done.
so that works but kinda hard to follow and there is some room for optimization. I do not normally have swiottlb when I test this and there is too many of this swiotlb stuff on the real direct dma mapping path imho. Thanks,
>
>
>> continue;
>> - gen_pool_free(pool, (unsigned long)start, size);
>> +
>> + gen_pool_free(dma_pool->pool, (unsigned long)start, size);
>> return true;
>> }
>> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
>> index 1abd3e6146f4..ab4eccbaa076 100644
>> --- a/kernel/dma/swiotlb.c
>> +++ b/kernel/dma/swiotlb.c
>> @@ -612,6 +612,7 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
>> u64 phys_limit, gfp_t gfp)
>> {
>> struct page *page;
>> + unsigned long attrs = 0;
>> /*
>> * Allocate from the atomic pools if memory is encrypted and
>> @@ -623,8 +624,12 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
>> if (!IS_ENABLED(CONFIG_DMA_COHERENT_POOL))
>> return NULL;
>> + /* swiotlb considered decrypted by default */
>> + if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
>> + attrs = DMA_ATTR_CC_SHARED;
>> +
>> return dma_alloc_from_pool(dev, bytes, &vaddr, gfp,
>> - dma_coherent_ok);
>> + attrs, dma_coherent_ok);
>> }
>> gfp &= ~GFP_ZONEMASK;
>
--
Alexey
^ permalink raw reply
* Re: [PATCH v4 04/13] dma: swiotlb: track pool encryption state and honor DMA_ATTR_CC_SHARED
From: Alexey Kardashevskiy @ 2026-05-18 8:19 UTC (permalink / raw)
To: Aneesh Kumar K.V (Arm), iommu, linux-arm-kernel, linux-kernel,
linux-coco
Cc: Robin Murphy, Marek Szyprowski, Will Deacon, Marc Zyngier,
Steven Price, Suzuki K Poulose, Catalin Marinas, Jiri Pirko,
Jason Gunthorpe, Mostafa Saleh, Petr Tesarik, Dan Williams,
Xu Yilun, linuxppc-dev, linux-s390, Madhavan Srinivasan,
Michael Ellerman, Nicholas Piggin, Christophe Leroy (CS GROUP),
Alexander Gordeev, Gerald Schaefer, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, x86
In-Reply-To: <20260512090408.794195-5-aneesh.kumar@kernel.org>
On 12/5/26 19:03, Aneesh Kumar K.V (Arm) wrote:
> Teach swiotlb to distinguish between encrypted and decrypted bounce
> buffer pools, and make allocation and mapping paths select a pool whose
> state matches the requested DMA attributes.
>
> Add a decrypted flag to io_tlb_mem, initialize it for the default and
> restricted pools, and propagate DMA_ATTR_CC_SHARED into swiotlb pool
> allocation. Reject swiotlb alloc/map requests when the selected pool does
> not match the required encrypted/decrypted state.
>
> Also return DMA addresses with the matching phys_to_dma_{encrypted,
> unencrypted} helper so the DMA address encoding stays consistent with the
> chosen pool.
>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> ---
> include/linux/dma-direct.h | 10 ++++
> include/linux/swiotlb.h | 8 ++-
> kernel/dma/direct.c | 14 +++--
> kernel/dma/swiotlb.c | 108 +++++++++++++++++++++++++++----------
> 4 files changed, 107 insertions(+), 33 deletions(-)
>
> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
> index c249912456f9..94fad4e7c11e 100644
> --- a/include/linux/dma-direct.h
> +++ b/include/linux/dma-direct.h
> @@ -77,6 +77,10 @@ static inline dma_addr_t dma_range_map_max(const struct bus_dma_region *map)
> #ifndef phys_to_dma_unencrypted
> #define phys_to_dma_unencrypted phys_to_dma
> #endif
> +
> +#ifndef phys_to_dma_encrypted
> +#define phys_to_dma_encrypted phys_to_dma
> +#endif
> #else
> static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
> {
> @@ -90,6 +94,12 @@ static inline dma_addr_t phys_to_dma_unencrypted(struct device *dev,
> {
> return dma_addr_unencrypted(__phys_to_dma(dev, paddr));
> }
> +
> +static inline dma_addr_t phys_to_dma_encrypted(struct device *dev,
> + phys_addr_t paddr)
> +{
> + return dma_addr_encrypted(__phys_to_dma(dev, paddr));
> +}
> /*
> * If memory encryption is supported, phys_to_dma will set the memory encryption
> * bit in the DMA address, and dma_to_phys will clear it.
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 3dae0f592063..b3fa3c6e0169 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -81,6 +81,7 @@ struct io_tlb_pool {
> struct list_head node;
> struct rcu_head rcu;
> bool transient;
> + bool unencrypted;
> #endif
> };
>
> @@ -111,6 +112,7 @@ struct io_tlb_mem {
> struct dentry *debugfs;
> bool force_bounce;
> bool for_alloc;
> + bool unencrypted;
> #ifdef CONFIG_SWIOTLB_DYNAMIC
> bool can_grow;
> u64 phys_limit;
> @@ -282,7 +284,8 @@ static inline void swiotlb_sync_single_for_cpu(struct device *dev,
> extern void swiotlb_print_info(void);
>
> #ifdef CONFIG_DMA_RESTRICTED_POOL
> -struct page *swiotlb_alloc(struct device *dev, size_t size);
> +struct page *swiotlb_alloc(struct device *dev, size_t size,
> + unsigned long attrs);
> bool swiotlb_free(struct device *dev, struct page *page, size_t size);
>
> static inline bool is_swiotlb_for_alloc(struct device *dev)
> @@ -290,7 +293,8 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
> return dev->dma_io_tlb_mem->for_alloc;
> }
> #else
> -static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
> +static inline struct page *swiotlb_alloc(struct device *dev, size_t size,
> + unsigned long attrs)
> {
> return NULL;
> }
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index dc2907439b3d..97ae4fa10521 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -104,9 +104,10 @@ static void __dma_direct_free_pages(struct device *dev, struct page *page,
> dma_free_contiguous(dev, page, size);
> }
>
> -static struct page *dma_direct_alloc_swiotlb(struct device *dev, size_t size)
> +static struct page *dma_direct_alloc_swiotlb(struct device *dev, size_t size,
> + unsigned long attrs)
> {
> - struct page *page = swiotlb_alloc(dev, size);
> + struct page *page = swiotlb_alloc(dev, size, attrs);
>
> if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
> swiotlb_free(dev, page, size);
> @@ -266,8 +267,12 @@ void *dma_direct_alloc(struct device *dev, size_t size,
> gfp, attrs);
>
> if (is_swiotlb_for_alloc(dev)) {
here we know it is shared so ...
> - page = dma_direct_alloc_swiotlb(dev, size);
> + page = dma_direct_alloc_swiotlb(dev, size, attrs);
> if (page) {
> + /*
> + * swiotlb allocations comes from pool already marked
> + * decrypted
> + */
... is not this needed here
attrs |= DMA_ATTR_CC_SHARED;
?
and then the setup_page label below can do the right thing, which I tried, with enforcing io_tlb_default_mem.for_alloc=1, it works - accepted device can still do DMA to shared memory. Thanks,
> mark_mem_decrypt = false;
> goto setup_page;
> }
> @@ -374,6 +379,7 @@ void dma_direct_free(struct device *dev, size_t size,
> return;
>
> if (swiotlb_find_pool(dev, dma_to_phys(dev, dma_addr)))
> + /* Swiotlb doesn't need a page attribute update on free */
> mark_mem_encrypted = false;
>
> if (is_vmalloc_addr(cpu_addr)) {
> @@ -403,7 +409,7 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
> gfp, attrs);
>
> if (is_swiotlb_for_alloc(dev)) {
> - page = dma_direct_alloc_swiotlb(dev, size);
> + page = dma_direct_alloc_swiotlb(dev, size, attrs);
> if (!page)
> return NULL;
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index ab4eccbaa076..065663be282c 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -259,10 +259,21 @@ void __init swiotlb_update_mem_attributes(void)
> struct io_tlb_pool *mem = &io_tlb_default_mem.defpool;
> unsigned long bytes;
>
> + /*
> + * if platform support memory encryption, swiotlb buffers are
> + * decrypted by default.
> + */
> + if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> + io_tlb_default_mem.unencrypted = true;
> + else
> + io_tlb_default_mem.unencrypted = false;
> +
> if (!mem->nslabs || mem->late_alloc)
> return;
> bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
> - set_memory_decrypted((unsigned long)mem->vaddr, bytes >> PAGE_SHIFT);
> +
> + if (io_tlb_default_mem.unencrypted)
> + set_memory_decrypted((unsigned long)mem->vaddr, bytes >> PAGE_SHIFT);
> }
>
> static void swiotlb_init_io_tlb_pool(struct io_tlb_pool *mem, phys_addr_t start,
> @@ -505,8 +516,10 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
> if (!mem->slots)
> goto error_slots;
>
> - set_memory_decrypted((unsigned long)vstart,
> - (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
> + if (io_tlb_default_mem.unencrypted)
> + set_memory_decrypted((unsigned long)vstart,
> + (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
> +
> swiotlb_init_io_tlb_pool(mem, virt_to_phys(vstart), nslabs, true,
> nareas);
> add_mem_pool(&io_tlb_default_mem, mem);
> @@ -539,7 +552,9 @@ void __init swiotlb_exit(void)
> tbl_size = PAGE_ALIGN(mem->end - mem->start);
> slots_size = PAGE_ALIGN(array_size(sizeof(*mem->slots), mem->nslabs));
>
> - set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
> + if (io_tlb_default_mem.unencrypted)
> + set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
> +
> if (mem->late_alloc) {
> area_order = get_order(array_size(sizeof(*mem->areas),
> mem->nareas));
> @@ -563,6 +578,7 @@ void __init swiotlb_exit(void)
> * @gfp: GFP flags for the allocation.
> * @bytes: Size of the buffer.
> * @phys_limit: Maximum allowed physical address of the buffer.
> + * @unencrypted: true to allocate unencrypted memory, false for encrypted memory
> *
> * Allocate pages from the buddy allocator. If successful, make the allocated
> * pages decrypted that they can be used for DMA.
> @@ -570,7 +586,8 @@ void __init swiotlb_exit(void)
> * Return: Decrypted pages, %NULL on allocation failure, or ERR_PTR(-EAGAIN)
> * if the allocated physical address was above @phys_limit.
> */
> -static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes, u64 phys_limit)
> +static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes,
> + u64 phys_limit, bool unencrypted)
> {
> unsigned int order = get_order(bytes);
> struct page *page;
> @@ -588,13 +605,13 @@ static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes, u64 phys_limit)
> }
>
> vaddr = phys_to_virt(paddr);
> - if (set_memory_decrypted((unsigned long)vaddr, PFN_UP(bytes)))
> + if (unencrypted && set_memory_decrypted((unsigned long)vaddr, PFN_UP(bytes)))
> goto error;
> return page;
>
> error:
> /* Intentional leak if pages cannot be encrypted again. */
> - if (!set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
> + if (unencrypted && !set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
> __free_pages(page, order);
> return NULL;
> }
> @@ -604,30 +621,26 @@ static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes, u64 phys_limit)
> * @dev: Device for which a memory pool is allocated.
> * @bytes: Size of the buffer.
> * @phys_limit: Maximum allowed physical address of the buffer.
> + * @attrs: DMA attributes for the allocation.
> * @gfp: GFP flags for the allocation.
> *
> * Return: Allocated pages, or %NULL on allocation failure.
> */
> static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
> - u64 phys_limit, gfp_t gfp)
> + u64 phys_limit, unsigned long attrs, gfp_t gfp)
> {
> struct page *page;
> - unsigned long attrs = 0;
>
> /*
> * Allocate from the atomic pools if memory is encrypted and
> * the allocation is atomic, because decrypting may block.
> */
> - if (!gfpflags_allow_blocking(gfp) && dev && force_dma_unencrypted(dev)) {
> + if (!gfpflags_allow_blocking(gfp) && (attrs & DMA_ATTR_CC_SHARED)) {
> void *vaddr;
>
> if (!IS_ENABLED(CONFIG_DMA_COHERENT_POOL))
> return NULL;
>
> - /* swiotlb considered decrypted by default */
> - if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> - attrs = DMA_ATTR_CC_SHARED;
> -
> return dma_alloc_from_pool(dev, bytes, &vaddr, gfp,
> attrs, dma_coherent_ok);
> }
> @@ -638,7 +651,8 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
> else if (phys_limit <= DMA_BIT_MASK(32))
> gfp |= __GFP_DMA32;
>
> - while (IS_ERR(page = alloc_dma_pages(gfp, bytes, phys_limit))) {
> + while (IS_ERR(page = alloc_dma_pages(gfp, bytes, phys_limit,
> + !!(attrs & DMA_ATTR_CC_SHARED)))) {
> if (IS_ENABLED(CONFIG_ZONE_DMA32) &&
> phys_limit < DMA_BIT_MASK(64) &&
> !(gfp & (__GFP_DMA32 | __GFP_DMA)))
> @@ -657,15 +671,18 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
> * swiotlb_free_tlb() - free a dynamically allocated IO TLB buffer
> * @vaddr: Virtual address of the buffer.
> * @bytes: Size of the buffer.
> + * @unencrypted: true if @vaddr was allocated decrypted and must be
> + * re-encrypted before being freed
> */
> -static void swiotlb_free_tlb(void *vaddr, size_t bytes)
> +static void swiotlb_free_tlb(void *vaddr, size_t bytes, bool unencrypted)
> {
> if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
> dma_free_from_pool(NULL, vaddr, bytes))
> return;
>
> /* Intentional leak if pages cannot be encrypted again. */
> - if (!set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
> + if (!unencrypted ||
> + !set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
> __free_pages(virt_to_page(vaddr), get_order(bytes));
> }
>
> @@ -676,6 +693,7 @@ static void swiotlb_free_tlb(void *vaddr, size_t bytes)
> * @nslabs: Desired (maximum) number of slabs.
> * @nareas: Number of areas.
> * @phys_limit: Maximum DMA buffer physical address.
> + * @attrs: DMA attributes for the allocation.
> * @gfp: GFP flags for the allocations.
> *
> * Allocate and initialize a new IO TLB memory pool. The actual number of
> @@ -686,7 +704,8 @@ static void swiotlb_free_tlb(void *vaddr, size_t bytes)
> */
> static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
> unsigned long minslabs, unsigned long nslabs,
> - unsigned int nareas, u64 phys_limit, gfp_t gfp)
> + unsigned int nareas, u64 phys_limit, unsigned long attrs,
> + gfp_t gfp)
> {
> struct io_tlb_pool *pool;
> unsigned int slot_order;
> @@ -704,9 +723,10 @@ static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
> if (!pool)
> goto error;
> pool->areas = (void *)pool + sizeof(*pool);
> + pool->unencrypted = !!(attrs & DMA_ATTR_CC_SHARED);
>
> tlb_size = nslabs << IO_TLB_SHIFT;
> - while (!(tlb = swiotlb_alloc_tlb(dev, tlb_size, phys_limit, gfp))) {
> + while (!(tlb = swiotlb_alloc_tlb(dev, tlb_size, phys_limit, attrs, gfp))) {
> if (nslabs <= minslabs)
> goto error_tlb;
> nslabs = ALIGN(nslabs >> 1, IO_TLB_SEGSIZE);
> @@ -724,7 +744,8 @@ static struct io_tlb_pool *swiotlb_alloc_pool(struct device *dev,
> return pool;
>
> error_slots:
> - swiotlb_free_tlb(page_address(tlb), tlb_size);
> + swiotlb_free_tlb(page_address(tlb), tlb_size,
> + !!(attrs & DMA_ATTR_CC_SHARED));
> error_tlb:
> kfree(pool);
> error:
> @@ -742,7 +763,9 @@ static void swiotlb_dyn_alloc(struct work_struct *work)
> struct io_tlb_pool *pool;
>
> pool = swiotlb_alloc_pool(NULL, IO_TLB_MIN_SLABS, default_nslabs,
> - default_nareas, mem->phys_limit, GFP_KERNEL);
> + default_nareas, mem->phys_limit,
> + mem->unencrypted ? DMA_ATTR_CC_SHARED : 0,
> + GFP_KERNEL);
> if (!pool) {
> pr_warn_ratelimited("Failed to allocate new pool");
> return;
> @@ -762,7 +785,7 @@ static void swiotlb_dyn_free(struct rcu_head *rcu)
> size_t tlb_size = pool->end - pool->start;
>
> free_pages((unsigned long)pool->slots, get_order(slots_size));
> - swiotlb_free_tlb(pool->vaddr, tlb_size);
> + swiotlb_free_tlb(pool->vaddr, tlb_size, pool->unencrypted);
> kfree(pool);
> }
>
> @@ -1232,6 +1255,7 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
> nslabs = nr_slots(alloc_size);
> phys_limit = min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
> pool = swiotlb_alloc_pool(dev, nslabs, nslabs, 1, phys_limit,
> + mem->unencrypted ? DMA_ATTR_CC_SHARED : 0,
> GFP_NOWAIT);
> if (!pool)
> return -1;
> @@ -1394,6 +1418,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> enum dma_data_direction dir, unsigned long attrs)
> {
> struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> + bool require_decrypted = false;
> unsigned int offset;
> struct io_tlb_pool *pool;
> unsigned int i;
> @@ -1411,6 +1436,16 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
> if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
> pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
>
> + /*
> + * if we are trying to swiotlb map a decrypted paddr or the paddr is encrypted
> + * but the device is forcing decryption, use decrypted io_tlb_mem
> + */
> + if ((attrs & DMA_ATTR_CC_SHARED) || force_dma_unencrypted(dev))
> + require_decrypted = true;
> +
> + if (require_decrypted != mem->unencrypted)
> + return (phys_addr_t)DMA_MAPPING_ERROR;
> +
> /*
> * The default swiotlb memory pool is allocated with PAGE_SIZE
> * alignment. If a mapping is requested with larger alignment,
> @@ -1608,8 +1643,14 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
> if (swiotlb_addr == (phys_addr_t)DMA_MAPPING_ERROR)
> return DMA_MAPPING_ERROR;
>
> - /* Ensure that the address returned is DMA'ble */
> - dma_addr = phys_to_dma_unencrypted(dev, swiotlb_addr);
> + /*
> + * Use the allocated io_tlb_mem encryption type to determine dma addr.
> + */
> + if (dev->dma_io_tlb_mem->unencrypted)
> + dma_addr = phys_to_dma_unencrypted(dev, swiotlb_addr);
> + else
> + dma_addr = phys_to_dma_encrypted(dev, swiotlb_addr);
> +
> if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
> __swiotlb_tbl_unmap_single(dev, swiotlb_addr, size, dir,
> attrs | DMA_ATTR_SKIP_CPU_SYNC,
> @@ -1773,7 +1814,8 @@ static inline void swiotlb_create_debugfs_files(struct io_tlb_mem *mem,
>
> #ifdef CONFIG_DMA_RESTRICTED_POOL
>
> -struct page *swiotlb_alloc(struct device *dev, size_t size)
> +struct page *swiotlb_alloc(struct device *dev, size_t size,
> + unsigned long attrs)
> {
> struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> struct io_tlb_pool *pool;
> @@ -1784,6 +1826,9 @@ struct page *swiotlb_alloc(struct device *dev, size_t size)
> if (!mem)
> return NULL;
>
> + if (mem->unencrypted != !!(attrs & DMA_ATTR_CC_SHARED))
> + return NULL;
> +
> align = (1 << (get_order(size) + PAGE_SHIFT)) - 1;
> index = swiotlb_find_slots(dev, 0, size, align, &pool);
> if (index == -1)
> @@ -1853,9 +1898,18 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
> kfree(mem);
> return -ENOMEM;
> }
> + /*
> + * if platform supports memory encryption,
> + * restricted mem pool is decrypted by default
> + */
> + if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) {
> + mem->unencrypted = true;
> + set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
> + rmem->size >> PAGE_SHIFT);
> + } else {
> + mem->unencrypted = false;
> + }
>
> - set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
> - rmem->size >> PAGE_SHIFT);
> swiotlb_init_io_tlb_pool(pool, rmem->base, nslabs,
> false, nareas);
> mem->force_bounce = true;
--
Alexey
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox