From: Alexey Kardashevskiy <aik@amd.com>
To: Xu Yilun <yilun.xu@linux.intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
kvm@vger.kernel.org, dri-devel@lists.freedesktop.org,
linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org,
sumit.semwal@linaro.org, christian.koenig@amd.com,
pbonzini@redhat.com, seanjc@google.com,
alex.williamson@redhat.com, vivek.kasireddy@intel.com,
dan.j.williams@intel.com, yilun.xu@intel.com,
linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org,
lukas@wunner.de, yan.y.zhao@intel.com, daniel.vetter@ffwll.ch,
leon@kernel.org, baolu.lu@linux.intel.com,
zhenzhong.duan@intel.com, tao1.su@intel.com
Subject: Re: [RFC PATCH 00/12] Private MMIO support for private assigned dev
Date: Mon, 26 May 2025 17:18:48 +1000
Message-ID: <ae16db07-5fca-4369-aa67-cbe2e0fd60fd@amd.com>
In-Reply-To: <aDE5SPzOAU0sNIt+@yilunxu-OptiPlex-7050>

On 24/5/25 13:13, Xu Yilun wrote:
> On Thu, May 22, 2025 at 01:45:57PM +1000, Alexey Kardashevskiy wrote:
>>
>>
>> On 16/5/25 02:04, Xu Yilun wrote:
>>> On Wed, May 14, 2025 at 01:33:39PM -0300, Jason Gunthorpe wrote:
>>>> On Wed, May 14, 2025 at 03:02:53PM +0800, Xu Yilun wrote:
>>>>>> We have an awkward fit for what CCA people are doing to the various
>>>>>> Linux APIs. Looking somewhat maximally across all the arches a "bind"
>>>>>> for a CC vPCI device creation operation does:
>>>>>>
>>>>>> - Setup the CPU page tables for the VM to have access to the MMIO
>>>>>
>>>>> This is guest side thing, is it? Anything host need to opt-in?
>>>>
>>>> CPU hypervisor page tables.
>>>>
>>>>>> - Revoke hypervisor access to the MMIO
>>>>>
>>>>> VFIO could choose never to mmap MMIO, so in this case nothing to do?
>>>>
>>>> Yes, if you do it that way.
>>>>>> - Setup the vIOMMU to understand the vPCI device
>>>>>> - Take over control of some of the IOVA translation, at least for T=1,
>>>>>> and route to the vIOMMU
>>>>>> - Register the vPCI with any attestation functions the VM might use
>>>>>> - Do some DOE stuff to manage/validate TDISP/etc
>>>>>
>>>>> Intel TDX Connect has an extra requirement for "unbind":
>>>>>
>>>>> - Revoke KVM page table (S-EPT) for the MMIO only after TDISP
>>>>> CONFIG_UNLOCK
>>>>
>>>> Maybe you could express this as the S-EPT always has the MMIO mapped
>>>> into it as long as the vPCI function is installed to the VM?
>>>
>>> Yeah.
>>>
>>>> Is KVM responsible for the S-EPT?
>>>
>>> Yes.
>>>
>>>>
>>>>> Another thing is, seems your term "bind" includes all steps for
>>>>> shared -> private conversion.
>>>>
>>>> Well, I was talking about vPCI creation. I understand that during the
>>>> vPCI lifecycle the VM will do "bind" "unbind" which are more or less
>>>> switching the device into a T=1 mode. Though I understood on some
>>>
>>> I want to introduce some terms about CC vPCI.
>>>
>>> 1. "Bind", guest requests host do host side CC setup & put device in
>>> CONFIG_LOCKED state, waiting for attestation. Any further change which
>>> has security concern breaks "bind", e.g. reset, touch MMIO, physical MSE,
>>> BAR addr...
>>>
>>> 2. "Attest", after "bind", guest verifies device evidences (cert,
>>> measurement...).
>>>
>>> 3. "Accept", after successful attestation, guest do guest side CC setup &
>>> switch the device into T=1 mode (TDISP RUN state)
>>
>> (implementation note)
>> AMD SEV moves TDI to RUN at "Attest" as a guest still can avoid encrypted MMIO access and the PSP keeps IOMMU blocked until the guest enables it.
>>
>
> Good to know. That's why we have these SW defined verbs rather than
> reusing TDISP terms.
>
>>> 4. "Unbind", guest requests host put device in CONFIG_UNLOCK state +
>>> remove all CC setup.
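
The four verbs above can be sketched as a small state machine. This is only an illustration under stated assumptions: cc_vpci, cc_vpci_apply() and the CC_* / TDI_* names are hypothetical and do not come from any posted patch, and the sketch follows the generic description (RUN is entered at "Accept"), not the AMD SEV variant noted in this thread.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; TDI states mirror the TDISP ones. */
enum tdi_state { TDI_CONFIG_UNLOCKED, TDI_CONFIG_LOCKED, TDI_RUN };
enum cc_verb { CC_BIND, CC_ATTEST, CC_ACCEPT, CC_UNBIND };

struct cc_vpci {
	enum tdi_state state;
	bool attested;	/* guest has verified the device evidence */
};

/* Returns 0 on success, -1 if the verb is not valid in the current state. */
static int cc_vpci_apply(struct cc_vpci *d, enum cc_verb verb)
{
	switch (verb) {
	case CC_BIND:	/* host side CC setup, device goes CONFIG_LOCKED */
		if (d->state != TDI_CONFIG_UNLOCKED)
			return -1;
		d->state = TDI_CONFIG_LOCKED;
		d->attested = false;
		return 0;
	case CC_ATTEST:	/* only meaningful after a successful bind */
		if (d->state != TDI_CONFIG_LOCKED)
			return -1;
		d->attested = true;
		return 0;
	case CC_ACCEPT:	/* guest side CC setup, device goes to RUN (T=1) */
		if (d->state != TDI_CONFIG_LOCKED || !d->attested)
			return -1;
		d->state = TDI_RUN;
		return 0;
	case CC_UNBIND:	/* back to CONFIG_UNLOCKED, all CC setup removed */
		if (d->state == TDI_CONFIG_UNLOCKED)
			return -1;
		d->state = TDI_CONFIG_UNLOCKED;
		d->attested = false;
		return 0;
	}
	return -1;
}
```

In this model "Accept" before "Attest" is rejected, and "Unbind" is allowed from both CONFIG_LOCKED and RUN.
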
>>>
>>>> arches this was mostly invisible to the hypervisor?
>>>
>>> Attest & Accept can be invisible to hypervisor, or host just help pass
>>> data blobs between guest, firmware & device.
>>
>> No, they cannot.
>
> MM.. TSM driver is the agent of trusted firmware in the OS, so I
> excluded it from "hypervisor". TSM driver could parse data blobs and do
> whatever requested by trusted firmware.
>
> I want to justify the general guest_request interface, explain why
> VFIO/IOMMUFD don't have to maintain the "attest", "accept" states.
>
>>
>>> Bind cannot be host agnostic, host should be aware not to touch device
>>> after Bind.
>>
>> Bind actually connects a TDI to a guest, the guest could not possibly do that alone as it does not know/have access to the physical PCI function#0 to do the DOE/SecSPDM messaging, and neither does the PSP.
>>
>> The non-touching clause (or, more precisely "selectively touching") is about "Attest" and "Accept" when the TDI is in the CONFIG_LOCKED or RUN state. Up to the point when we rather want to block the config space and MSIX BAR access after the TDI is CONFIG_LOCKED/RUN to prevent TDI from going to the ERROR state.
>>
>>
>>>>
>>>>> But in my mind, "bind" only includes
>>>>> putting device in TDISP LOCK state & corresponding host setups required
>>>>> by firmware. I.e. "bind" means host locks down the CC setup, waiting for
>>>>> guest attestation.
>>>>
>>>> So we will need to have some other API for this that modifies the vPCI
>>>> object.
>>>
>>> IIUC, in Alexey's patch ioctl(iommufd, IOMMU_VDEVICE_TSM_BIND) does the
>>> "Bind" thing in host.
>>
>>
>> I am still not sure what "vPCI" means exactly, a passed through PCI device? Or a piece of vIOMMU handling such device?
>>
>
> My understanding is both. When you "Bind" you modify the physical
> device, you may also need to setup a piece of vIOMMU for private
> assignment to work.
>
>>
>>>> It might be reasonable to have VFIO reach into iommufd to do that on
>>>> an already existing iommufd VDEVICE object. A little weird, but we
>>>> could probably make that work.
>>>
>>> Mm, Are you proposing an uAPI in VFIO, and a kAPI from VFIO -> IOMMUFD like:
>>>
>>> ioctl(vfio_fd, VFIO_DEVICE_ATTACH_VDEV, vdev_id)
>>> -> iommufd_device_attach_vdev()
>>> -> tsm_tdi_bind()
>>>
>>>>
>>>> But you have some weird ordering issues here if the S-EPT has to have
>>>> the VFIO MMIO then you have to have a close() destruction order that
>>>
>>> Yeah, by holding kvm reference.
>>>
>>>> sees VFIO remove the S-EPT and release the KVM, then have iommufd
>>>> destroy the VDEVICE object.
>>>
>>> Regarding VM destroy, TDX Connect has more enforcement, VM could only be
>>> destroyed after all assigned CC vPCI devices are destroyed.
>>
>> Can be done by making IOMMUFD/vdevice holding the kvm pointer to ensure tsm_tdi_unbind() is not called before the guest disappeared from the firmware. I seem to be just lucky with the current order of things being destroyed, hmm.
>>
>
> tsm_tdi_unbind() *should* be called before guest disappear. For TDX
> Connect that is the enforcement. Holding KVM pointer is the effective
> way.
>
>>
>>> Nowadays, VFIO already holds KVM reference, so we need
>>>
>>> close(vfio_fd)
>>> -> iommufd_device_detach_vdev()
>>> -> tsm_tdi_unbind()
>>> -> tdi stop
>>> -> callback to VFIO, dmabuf_move_notify(revoke)
>>> -> KVM unmap MMIO
>>> -> tdi metadata remove
>>> -> kvm_put_kvm()
>>> -> kvm_destroy_vm()
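
The close() ordering in the call chain above can be modelled as a toy, under the assumption that unmapping MMIO is refused while the TDI is still running (the TDX Connect constraint discussed in this thread). All sketch_* names are hypothetical stand-ins, not the kernel functions named in the chain.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the KVM VM and the iommufd vdevice. */
struct sketch_vm {
	int refs;
	bool destroyed;
};

struct sketch_vdev {
	struct sketch_vm *vm;	/* reference held while the TDI is bound */
	bool tdi_running;
	bool mmio_mapped;
	bool tdi_metadata;
};

/* Models kvm_put_kvm(): the VM is destroyed when the last ref drops. */
static void sketch_vm_put(struct sketch_vm *vm)
{
	if (--vm->refs == 0)
		vm->destroyed = true;
}

/* Models revoke + KVM unmap; refused while the TDI is still running. */
static int sketch_unmap_mmio(struct sketch_vdev *v)
{
	if (v->tdi_running)
		return -1;
	v->mmio_mapped = false;
	return 0;
}

/* Models the unbind step: stop the TDI, unmap MMIO, drop TDI metadata. */
static int sketch_tdi_unbind(struct sketch_vdev *v)
{
	v->tdi_running = false;
	if (sketch_unmap_mmio(v))
		return -1;
	v->tdi_metadata = false;
	return 0;
}

/* Models close(vfio_fd): unbind first, release the VM reference last. */
static int sketch_vfio_close(struct sketch_vdev *v)
{
	if (v->vm == NULL || v->vm->destroyed)
		return -1;
	if (sketch_tdi_unbind(v))
		return -1;
	sketch_vm_put(v->vm);
	v->vm = NULL;
	return 0;
}
```

Because the vdevice holds a VM reference until the unbind has completed, the VM cannot be destroyed while the TDI is still bound, which is the ordering the chain above relies on.
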
>>>
>>>
>>>>
>>>>>> It doesn't mean that iommufd is suddenly doing PCI stuff, no, that
>>>>>> stays in VFIO.
>>>>>
>>>>> I'm not sure if Alexey's patch [1] illustrates your idea. It calls
>>>>> tsm_tdi_bind() which directly does device stuff, and impacts MMIO.
>>>>> VFIO doesn't know about this.
>>
>> VFIO knows about this enough as we asked it to share MMIO via dmabuf's fd and not via mmap(), otherwise it is the same MMIO, exactly where it was, BARs do not change.
>>
>
> Yes, if you define a SW "lock down" in a broader sense than TDISP LOCKED.
> But seems TDX Connect failed to adapt to this solution because it still
> needs to handle MMIO invalidation before FLR, see below.
>
>>>>>
>>>>> I have to interpret this as VFIO firstly hand over device CC features
>>>>> and MMIO resources to IOMMUFD, so VFIO never cares about them.
>>>>>
>>>>> [1] https://lore.kernel.org/all/20250218111017.491719-15-aik@amd.com/
>>>>
>>>> There is also the PCI layer involved here and maybe PCI should be
>>>> participating in managing some of this. Like it makes a bit of sense
>>>> that PCI would block the FLR on platforms that require this?
>>>
>>> FLR to a bound device is absolutely fine, just break the CC state.
>>> Sometimes it is exactly what host need to stop CC immediately.
>>> The problem is in VFIO's pre-FLR handling so we need to patch VFIO, not
>>> PCI core.
>>
>> What is a problem here exactly?
>> FLR by the host which equals to any other PCI error? The guest may or may not be able to handle it, afaik it does not handle any errors now, QEMU just stops the guest.
>
> It is about TDX Connect.
>
> According to the dmabuf patchset, the dmabuf needs to be revoked before
> FLR. That means KVM unmaps MMIOs when the device is in LOCKED/RUN state.
> That is forbidden by TDX Module and will crash KVM.

FLR is something you tell the device to do, how/why would TDX know about it? Or does it check the TDI state on every map/unmap (unlikely)?

> So the safer way is
> to unbind the TDI first, then revoke MMIOs, then do FLR.
>
> I'm not sure when p2p dma is involved AMD will have the same issue.

On AMD, the host can "revoke" at any time, at worst it'll see RMP events from IOMMU. Thanks,

> Cause in that case, MMIOs would also be mapped in IOMMU PT and revoke
> MMIOs means IOMMU mapping drop. The root cause of the concern is secure
> firmware should monitor IOMMU mapping integrity for private assignment
> or hypervisor could silently drop trusted DMA writes.
>
> TDX Connect has the wider impact on this issue cause it uses the same
> table for KVM S-EPT and Secure IOMMU PT.
>
> Thanks,
> Yilun
>
>> Or FLR by the guest? Then it knows it needs to do the dance with attest/accept, again.
>>
>> Thanks,
>>
>>>
>>> Thanks,
>>> Yilun
>>>
>>>>
>>>> Jason
>>
>> --
>> Alexey
>>
--
Alexey