From: Alexey Kardashevskiy <aik@amd.com>
To: Dan Williams <dan.j.williams@intel.com>, kvm@vger.kernel.org
Cc: iommu@lists.linux.dev, linux-coco@lists.linux.dev,
linux-pci@vger.kernel.org,
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
Alex Williamson <alex.williamson@redhat.com>,
pratikrajesh.sampat@amd.com, michael.day@amd.com,
david.kaplan@amd.com, dhaval.giani@amd.com,
Santosh Shukla <santosh.shukla@amd.com>,
Tom Lendacky <thomas.lendacky@amd.com>,
Michael Roth <michael.roth@amd.com>,
Alexander Graf <agraf@suse.de>,
Nikunj A Dadhania <nikunj@amd.com>,
Vasant Hegde <vasant.hegde@amd.com>,
Lukas Wunner <lukas@wunner.de>
Subject: Re: [RFC PATCH 00/21] Secure VFIO, TDISP, SEV TIO
Date: Fri, 30 Aug 2024 14:38:21 +1000 [thread overview]
Message-ID: <fd8549e3-c2b3-47f7-b413-3007a60ba82b@amd.com> (raw)
In-Reply-To: <66d1072ea0590_31daf294e8@dwillia2-xfh.jf.intel.com.notmuch>
On 30/8/24 09:41, Dan Williams wrote:
> Alexey Kardashevskiy wrote:
> [..]
>>>> - skipping various enforcements of non-SME or
>>>> SWIOTLB in the guest;
>>>
>>> Is this based on some concept of private vs shared mode devices?
>>>
>>>> No mixed share+private DMA supported within the
>>>> same IOMMU.
>>>
>>> What does this mean? A device may not have mixed mappings (makes sense),
>>
>> Currently devices do not have an idea about private host memory (but it
>> is being worked on afaik).
>
> Worked on where? You mean the PCI core indicating that a device is
> private or not? Is that not indicated by guest-side TSM connection
> state?
> >>> or an IOMMU can not host devices that do not all agree on whether
DMA is
>>> private or shared?
>>
>> The hardware allows that via hardware-assisted vIOMMU and I/O page
>> tables in the guest with C-bit takes into accound by the IOMMU but the
>> software support is missing right now. So for this initial drop, vTOM is
>> used for DMA - this thing says "everything below <addr> is private,
>> above <addr> - shared" so nothing needs to bother with the C-bit, and in
>> my exercise I set the <addr> to the allowed maximum.
>>
>> So each IOMMUFD instance in the VM is either "all private mappings" or
>> "all shared". Could be half/half by moving that <addr> :)
>
> I thought existing use cases assume that the CC-VM can trigger page
> conversions at will without regard to a vTOM concept? It would be nice
> to have that address-map separation arrangement, has not that ship
> already sailed?
Mmm. I am either confusing you too much or not following you :) Any page
can be converted, the proposed arrangement would require that
convertion-candidate-pages are allocated from a specific pool?
There are two vTOMs - one in IOMMU to decide on Cbit for DMA trafic (I
use this one), one in VMSA ("VIRTUAL_TOM") for guest memory (this
exercise is not using it). Which one do you mean?
>
> [..]
>>> Would the device not just launch in "shared" mode until it is later
>>> converted to private? I am missing the detail of why passing the device
>>> on the command line requires that private memory be mapped early.
>>
>> A sequencing problem.
>>
>> QEMU "realizes" a VFIO device, it creates an iommufd instance which
>> creates a domain and writes to a DTE (a IOMMU descriptor for PCI BDFn).
>> And DTE is not updated after than. For secure stuff, DTE needs to be
>> slightly different. So right then I tell IOMMUFD that it will handle
>> private memory.
>>
>> Then, the same VFIO "realize" handler maps the guest memory in iommufd.
>> I use the same flag (well, pointer to kvm) in the iommufd pinning code,
>> private memory is pinned and mapped (and related page state change
>> happens as the guest memory is made guest-owned in RMP).
>>
>> QEMU goes to machine_reset() and calls "SNP LAUNCH UPDATE" (the actual
>> place changed recenly, huh) and the latter will measure the guest and
>> try making all guest memory private but it already happened => error.
>>
>> I think I have to decouple the pinning and the IOMMU/DTE setting.
>>
>>> That said, the implication that private device assignment requires
>>> hotplug events is a useful property. This matches nicely with initial
>>> thoughts that device conversion events are violent and might as well be
>>> unplug/replug events to match all the assumptions around what needs to
>>> be updated.
>>
>> For the initial drop, I tell QEMU via "-device vfio-pci,x-tio=true" that
>> it is going to be private so there should be no massive conversion.
>
> That's a SEV-TIO RFC-specific hack, or a proposal?
Not sure at the moment :)
> An approach that aligns more closely with the VFIO operational model,
> where it maps and waits for guest faults / usages, is that QEMU would be
> told that the device is "bind capable", because the host is not in a
> position to assume how the guest will use the device. A "bind capable"
> device operates in shared mode unless and until the guest triggers
> private conversion.
True. I just started this exercise without QEMU DiscardManager. Now I
rely on it but it either needs to allow dynamic flip from
discarded==private to discarded==shared (should do for now) or allow 3
states for guest pages.
>>>> This requires the BME hack as MMIO and
>>>
>>> Not sure what the "BME hack" is, I guess this is foreshadowing for later
>>> in this story.
>> >
>>>> BusMaster enable bits cannot be 0 after MMIO
>>>> validation is done
>>>
>>> It would be useful to call out what is a TDISP requirement, vs
>>> device-specific DSM vs host-specific TSM requirement. In this case I
>>> assume you are referring to PCI 6.2 11.2.6 where it notes that TDIs must
>>
>> Oh there is 6.2 already.
>>
>>> enter the TDISP ERROR state if BME is cleared after the device is
>>> locked?
>>>
>>> ...but this begs the question of whether it needs to be avoided outright
>>
>> Well, besides a couple of avoidable places (like testing INTx support
>> which we know is not going to work on VFs anyway), a standard driver
>> enables MSE first (and the value for the command register does not have
>> 1 for BME) and only then BME. TBH I do not think writing BME=0 when
>> BME=0 already is "clearing" but my test device disagrees.
>
> ...but we should not be creating kernel policy around test devices. What
> matters is real devices. Now, if it is likely that real / production
> devices will go into the TDISP ERROR state by not coalescing MSE + BME
> updates then we need a solution.
True but I do not even know who to ask this question :)
> Given it is unlikely that TDISP support will be widespread any time soon
> it is likely tenable to assume TDISP compatible drivers call a new:
>
> pci_enable(pdev, PCI_ENABLE_TARGET | PCI_ENABLE_INITIATOR);
>
> ...or something like that to coalesce command register writes.
>
> Otherwise if that retrofit ends up being too much work or confusion then
> the ROI of teaching the PCI core to recover this scenario needs to be
> evaluated.
Agree.
>>> or handled as an error recovery case dependending on policy.
>>
>> Avoding seems more straight forward unless we actually want enlightened
>> device drivers which want to examine the interface report before
>> enabling the device. Not sure.
>
> If TDISP capable devices trends towards a handful of devices in the near
> term then some driver fixups seems reasonable. Otherwise if every PCI
> device driver Linux has ever seens needs to be ready for that device to
> have a TDISP capable flavor then mitigating this in the PCI core makes
> more sense than playing driver whack-a-mole.
>
>>>> the guest OS booting process when this
>>>> appens.
>>>>
>>>> SVSM could help addressing these (not
>>>> implemented at the moment).
>>>
>>> At first though avoiding SVSM entanglements where the kernel can be
>>> enlightened shoud be the policy. I would only expect SVSM hacks to cover
>>> for legacy OSes that will never be TDISP enlightened, but in that case
>>> we are likely talking about fully unaware L2. Lets assume fully
>>> enlightened L1 for now.
>>
>> Well, I could also tweak OVMF to make necessary calls to the PSP and
>> hack QEMU to postpone the command register updates to get this going,
>> just a matter of ugliness.
>
> Per above, the tradeoff should be in ROI, not ugliness. I don't see how
> OVMF helps when devices might be being virtually hotplugged or reset.
I have no clue how exactly hotplug works on x86, is not BIOS playing
role in it? Thanks,
--
Alexey
next prev parent reply other threads:[~2024-08-30 4:38 UTC|newest]
Thread overview: 128+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-23 13:21 [RFC PATCH 00/21] Secure VFIO, TDISP, SEV TIO Alexey Kardashevskiy
2024-08-23 13:21 ` [RFC PATCH 01/21] tsm-report: Rename module to reflect what it does Alexey Kardashevskiy
2024-08-23 22:17 ` Bjorn Helgaas
2024-08-28 13:49 ` Jonathan Cameron
2024-08-30 0:13 ` Dan Williams
2024-09-02 1:29 ` Alexey Kardashevskiy
2024-08-23 13:21 ` [RFC PATCH 02/21] pci/doe: Define protocol types and make those public Alexey Kardashevskiy
2024-08-23 22:18 ` Bjorn Helgaas
2024-08-30 2:15 ` Dan Williams
2024-08-23 13:21 ` [RFC PATCH 03/21] pci: Define TEE-IO bit in PCIe device capabilities Alexey Kardashevskiy
2024-08-23 22:19 ` Bjorn Helgaas
2024-08-28 13:54 ` Jonathan Cameron
2024-08-30 2:21 ` Dan Williams
2024-08-30 4:04 ` Alexey Kardashevskiy
2024-08-30 21:37 ` Dan Williams
2024-08-23 13:21 ` [RFC PATCH 04/21] PCI/IDE: Define Integrity and Data Encryption (IDE) extended capability Alexey Kardashevskiy
2024-08-23 22:28 ` Bjorn Helgaas
2024-08-28 14:24 ` Jonathan Cameron
2024-08-30 2:41 ` Dan Williams
2024-08-23 13:21 ` [RFC PATCH 05/21] crypto/ccp: Make some SEV helpers public Alexey Kardashevskiy
2024-08-30 2:45 ` Dan Williams
2024-08-23 13:21 ` [RFC PATCH 06/21] crypto: ccp: Enable SEV-TIO feature in the PSP when supported Alexey Kardashevskiy
2024-08-28 14:32 ` Jonathan Cameron
2024-09-03 21:27 ` Dan Williams
2024-09-05 2:29 ` Alexey Kardashevskiy
2024-09-05 17:40 ` Dan Williams
2024-08-23 13:21 ` [RFC PATCH 07/21] pci/tdisp: Introduce tsm module Alexey Kardashevskiy
2024-08-27 12:32 ` Jason Gunthorpe
2024-08-28 3:00 ` Alexey Kardashevskiy
2024-08-28 23:42 ` Jason Gunthorpe
2024-08-29 0:00 ` Dan Williams
2024-08-29 0:09 ` Jason Gunthorpe
2024-08-29 0:20 ` Dan Williams
2024-08-29 12:03 ` Jason Gunthorpe
2024-08-29 4:57 ` Alexey Kardashevskiy
2024-08-29 12:07 ` Jason Gunthorpe
2024-09-02 0:52 ` Alexey Kardashevskiy
2024-08-28 15:04 ` Jonathan Cameron
2024-09-02 6:50 ` Aneesh Kumar K.V
2024-09-02 7:26 ` Alexey Kardashevskiy
2024-09-03 23:51 ` Dan Williams
2024-09-04 11:13 ` Alexey Kardashevskiy
2024-09-04 23:28 ` Dan Williams
2024-08-23 13:21 ` [RFC PATCH 08/21] crypto/ccp: Implement SEV TIO firmware interface Alexey Kardashevskiy
2024-08-28 15:39 ` Jonathan Cameron
2024-08-23 13:21 ` [RFC PATCH 09/21] kvm: Export kvm_vm_set_mem_attributes Alexey Kardashevskiy
2024-08-23 13:21 ` [RFC PATCH 10/21] vfio: Export helper to get vfio_device from fd Alexey Kardashevskiy
2024-08-23 13:21 ` [RFC PATCH 11/21] KVM: SEV: Add TIO VMGEXIT and bind TDI Alexey Kardashevskiy
2024-08-29 10:08 ` Xu Yilun
2024-08-30 4:00 ` Alexey Kardashevskiy
2024-08-30 7:02 ` Xu Yilun
2024-09-02 1:24 ` Alexey Kardashevskiy
2024-09-13 13:50 ` Zhi Wang
2024-09-13 22:08 ` Dan Williams
2024-09-14 2:47 ` Tian, Kevin
2024-09-14 5:19 ` Zhi Wang
2024-09-18 10:45 ` Xu Yilun
2024-09-20 3:41 ` Tian, Kevin
2024-08-23 13:21 ` [RFC PATCH 12/21] KVM: IOMMUFD: MEMFD: Map private pages Alexey Kardashevskiy
2024-08-26 8:39 ` Tian, Kevin
2024-08-26 12:30 ` Jason Gunthorpe
2024-08-29 9:34 ` Xu Yilun
2024-08-29 12:15 ` Jason Gunthorpe
2024-08-30 3:47 ` Alexey Kardashevskiy
2024-08-30 12:35 ` Jason Gunthorpe
2024-09-02 1:09 ` Alexey Kardashevskiy
2024-09-02 23:52 ` Jason Gunthorpe
2024-09-03 0:03 ` Alexey Kardashevskiy
2024-09-03 0:37 ` Jason Gunthorpe
2024-08-30 5:20 ` Xu Yilun
2024-08-30 12:36 ` Jason Gunthorpe
2024-09-03 20:34 ` Dan Williams
2024-09-04 0:02 ` Jason Gunthorpe
2024-09-04 0:59 ` Dan Williams
2024-09-05 8:29 ` Tian, Kevin
2024-09-05 12:02 ` Jason Gunthorpe
2024-09-05 12:07 ` Tian, Kevin
2024-09-05 12:00 ` Jason Gunthorpe
2024-09-05 12:17 ` Tian, Kevin
2024-09-05 12:23 ` Jason Gunthorpe
2024-09-05 20:53 ` Dan Williams
2024-09-05 23:06 ` Jason Gunthorpe
2024-09-06 2:46 ` Tian, Kevin
2024-09-06 13:54 ` Jason Gunthorpe
2024-09-06 2:41 ` Tian, Kevin
2024-08-27 2:27 ` Alexey Kardashevskiy
2024-08-27 2:31 ` Tian, Kevin
2024-09-15 21:07 ` Jason Gunthorpe
2024-09-20 21:10 ` Vishal Annapurve
2024-09-23 5:35 ` Tian, Kevin
2024-09-23 6:34 ` Vishal Annapurve
2024-09-23 8:24 ` Tian, Kevin
2024-09-23 16:02 ` Jason Gunthorpe
2024-09-23 23:52 ` Tian, Kevin
2024-09-24 12:07 ` Jason Gunthorpe
2024-09-25 8:44 ` Vishal Annapurve
2024-09-25 15:41 ` Jason Gunthorpe
2024-09-23 20:53 ` Vishal Annapurve
2024-09-23 23:55 ` Tian, Kevin
2024-08-23 13:21 ` [RFC PATCH 13/21] KVM: X86: Handle private MMIO as shared Alexey Kardashevskiy
2024-08-30 16:57 ` Xu Yilun
2024-09-02 2:22 ` Alexey Kardashevskiy
2024-09-03 5:13 ` Xu Yilun
2024-09-06 3:31 ` Alexey Kardashevskiy
2024-09-09 10:07 ` Xu Yilun
2024-09-10 1:28 ` Alexey Kardashevskiy
2024-08-23 13:21 ` [RFC PATCH 14/21] RFC: iommu/iommufd/amd: Add IOMMU_HWPT_TRUSTED flag, tweak DTE's DomainID, IOTLB Alexey Kardashevskiy
2024-08-27 12:17 ` Jason Gunthorpe
2024-08-23 13:21 ` [RFC PATCH 15/21] coco/sev-guest: Allow multiple source files in the driver Alexey Kardashevskiy
2024-08-23 13:21 ` [RFC PATCH 16/21] coco/sev-guest: Make SEV-to-PSP request helpers public Alexey Kardashevskiy
2024-08-23 13:21 ` [RFC PATCH 17/21] coco/sev-guest: Implement the guest side of things Alexey Kardashevskiy
2024-08-28 15:54 ` Jonathan Cameron
2024-09-14 7:19 ` Zhi Wang
2024-09-16 1:18 ` Alexey Kardashevskiy
2024-08-23 13:21 ` [RFC PATCH 18/21] RFC: pci: Add BUS_NOTIFY_PCI_BUS_MASTER event Alexey Kardashevskiy
2024-08-23 13:21 ` [RFC PATCH 19/21] sev-guest: Stop changing encrypted page state for TDISP devices Alexey Kardashevskiy
2024-08-23 13:21 ` [RFC PATCH 20/21] pci: Allow encrypted MMIO mapping via sysfs Alexey Kardashevskiy
2024-08-23 22:37 ` Bjorn Helgaas
2024-09-02 8:22 ` Alexey Kardashevskiy
2024-09-03 21:46 ` Bjorn Helgaas
2024-08-23 13:21 ` [RFC PATCH 21/21] pci: Define pci_iomap_range_encrypted Alexey Kardashevskiy
2024-08-28 20:43 ` [RFC PATCH 00/21] Secure VFIO, TDISP, SEV TIO Dan Williams
2024-08-29 14:13 ` Alexey Kardashevskiy
2024-08-29 23:41 ` Dan Williams
2024-08-30 4:38 ` Alexey Kardashevskiy [this message]
2024-08-30 21:57 ` Dan Williams
2024-09-05 8:21 ` Tian, Kevin
2024-09-03 15:56 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=fd8549e3-c2b3-47f7-b413-3007a60ba82b@amd.com \
--to=aik@amd.com \
--cc=agraf@suse.de \
--cc=alex.williamson@redhat.com \
--cc=dan.j.williams@intel.com \
--cc=david.kaplan@amd.com \
--cc=dhaval.giani@amd.com \
--cc=iommu@lists.linux.dev \
--cc=kvm@vger.kernel.org \
--cc=linux-coco@lists.linux.dev \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
--cc=michael.day@amd.com \
--cc=michael.roth@amd.com \
--cc=nikunj@amd.com \
--cc=pratikrajesh.sampat@amd.com \
--cc=santosh.shukla@amd.com \
--cc=suravee.suthikulpanit@amd.com \
--cc=thomas.lendacky@amd.com \
--cc=vasant.hegde@amd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox