public inbox for linux-pci@vger.kernel.org
From: Alexey Kardashevskiy <aik@amd.com>
To: Dan Williams <dan.j.williams@intel.com>, kvm@vger.kernel.org
Cc: iommu@lists.linux.dev, linux-coco@lists.linux.dev,
	linux-pci@vger.kernel.org,
	Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	pratikrajesh.sampat@amd.com, michael.day@amd.com,
	david.kaplan@amd.com, dhaval.giani@amd.com,
	Santosh Shukla <santosh.shukla@amd.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Michael Roth <michael.roth@amd.com>,
	Alexander Graf <agraf@suse.de>,
	Nikunj A Dadhania <nikunj@amd.com>,
	Vasant Hegde <vasant.hegde@amd.com>,
	Lukas Wunner <lukas@wunner.de>
Subject: Re: [RFC PATCH 00/21] Secure VFIO, TDISP, SEV TIO
Date: Fri, 30 Aug 2024 14:38:21 +1000
Message-ID: <fd8549e3-c2b3-47f7-b413-3007a60ba82b@amd.com>
In-Reply-To: <66d1072ea0590_31daf294e8@dwillia2-xfh.jf.intel.com.notmuch>



On 30/8/24 09:41, Dan Williams wrote:
> Alexey Kardashevskiy wrote:
> [..]
>>>> - skipping various enforcements of non-SME or
>>>> SWIOTLB in the guest;
>>>
>>> Is this based on some concept of private vs shared mode devices?
>>>
>>>> No mixed share+private DMA supported within the
>>>> same IOMMU.
>>>
>>> What does this mean? A device may not have mixed mappings (makes sense),
>>
>> Currently devices do not have an idea about private host memory (but it
>> is being worked on afaik).
> 
> Worked on where? You mean the PCI core indicating that a device is
> private or not? Is that not indicated by guest-side TSM connection
> state?
> 
>>> or an IOMMU can not host devices that do not all agree on whether DMA is
>>> private or shared?
>>
>> The hardware allows that via hardware-assisted vIOMMU and I/O page
>> tables in the guest with the C-bit taken into account by the IOMMU but the
>> software support is missing right now. So for this initial drop, vTOM is
>> used for DMA - this thing says "everything below <addr> is private,
>> above <addr> - shared" so nothing needs to bother with the C-bit, and in
>> my exercise I set the <addr> to the allowed maximum.
>>
>> So each IOMMUFD instance in the VM is either "all private mappings" or
>> "all shared". Could be half/half by moving that <addr> :)
> 
> I thought existing use cases assume that the CC-VM can trigger page
> conversions at will without regard to a vTOM concept? It would be nice
> to have that address-map separation arrangement, has not that ship
> already sailed?

Mmm. I am either confusing you too much or not following you :) Any page
can be converted; would the proposed arrangement require that
conversion-candidate pages are allocated from a specific pool?

There are two vTOMs - one in the IOMMU to decide on the C-bit for DMA
traffic (I use this one), one in the VMSA ("VIRTUAL_TOM") for guest memory
(this exercise is not using it). Which one do you mean?
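
For reference, the IOMMU-side vTOM rule quoted above ("everything below
<addr> is private, above <addr> - shared") can be sketched roughly as
follows. This is only an illustrative model: the struct and function names
are hypothetical and are not a kernel or IOMMU interface, and treating an
address exactly equal to the boundary as shared is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a vTOM boundary; not a real kernel/IOMMU API. */
struct vtom_model {
	uint64_t vtom;	/* boundary address */
};

/*
 * Addresses strictly below the boundary are treated as private
 * (C-bit implied); addresses at or above it are treated as shared.
 * The handling of addr == vtom is an assumption of this sketch.
 */
static bool dma_addr_is_private(const struct vtom_model *m, uint64_t addr)
{
	return addr < m->vtom;
}
```

With the boundary set to the allowed maximum, every DMA address lands on the
private side, which is the "all private mappings" case; moving the boundary
down gives the half/half split mentioned earlier.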

> 
> [..]
>>> Would the device not just launch in "shared" mode until it is later
>>> converted to private? I am missing the detail of why passing the device
>>> on the command line requires that private memory be mapped early.
>>
>> A sequencing problem.
>>
>> QEMU "realizes" a VFIO device, it creates an iommufd instance which
>> creates a domain and writes to a DTE (an IOMMU descriptor for PCI BDFn).
>> And the DTE is not updated after that. For secure stuff, the DTE needs to be
>> slightly different. So right then I tell IOMMUFD that it will handle
>> private memory.
>>
>> Then, the same VFIO "realize" handler maps the guest memory in iommufd.
>> I use the same flag (well, pointer to kvm) in the iommufd pinning code,
>> private memory is pinned and mapped (and related page state change
>> happens as the guest memory is made guest-owned in RMP).
>>
>> QEMU goes to machine_reset() and calls "SNP LAUNCH UPDATE" (the actual
>> place changed recently, huh) and the latter will measure the guest and
>> try making all guest memory private but it already happened => error.
>>
>> I think I have to decouple the pinning and the IOMMU/DTE setting.
>>
>>> That said, the implication that private device assignment requires
>>> hotplug events is a useful property. This matches nicely with initial
>>> thoughts that device conversion events are violent and might as well be
>>> unplug/replug events to match all the assumptions around what needs to
>>> be updated.
>>
>> For the initial drop, I tell QEMU via "-device vfio-pci,x-tio=true" that
>> it is going to be private so there should be no massive conversion.
> 
> That's a SEV-TIO RFC-specific hack, or a proposal?

Not sure at the moment :)

> An approach that aligns more closely with the VFIO operational model,
> where it maps and waits for guest faults / usages, is that QEMU would be
> told that the device is "bind capable", because the host is not in a
> position to assume how the guest will use the device. A "bind capable"
> device operates in shared mode unless and until the guest triggers
> private conversion.

True. I just started this exercise without QEMU DiscardManager. Now I
rely on it but it needs to either allow a dynamic flip from
discarded==private to discarded==shared (should do for now) or allow 3
states for guest pages.
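
One possible reading of the "3 states" option, sketched as a small model.
The state names, the DMA mode enum and the helper are all hypothetical; this
is not QEMU's DiscardManager interface, only an illustration of why a third
state removes the need to flip the meaning of "discarded".

```c
#include <stdbool.h>

/* Hypothetical per-page states; assumed for illustration only. */
enum guest_page_state {
	PAGE_SHARED,	/* populated, shared with the host */
	PAGE_PRIVATE,	/* populated, guest-owned */
	PAGE_DISCARDED,	/* not populated at all */
};

/* Hypothetical DMA mode of the device (or of its IOMMUFD instance). */
enum dev_dma_mode {
	DMA_SHARED,
	DMA_PRIVATE,
};

/* A page is mapped for DMA only if its state matches the device mode. */
static bool page_mapped_for_dma(enum guest_page_state state,
				enum dev_dma_mode mode)
{
	switch (state) {
	case PAGE_SHARED:
		return mode == DMA_SHARED;
	case PAGE_PRIVATE:
		return mode == DMA_PRIVATE;
	default:
		return false;	/* discarded pages are never mapped */
	}
}
```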

>>>> This requires the BME hack as MMIO and
>>>
>>> Not sure what the "BME hack" is, I guess this is foreshadowing for later
>>> in this story.
>>>
>>>> BusMaster enable bits cannot be 0 after MMIO
>>>> validation is done
>>>
>>> It would be useful to call out what is a TDISP requirement, vs
>>> device-specific DSM vs host-specific TSM requirement. In this case I
>>> assume you are referring to PCI 6.2 11.2.6 where it notes that TDIs must
>>
>> Oh there is 6.2 already.
>>
>>> enter the TDISP ERROR state if BME is cleared after the device is
>>> locked?
>>>
>>> ...but this begs the question of whether it needs to be avoided outright
>>
>> Well, besides a couple of avoidable places (like testing INTx support
>> which we know is not going to work on VFs anyway), a standard driver
>> enables MSE first (and the value for the command register does not have
>> 1 for BME) and only then BME. TBH I do not think writing BME=0 when
>> BME=0 already is "clearing" but my test device disagrees.
> 
> ...but we should not be creating kernel policy around test devices. What
> matters is real devices. Now, if it is likely that real / production
> devices will go into the TDISP ERROR state by not coalescing MSE + BME
> updates then we need a solution.

True but I do not even know who to ask this question :)

> Given it is unlikely that TDISP support will be widespread any time soon
> it is likely tenable to assume TDISP compatible drivers call a new:
> 
>     pci_enable(pdev, PCI_ENABLE_TARGET | PCI_ENABLE_INITIATOR);
> 
> ...or something like that to coalesce command register writes.
> 
> Otherwise if that retrofit ends up being too much work or confusion then
> the ROI of teaching the PCI core to recover this scenario needs to be
> evaluated.

Agree.
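
To illustrate what coalescing would mean for the command register, here is
a minimal userspace sketch. PCI_ENABLE_TARGET and PCI_ENABLE_INITIATOR are
the hypothetical flag names from the suggestion quoted above and
coalesced_command() is made up for this example; only the two bit values
match PCI_COMMAND_MEMORY and PCI_COMMAND_MASTER from pci_regs.h. The idea is
that MSE and BME are computed into one value and written once, so the device
never sees an intermediate write with BME=0.

```c
#include <stdint.h>

/* Same values as PCI_COMMAND_MEMORY / PCI_COMMAND_MASTER in pci_regs.h. */
#define CMD_MEMORY	0x2	/* MSE: respond to memory space accesses */
#define CMD_MASTER	0x4	/* BME: allow the device to initiate DMA */

/* Hypothetical flags, named after the pci_enable() suggestion above. */
#define PCI_ENABLE_TARGET	(1u << 0)
#define PCI_ENABLE_INITIATOR	(1u << 1)

/*
 * Compute the single command register value to write, instead of
 * writing MSE first and BME in a second write.
 */
static uint16_t coalesced_command(uint16_t old_cmd, unsigned int flags)
{
	uint16_t cmd = old_cmd;

	if (flags & PCI_ENABLE_TARGET)
		cmd |= CMD_MEMORY;
	if (flags & PCI_ENABLE_INITIATOR)
		cmd |= CMD_MASTER;

	return cmd;
}
```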

>>> or handled as an error recovery case depending on policy.
>>
>> Avoiding seems more straightforward unless we actually want enlightened
>> device drivers which want to examine the interface report before
>> enabling the device. Not sure.
> 
> If TDISP capable devices trend towards a handful of devices in the near
> term then some driver fixups seem reasonable. Otherwise if every PCI
> device driver Linux has ever seen needs to be ready for that device to
> have a TDISP capable flavor then mitigating this in the PCI core makes
> more sense than playing driver whack-a-mole.
>
>>>> the guest OS booting process when this
>>>> happens.
>>>>
>>>> SVSM could help addressing these (not
>>>> implemented at the moment).
>>>
>>> At first thought avoiding SVSM entanglements where the kernel can be
>>> enlightened should be the policy. I would only expect SVSM hacks to cover
>>> for legacy OSes that will never be TDISP enlightened, but in that case
>>> we are likely talking about fully unaware L2. Let's assume fully
>>> enlightened L1 for now.
>>
>> Well, I could also tweak OVMF to make necessary calls to the PSP and
>> hack QEMU to postpone the command register updates to get this going,
>> just a matter of ugliness.
> 
> Per above, the tradeoff should be in ROI, not ugliness. I don't see how
> OVMF helps when devices might be being virtually hotplugged or reset.

I have no clue how exactly hotplug works on x86; does the BIOS not play a
role in it? Thanks,


-- 
Alexey

