From: Peter Xu <zhexu@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: brijesh.singh@amd.com, mst@redhat.com, qemu-devel@nongnu.org,
Peter Xu <zhexu@redhat.com>,
eric.auger@redhat.com, Suravee.Suthikulpanit@amd.com
Subject: Re: [Qemu-devel] [for-4.2 PATCH 2/2] hw/i386: AMD-Vi IVRS DMA alias support
Date: Tue, 30 Jul 2019 09:18:47 +0800 [thread overview]
Message-ID: <20190730011847.GB19232@xz-x1> (raw)
In-Reply-To: <20190729130441.5657a535@x1.home>
On Mon, Jul 29, 2019 at 01:04:41PM -0600, Alex Williamson wrote:
> On Mon, 29 Jul 2019 16:26:46 +0800
> Peter Xu <zhexu@redhat.com> wrote:
>
> > On Fri, Jul 26, 2019 at 06:55:53PM -0600, Alex Williamson wrote:
> > > When we account for DMA aliases in the PCI address space, we can no
> > > longer use a single IVHD entry in the IVRS covering all devices. We
> > > instead need to walk the PCI bus and create alias ranges when we find
> > > a conventional bus. These alias ranges cannot overlap with a "Select
> > > All" range (as currently implemented), so we also need to enumerate
> > > each device with IVHD entries.
> > >
> > > Importantly, the IVHD entries used here include a Device ID, which is
> > > simply the PCI BDF (Bus/Device/Function). The guest firmware is
> > > responsible for programming bus numbers, so the final revision of this
> > > table depends on the update mechanism (acpi_build_update) to be called
> > > after guest PCI enumeration.
> >
> > Ouch... so the ACPI build procedure is after those guest PCI code!
> > Could I ask how do you find this? :) It seems much easier for sure
> > this way...
>
> I believe this is what MST was referring to with the MCFG update,
> acpi_build() is called from both acpi_setup() and acpi_build_update(),
> the latter is setup in numerous places to be called via a mechanism
> that re-writes the table on certain guest actions. For instance
> acpi_add_rom_blob() passes this function as a callback which turns into
> a select_cb in fw_cfg, such that (aiui) the tables are updated before
> the guest reads them. I added some fprintfs in the PCI config write
> path to confirm that the update callback occurs after PCI enumeration
> for both SeaBIOS and OVMF. Since we seem to have other dependencies on
> this ordering elsewhere, I don't think that the IVRS update is unique
> in this regard.
Agreed.
[...]
> > We've implmented the similar logic for multiple times:
> >
> > - When we want to do DMA (pci_requester_id)
> > - When we want to fetch the DMA address space (the previous patch)
> > - When we fill in the AMD ACPI table (this patch)
> >
> > Do you think we can generalize them somehow? I'm thinking how about
> > we directly fetch RID in the 2nd/3rd use case using pci_requester_id()
> > (which existed already) and simply use it?
>
> For this patch, I think we could use pci_requester_id() for dev_id_b
> above, but we still need to walk the buses and devices, so it really
> only simplifies the 'if (pci_is_express...' code block above, and
> barely at that since we're at the point in the topology where such a
> decision is made. For the previous patch, pci_requester_id() returns a
> BDF and that code is executed before bus numbers are programmed. We
> might still make use of requester_id_cache, but a different interface
> would be necessary. Also note how pci_req_id_cache_get() assumes we're
> looking for the requester ID as seen from the root bus while
> pci_device_iommu_address_space() is from the bus hosting iommu_fn.
> While these are generally the same in practice, it's not required. I'd
> therefore tend to leave that to some future consolidation. I can
> update the comment to include that justification in the previous patch.
Yes, we can work on top in the future if needed. I see that Michael
already plan to merge this version, then it may not worth a repost for
the comment (unless there will be a repost, we could mark a TODO).
>
> > > + /*
> > > + * A PCI bus walk, for each PCI host bridge, is necessary to create a
> > > + * complete set of IVHD entries. Do this into a separate blob so that we
> > > + * can calculate the total IVRS table length here and then append the new
> > > + * blob further below. Fall back to an entry covering all devices, which
> > > + * is sufficient when no aliases are present.
> > > + */
> > > + object_child_foreach_recursive(object_get_root(),
> > > + ivrs_host_bridges, ivhd_blob);
> > > +
> > > + if (!ivhd_blob->len) {
> > > + /*
> > > + * Type 1 device entry reporting all devices
> > > + * These are 4-byte device entries currently reporting the range of
> > > + * Refer to Spec - Table 95:IVHD Device Entry Type Codes(4-byte)
> > > + */
> > > + build_append_int_noprefix(ivhd_blob, 0x0000001, 4);
> > > + }
> >
> > Is there a real use case for ivhd_blob->len==0?
>
> It was mostly paranoia, but I believe it's really only an Intel
> convention that the PCI host bridge appears as a device on the bus. It
> seems possible that we could have a host bridge with no devices, in
> which case we'd insert this select-all entry to make the IVRS valid.
> Perhaps in combination with AMD exposing their IOMMU as a device on the
> PCI bus this is not really an issue, but it's a trivial safety net.
> Thanks,
That question was only for curiousity. This code path will only
trigger when AMD vIOMMU is detected so I assume the IOMMU device
should always be there, but of course it won't hurt as a safety net.
Thanks for doing this!
--
Peter Xu
next prev parent reply other threads:[~2019-07-30 1:19 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-07-27 0:55 [Qemu-devel] [for-4.2 PATCH 0/2] PCI DMA alias support Alex Williamson
2019-07-27 0:55 ` [Qemu-devel] [for-4.2 PATCH 1/2] pci: Use PCI aliases when determining device IOMMU address space Alex Williamson
2019-07-27 0:55 ` [Qemu-devel] [for-4.2 PATCH 2/2] hw/i386: AMD-Vi IVRS DMA alias support Alex Williamson
2019-07-29 8:26 ` Peter Xu
2019-07-29 19:04 ` Alex Williamson
2019-07-30 1:18 ` Peter Xu [this message]
2019-07-29 19:15 ` [Qemu-devel] [for-4.2 PATCH 0/2] PCI " Michael S. Tsirkin
2019-08-19 21:23 ` Alex Williamson
2019-10-14 2:59 ` Peter Xu
2019-10-15 17:53 ` Alex Williamson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190730011847.GB19232@xz-x1 \
--to=zhexu@redhat.com \
--cc=Suravee.Suthikulpanit@amd.com \
--cc=alex.williamson@redhat.com \
--cc=brijesh.singh@amd.com \
--cc=eric.auger@redhat.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).