qemu-devel.nongnu.org archive mirror
From: Peter Xu <zhexu@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: brijesh.singh@amd.com, mst@redhat.com, qemu-devel@nongnu.org,
	Peter Xu <zhexu@redhat.com>,
	eric.auger@redhat.com, Suravee.Suthikulpanit@amd.com
Subject: Re: [Qemu-devel] [for-4.2 PATCH 2/2] hw/i386: AMD-Vi IVRS DMA alias support
Date: Tue, 30 Jul 2019 09:18:47 +0800	[thread overview]
Message-ID: <20190730011847.GB19232@xz-x1> (raw)
In-Reply-To: <20190729130441.5657a535@x1.home>

On Mon, Jul 29, 2019 at 01:04:41PM -0600, Alex Williamson wrote:
> On Mon, 29 Jul 2019 16:26:46 +0800
> Peter Xu <zhexu@redhat.com> wrote:
> 
> > On Fri, Jul 26, 2019 at 06:55:53PM -0600, Alex Williamson wrote:
> > > When we account for DMA aliases in the PCI address space, we can no
> > > longer use a single IVHD entry in the IVRS covering all devices.  We
> > > instead need to walk the PCI bus and create alias ranges when we find
> > > a conventional bus.  These alias ranges cannot overlap with a "Select
> > > All" range (as currently implemented), so we also need to enumerate
> > > each device with IVHD entries.
> > > 
> > > Importantly, the IVHD entries used here include a Device ID, which is
> > > simply the PCI BDF (Bus/Device/Function).  The guest firmware is
> > > responsible for programming bus numbers, so the final revision of this
> > > table depends on the update mechanism (acpi_build_update) to be called
> > > after guest PCI enumeration.  
> > 
> > Ouch... so the ACPI build procedure runs after that guest PCI code!
> > Could I ask how you found this? :)  It certainly seems much easier
> > this way...
> 
> I believe this is what MST was referring to with the MCFG update,
> acpi_build() is called from both acpi_setup() and acpi_build_update(),
> the latter is setup in numerous places to be called via a mechanism
> that re-writes the table on certain guest actions.  For instance
> acpi_add_rom_blob() passes this function as a callback which turns into
> a select_cb in fw_cfg, such that (aiui) the tables are updated before
> the guest reads them.  I added some fprintfs in the PCI config write
> path to confirm that the update callback occurs after PCI enumeration
> for both SeaBIOS and OVMF.  Since we seem to have other dependencies on
> this ordering elsewhere, I don't think that the IVRS update is unique
> in this regard.

Agreed.

[...]

> > We've implemented similar logic multiple times:
> > 
> >   - When we want to do DMA (pci_requester_id)
> >   - When we want to fetch the DMA address space (the previous patch)
> >   - When we fill in the AMD ACPI table (this patch)
> > 
> > Do you think we can generalize them somehow?  I'm thinking: how about
> > we directly fetch the RID in the 2nd/3rd use cases using the existing
> > pci_requester_id() and simply use that?
> 
> For this patch, I think we could use pci_requester_id() for dev_id_b
> above, but we still need to walk the buses and devices, so it really
> only simplifies the 'if (pci_is_express...' code block above, and
> barely at that since we're at the point in the topology where such a
> decision is made.  For the previous patch, pci_requester_id() returns a
> BDF and that code is executed before bus numbers are programmed.  We
> might still make use of requester_id_cache, but a different interface
> would be necessary.  Also note how pci_req_id_cache_get() assumes we're
> looking for the requester ID as seen from the root bus while
> pci_device_iommu_address_space() is from the bus hosting iommu_fn.
> While these are generally the same in practice, it's not required.  I'd
> therefore tend to leave that to some future consolidation.  I can
> update the comment to include that justification in the previous patch.

Yes, we can build on top of this in the future if needed.  I see that
Michael already plans to merge this version, so it may not be worth a
repost just for the comment (though if there is a repost anyway, we
could add a TODO).

> 
> > > +    /*
> > > +     * A PCI bus walk, for each PCI host bridge, is necessary to create a
> > > +     * complete set of IVHD entries.  Do this into a separate blob so that we
> > > +     * can calculate the total IVRS table length here and then append the new
> > > +     * blob further below.  Fall back to an entry covering all devices, which
> > > +     * is sufficient when no aliases are present.
> > > +     */
> > > +    object_child_foreach_recursive(object_get_root(),
> > > +                                   ivrs_host_bridges, ivhd_blob);
> > > +
> > > +    if (!ivhd_blob->len) {
> > > +        /*
> > > +         *   Type 1 device entry reporting all devices.  These are
> > > +         *   4-byte device entries; refer to Spec - Table 95: IVHD
> > > +         *   Device Entry Type Codes (4-byte).
> > > +         */
> > > +        build_append_int_noprefix(ivhd_blob, 0x0000001, 4);
> > > +    }  
> > 
> > Is there a real use case for ivhd_blob->len==0?
> 
> It was mostly paranoia, but I believe it's really only an Intel
> convention that the PCI host bridge appears as a device on the bus.  It
> seems possible that we could have a host bridge with no devices, in
> which case we'd insert this select-all entry to make the IVRS valid.
> Perhaps in combination with AMD exposing their IOMMU as a device on the
> PCI bus this is not really an issue, but it's a trivial safety net.
> Thanks,

That question was just out of curiosity.  This code path will only
trigger when an AMD vIOMMU is present, so I assume the IOMMU device
should always be there, but of course it won't hurt as a safety net.

Thanks for doing this!

-- 
Peter Xu



Thread overview: 10+ messages
2019-07-27  0:55 [Qemu-devel] [for-4.2 PATCH 0/2] PCI DMA alias support Alex Williamson
2019-07-27  0:55 ` [Qemu-devel] [for-4.2 PATCH 1/2] pci: Use PCI aliases when determining device IOMMU address space Alex Williamson
2019-07-27  0:55 ` [Qemu-devel] [for-4.2 PATCH 2/2] hw/i386: AMD-Vi IVRS DMA alias support Alex Williamson
2019-07-29  8:26   ` Peter Xu
2019-07-29 19:04     ` Alex Williamson
2019-07-30  1:18       ` Peter Xu [this message]
2019-07-29 19:15 ` [Qemu-devel] [for-4.2 PATCH 0/2] PCI " Michael S. Tsirkin
2019-08-19 21:23   ` Alex Williamson
2019-10-14  2:59     ` Peter Xu
2019-10-15 17:53       ` Alex Williamson
