From: Alex Williamson <alex.williamson@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, "Michael S . Tsirkin" <mst@redhat.com>,
Jason Wang <jasowang@redhat.com>,
QEMU Stable <qemu-stable@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH] intel_iommu: do address space switching when reset
Date: Thu, 6 Sep 2018 12:41:36 -0600
Message-ID: <20180906124136.0b966adc@t450s.home>
In-Reply-To: <20180906065312.GD16937@xz-x1>
On Thu, 6 Sep 2018 14:53:12 +0800
Peter Xu <peterx@redhat.com> wrote:
> On Wed, Sep 05, 2018 at 08:55:50AM -0600, Alex Williamson wrote:
> > On Wed, 5 Sep 2018 19:31:58 +0800
> > Peter Xu <peterx@redhat.com> wrote:
> >
> > > We drop all the mappings on system reset, but we still keep the
> > > existing memory layouts. That is problematic: if the IOMMU is enabled
> > > in the guest and the guest is then rebooted, SeaBIOS will try to drive
> > > a device that has no pages mapped. What we need to do is rebuild the
> > > GPA->HPA mapping on system reset, which keeps SeaBIOS working.
> > >
> > > Without this patch, a guest that boots from an assigned NVMe device
> > > might fail to find the boot device after a system reboot/reset, and
> > > SeaBIOS errors like this can be observed if its logging is turned on:
> > >
> > > WARNING - Timeout at nvme_wait:144!
> > >
> > > With the patch applied, the guest will be able to find the NVMe drive
> > > and bootstrap there even after multiple reboots or system resets.
> > >
> > > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1625173
> > > CC: QEMU Stable <qemu-stable@nongnu.org>
> > > Tested-by: Cong Li <coli@redhat.com>
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > > hw/i386/intel_iommu.c | 8 ++++++++
> > > 1 file changed, 8 insertions(+)
> > >
> > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > index 3dfada19a6..d3eb068d43 100644
> > > --- a/hw/i386/intel_iommu.c
> > > +++ b/hw/i386/intel_iommu.c
> > > @@ -3231,6 +3231,14 @@ static void vtd_reset(DeviceState *dev)
> > > * When device reset, throw away all mappings and external caches
> > > */
> > > vtd_address_space_unmap_all(s);
> > > +
> > > + /*
> > > + * Switch address spaces if needed (e.g., when reboot from a
> > > + * kernel that has IOMMU enabled, we should switch address spaces
> > > + * to rebuild the GPA->HPA mappings otherwise SeaBIOS might
> > > + * encounter DMA errors when running with e.g. a NVMe card).
> > > + */
> > > + vtd_switch_address_space_all(s);
> > > }
> > >
> > > static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
> >
> > I'm curious why these aren't part of vtd_init(). vtd_init is where
> > GCMD is set back to its power-on state, which disables translation, so
> > logically we should reset the address space at that point. Similarly,
> > the root entry is reset, so it would make sense to throw away all the
> > mappings there too. Thanks,
>
> vtd_init() is only called on realize() or reset, and AFAIU it's not
> called by GCMD operations. However, I think I get the point that
> logically we should do similar things in e.g. vtd_handle_gcmd_srtp()
> when the enable bit switches.
>
> My understanding is that if something other than a system reboot
> happens (e.g., the root pointer is replaced, or the guest driver turns
> DMAR from on to off while the guest is running), the guest is
> responsible for doing the rest of the invalidations before making that
> switch, so we'll possibly do the unmap_all() and address space
> switches in other places (e.g., in vtd_context_global_invalidate, or
> per-device invalidations).

AIUI, the entire global command register is write-once, so the guest
cannot disable the IOMMU or change the root pointer after it's been
initialized, except through a system reset. I think that means the
guest can only operate through the invalidation queue at runtime. The
bug being fixed here is that the IOMMU has been reset to its power-on
state, where translation is disabled, but the emulation of that disabled
state also needs to return the per-device address space to that of
system memory, or an identity map thereof. The commit log seems to imply
that there's some sort of SeaBIOS issue and that we're just doing this to
help the BIOS, when in reality we simply forgot to reset the per-device
address space; a subsequent boot of a guest that didn't enable the
IOMMU would have the same sort of issues. In fact, any use of an
assigned device prior to re-enabling the IOMMU would fail; there's
nothing unique to NVMe here. Thanks,
Alex
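
To make the failure mode above concrete, here is a minimal, self-contained
C sketch. It is a toy model under stated assumptions, not QEMU code: every
toy_* name is hypothetical, and the only thing borrowed from the patch is
the idea that reset must re-select each device's address space after
translation is turned off and the mappings are dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NUM_DEVICES 2

/* Hypothetical toy model; none of these names are QEMU APIs. */
typedef enum { TOY_AS_SYSTEM_MEMORY, TOY_AS_IOMMU } ToyASKind;

typedef struct {
    ToyASKind kind;   /* view of memory the device currently uses for DMA */
    bool has_mapping; /* whether any mapping exists in the IOMMU view */
} ToyDevice;

typedef struct {
    bool translation_enabled;
    ToyDevice devs[TOY_NUM_DEVICES];
} ToyIOMMU;

/* Select, for every device, the view that matches the translation state. */
static void toy_switch_address_space_all(ToyIOMMU *s)
{
    for (size_t i = 0; i < TOY_NUM_DEVICES; i++) {
        s->devs[i].kind = s->translation_enabled ? TOY_AS_IOMMU
                                                 : TOY_AS_SYSTEM_MEMORY;
    }
}

/* Drop every mapping in the IOMMU view. */
static void toy_unmap_all(ToyIOMMU *s)
{
    for (size_t i = 0; i < TOY_NUM_DEVICES; i++) {
        s->devs[i].has_mapping = false;
    }
}

/* A guest OS enables translation and installs mappings for its devices. */
static void toy_guest_enable(ToyIOMMU *s)
{
    s->translation_enabled = true;
    for (size_t i = 0; i < TOY_NUM_DEVICES; i++) {
        s->devs[i].has_mapping = true;
    }
    toy_switch_address_space_all(s);
}

/*
 * Reset to the power-on state. With switch_on_reset == false this models
 * the bug: translation is off and mappings are gone, but the devices are
 * still pointed at the (now empty) IOMMU view.
 */
static void toy_reset(ToyIOMMU *s, bool switch_on_reset)
{
    s->translation_enabled = false;
    toy_unmap_all(s);
    if (switch_on_reset) {
        toy_switch_address_space_all(s);
    }
}

/* DMA works via system memory, or via the IOMMU view if a mapping exists. */
static bool toy_dma_ok(const ToyIOMMU *s, size_t dev)
{
    assert(dev < TOY_NUM_DEVICES);
    const ToyDevice *d = &s->devs[dev];
    return d->kind == TOY_AS_SYSTEM_MEMORY || d->has_mapping;
}

/* Boot with the IOMMU enabled, reset, then let firmware use every device. */
bool toy_dma_ok_after_reboot(bool switch_on_reset)
{
    ToyIOMMU s = { 0 }; /* power-on: translation off, system memory view */
    toy_guest_enable(&s);
    toy_reset(&s, switch_on_reset);
    for (size_t i = 0; i < TOY_NUM_DEVICES; i++) {
        if (!toy_dma_ok(&s, i)) {
            return false;
        }
    }
    return true;
}
```

Calling toy_dma_ok_after_reboot(false) models the unpatched reset and
returns false for any device, NVMe or otherwise, while
toy_dma_ok_after_reboot(true) models the reset with the address space
switch and returns true.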