From: Alex Williamson <alex.williamson@redhat.com>
To: Auger Eric <eric.auger@redhat.com>
Cc: peter.maydell@linaro.org, aik@ozlabs.ru, qemu-devel@nongnu.org,
peterx@redhat.com, qemu-arm@nongnu.org, pbonzini@redhat.com,
eric.auger.pro@gmail.com
Subject: Re: [Qemu-arm] [Qemu-devel] [PATCH v5 2/2] hw/vfio/common: Fail on VFIO/HW nested paging detection
Date: Fri, 30 Aug 2019 11:22:45 -0600
Message-ID: <20190830112245.7e98d32d@x1.home>
In-Reply-To: <979ca496-3401-0d53-e42d-8e04922ece52@redhat.com>

On Fri, 30 Aug 2019 10:06:56 +0200
Auger Eric <eric.auger@redhat.com> wrote:
> Hi Alex,
>
> On 8/29/19 8:14 PM, Alex Williamson wrote:
> > On Thu, 29 Aug 2019 11:01:41 +0200
> > Eric Auger <eric.auger@redhat.com> wrote:
> >
> >> As of today, VFIO only works along with vIOMMU supporting
> >> caching mode. The SMMUv3 does not support this mode and
> >> requires HW nested paging to work properly with VFIO.
> >>
> >> So any attempt to run a VFIO device protected by such IOMMU
> >> would prevent the assigned device from working and at the
> >> moment the guest does not even boot as the default
> >> memory_region_iommu_replay() implementation attempts to
> >> translate the whole address space and completely stalls
> >> the guest.
> >
> > Why doesn't this stall an x86 guest?
> It does not stall on x86 since intel_iommu implements a custom replay
> (see vtd_iommu_replay), so the dummy default one is never executed.
> This function performs a full page table walk, scanning all the valid
> entries and calling the MAP notifier on those. Although this operation
> is tedious, it is nothing compared to the dummy default replay
> function, which calls translate() on the whole address range (on a
> page basis).

Ah right.  OTOH, what are the arguments against smmuv3 providing a
replay function?

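To make the cost difference between the two replay styles concrete, here is a minimal toy model. It is only a sketch: ToyIOMMU, toy_translate(), toy_replay_default() and toy_replay_walk() are hypothetical names invented for illustration and are not QEMU functions. The "default" style probes every page of the address range through translate(), while the "walk" style visits only the valid entries, which is roughly what a custom replay such as the one described above is said to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1ULL << TOY_PAGE_SHIFT)

/* Hypothetical toy types; none of these exist in QEMU. */
typedef struct ToyMapping {
    uint64_t iova;   /* page-aligned I/O virtual address */
    uint64_t paddr;  /* page-aligned translated address */
} ToyMapping;

typedef struct ToyIOMMU {
    const ToyMapping *maps;    /* valid entries only, stand-in for a page table */
    size_t nr_maps;
    uint64_t translate_calls;  /* number of toy_translate() invocations */
} ToyIOMMU;

typedef void (*ToyMapNotify)(uint64_t iova, uint64_t paddr, void *opaque);

/* Look up a single page; returns true if a valid mapping exists. */
static bool toy_translate(ToyIOMMU *s, uint64_t iova, uint64_t *paddr)
{
    s->translate_calls++;
    for (size_t i = 0; i < s->nr_maps; i++) {
        if (s->maps[i].iova == iova) {
            *paddr = s->maps[i].paddr;
            return true;
        }
    }
    return false;
}

/* Default-style replay: probe every page in [0, as_size). */
static uint64_t toy_replay_default(ToyIOMMU *s, uint64_t as_size,
                                   ToyMapNotify fn, void *opaque)
{
    uint64_t notified = 0;

    assert(fn != NULL);
    assert((as_size & (TOY_PAGE_SIZE - 1)) == 0);
    for (uint64_t iova = 0; iova < as_size; iova += TOY_PAGE_SIZE) {
        uint64_t paddr;

        if (toy_translate(s, iova, &paddr)) {
            fn(iova, paddr, opaque);
            notified++;
        }
    }
    return notified;
}

/* Walk-style replay: visit only the valid entries, no per-page probing. */
static uint64_t toy_replay_walk(ToyIOMMU *s, ToyMapNotify fn, void *opaque)
{
    assert(fn != NULL);
    for (size_t i = 0; i < s->nr_maps; i++) {
        fn(s->maps[i].iova, s->maps[i].paddr, opaque);
    }
    return s->nr_maps;
}

/* Example notifier: counts MAP events in the uint64_t pointed to by opaque. */
static void toy_count_notify(uint64_t iova, uint64_t paddr, void *opaque)
{
    (void)iova;
    (void)paddr;
    (*(uint64_t *)opaque)++;
}
```

In this model the default-style replay costs one translate() call per page of the address space regardless of how few mappings exist, so its cost grows with the size of the range, whereas the walk-style replay costs one notification per valid entry.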
> > I'm a bit confused about what this provides versus the flag_changed
> > notifier looking for IOMMU_NOTIFIER_MAP, which AIUI is the common
> > deficiency between VT-d w/o caching-mode and SMMUv3 w/o nested mode.
> > The iommu notifier is registered prior to calling iommu_replay, so it
> > seems we already have an opportunity to do something there. Help me
> > understand why this is needed. Thanks,
>
> At the moment the smmuv3 notify_flag_changed callback implementation
> (smmuv3_notify_flag_changed) emits a warning when it detects that a
> MAP notifier is being registered:
>
> warn_report("SMMUv3 does not support notification on MAP: "
> "device %s will not function properly", pcidev->name);
>
> and then the replay gets executed, looping forever.
>
> I could exit instead of emitting a warning but the drawback is that on
> vfio hotplug, it will also exit whereas we would rather simply reject
> the hotplug.

There are solutions to the above that modify the existing framework
rather than creating a parallel solution, though.  For instance, could
memory_region_register_iommu_notifier() reject the notifier if the flag
change is incompatible, allowing the fault to propagate back to vfio
and take a similar exit path to the one provided here?

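A rough sketch of the shape this suggestion could take, using a self-contained toy model rather than the real memory API. Everything here is an assumption made for illustration: ToyIOMMURegion, toy_register_notifier(), toy_no_map_flag_changed() and toy_region_add() are hypothetical names, and the flag values are invented. The idea shown is only that the flag-changed callback returns an error code, registration forwards it, and the caller takes its failure path before any replay is attempted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag bits, loosely modeled on a MAP/UNMAP split. */
enum {
    TOY_NOTIFIER_NONE  = 0,
    TOY_NOTIFIER_UNMAP = 1 << 0,
    TOY_NOTIFIER_MAP   = 1 << 1,
};

typedef struct ToyIOMMURegion ToyIOMMURegion;

struct ToyIOMMURegion {
    int flags;  /* union of the flags of all registered notifiers */
    /* Returns 0 to accept the new flag set, a negative errno to veto it. */
    int (*notify_flag_changed)(ToyIOMMURegion *mr, int old_flags,
                               int new_flags);
};

/* A toy IOMMU model that cannot deliver MAP notifications. */
static int toy_no_map_flag_changed(ToyIOMMURegion *mr, int old_flags,
                                   int new_flags)
{
    (void)mr;
    (void)old_flags;
    return (new_flags & TOY_NOTIFIER_MAP) ? -EINVAL : 0;
}

/* Registration that can fail instead of only warning. */
static int toy_register_notifier(ToyIOMMURegion *mr, int notifier_flags)
{
    int new_flags;

    assert(mr != NULL);
    new_flags = mr->flags | notifier_flags;
    if (new_flags != mr->flags && mr->notify_flag_changed) {
        int ret = mr->notify_flag_changed(mr, mr->flags, new_flags);

        if (ret) {
            return ret;  /* vetoed: mr->flags is left untouched */
        }
    }
    mr->flags = new_flags;
    return 0;
}

/* Caller in the role of vfio: bail out before any replay on rejection. */
static int toy_region_add(ToyIOMMURegion *mr)
{
    int ret = toy_register_notifier(mr, TOY_NOTIFIER_MAP | TOY_NOTIFIER_UNMAP);

    if (ret) {
        return ret;  /* plays the role of "goto fail" */
    }
    /* A replay would only be started from here on. */
    return 0;
}
```

With this shape, both the static and the hotplug case would see the same error return from registration, and it is up to the caller to decide whether that means exiting or rejecting the device.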
> I think the solution based on the IOMMU MR attribute handles both the
> static and hotplug cases. Also looking further, I will need this
> IOMMU MR attribute for 2stage SMMU integration (see [RFC v5 14/29]
> vfio: Force nested if iommu requires it). I know that it has been
> pending for a while and is still hypothetical, but setting up 2stage
> will require specific treatment in the vfio common.c code: opting in
> to the 2stage mode and registering specific iommu mr notifiers. Using
> the IOMMU MR attribute allows me to detect which kind of VFIO/IOMMU
> integration I need to set up.

Hmm, I'm certainly more on board with that use case.  I guess the
question is whether the problem statement presented here justifies what
seems to be a parallel solution to what we have today, or could have
with some enhancements.  Thanks,

Alex

> >>
> >> So let's fail on that case.
> >>
> >> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>
> >> ---
> >>
> >> v3 -> v4:
> >> - use IOMMU_ATTR_HW_NESTED_PAGING
> >> - do not abort anymore but jump to fail
> >> ---
> >> hw/vfio/common.c | 10 ++++++++++
> >> 1 file changed, 10 insertions(+)
> >>
> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >> index 3e03c495d8..e8c009d019 100644
> >> --- a/hw/vfio/common.c
> >> +++ b/hw/vfio/common.c
> >> @@ -606,9 +606,19 @@ static void vfio_listener_region_add(MemoryListener *listener,
> >>      if (memory_region_is_iommu(section->mr)) {
> >>          VFIOGuestIOMMU *giommu;
> >>          IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
> >> +        bool nested;
> >>          int iommu_idx;
> >>
> >>          trace_vfio_listener_region_add_iommu(iova, end);
> >> +
> >> +        if (!memory_region_iommu_get_attr(iommu_mr,
> >> +                                          IOMMU_ATTR_NEED_HW_NESTED_PAGING,
> >> +                                          (void *)&nested) && nested) {
> >> +            error_report("VFIO/vIOMMU integration based on HW nested paging "
> >> +                         "is not yet supported");
> >> +            ret = -EINVAL;
> >> +            goto fail;
> >> +        }
> >>          /*
> >>           * FIXME: For VFIO iommu types which have KVM acceleration to
> >>           * avoid bouncing all map/unmaps through qemu this way, this
> >
> >
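The check in the quoted hunk relies on the attribute query returning 0 on success, so that the out-parameter is only read when the query succeeded. Below is a minimal, self-contained sketch of that pattern. It is a toy model under stated assumptions: ToyAttrRegion, toy_get_attr(), toy_nested_get_attr() and toy_check_region() are hypothetical names, and the convention that an IOMMU model without a get_attr callback yields -EINVAL is assumed here for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attribute id, named after the one in the quoted patch. */
enum ToyAttr { TOY_ATTR_NEED_HW_NESTED_PAGING };

typedef struct ToyAttrRegion ToyAttrRegion;

struct ToyAttrRegion {
    /* NULL when the IOMMU model implements no attributes at all. */
    int (*get_attr)(ToyAttrRegion *mr, enum ToyAttr attr, void *data);
};

/* Returns 0 and fills *data on success, -EINVAL if unsupported. */
static int toy_get_attr(ToyAttrRegion *mr, enum ToyAttr attr, void *data)
{
    assert(mr != NULL && data != NULL);
    if (!mr->get_attr) {
        return -EINVAL;
    }
    return mr->get_attr(mr, attr, data);
}

/* A toy IOMMU model that requires HW nested paging. */
static int toy_nested_get_attr(ToyAttrRegion *mr, enum ToyAttr attr,
                               void *data)
{
    (void)mr;
    if (attr == TOY_ATTR_NEED_HW_NESTED_PAGING) {
        *(bool *)data = true;
        return 0;
    }
    return -EINVAL;
}

/* Same shape as the check in the hunk: fail only if the query succeeded
 * (returned 0) and reported that nested paging is required. */
static int toy_check_region(ToyAttrRegion *mr)
{
    bool nested = false;

    if (!toy_get_attr(mr, TOY_ATTR_NEED_HW_NESTED_PAGING, &nested) &&
        nested) {
        return -EINVAL;
    }
    return 0;
}
```

Because of the short-circuit in the condition, a model that does not know the attribute is treated the same as one that reports false, so existing IOMMU models would be unaffected in this sketch.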