From: Eric Auger <eric.auger@redhat.com>
To: "Duan, Zhenzhong" <zhenzhong.duan@intel.com>,
"eric.auger.pro@gmail.com" <eric.auger.pro@gmail.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"qemu-arm@nongnu.org" <qemu-arm@nongnu.org>,
"mst@redhat.com" <mst@redhat.com>,
"jean-philippe@linaro.org" <jean-philippe@linaro.org>
Cc: "alex.williamson@redhat.com" <alex.williamson@redhat.com>,
"clg@redhap.com" <clg@redhap.com>,
"bharat.bhushan@nxp.com" <bharat.bhushan@nxp.com>,
"peter.maydell@linaro.org" <peter.maydell@linaro.org>
Subject: Re: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device assignment
Date: Wed, 5 Jul 2023 08:19:59 +0200 [thread overview]
Message-ID: <1e6ae4e2-cd7c-16c4-440f-119399d0f551@redhat.com> (raw)
In-Reply-To: <SJ0PR11MB674435F2BA42A8925ABA336F922FA@SJ0PR11MB6744.namprd11.prod.outlook.com>
Hi Zhenzhong,
On 7/5/23 06:52, Duan, Zhenzhong wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Sent: Tuesday, July 4, 2023 7:15 PM
>> Subject: [PATCH 1/2] virtio-iommu: Fix 64kB host page size VFIO device assignment
>>
>> When running on a 64kB page size host and protecting a VFIO device
>> with the virtio-iommu, qemu crashes with this kind of message:
>>
>> qemu-kvm: virtio-iommu page mask 0xfffffffffffff000 is incompatible
>> with mask 0x20010000
> Does 0x20010000 mean that only 512MB and 64KB (super page) mappings are
> supported by the host IOMMU hw? Is 4KB mapping not supported?
Yes, that's correct. In that case the host uses 64kB pages and 4kB
mappings are not supported.
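FWIW you can decode such a bitmap bit by bit: each set bit n means a
supported mapping size of 2^n bytes. Illustrative standalone snippet,
not QEMU code:

#include <stdint.h>
#include <stdio.h>

/* illustrative only: 0x20010000 has bits 16 and 29 set, i.e. 64kB
 * pages and 512MB blocks, and nothing smaller */
static void decode_pgsize_bitmap(uint64_t mask)
{
    while (mask) {
        int bit = __builtin_ctzll(mask);   /* lowest set bit */
        printf("supported mapping size: 0x%llx\n",
               (unsigned long long)1 << bit);
        mask &= mask - 1;                  /* clear that bit */
    }
}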
>
> There is a check on the guest kernel side hinting that only 4KB is
> supported. With this patch we force viommu->pgsize_bitmap to 0x20010000
> and fail the check below? Does this device work in the guest?
> Please correct me if I understand wrong.
My guest also uses 64kB pages, so the check below succeeds. Indeed, if
your host has a larger page size than your guest, it fails with:
[ 2.147031] virtio-pci 0000:00:01.0: granule 0x10000 larger than system page size 0x1000
[ 7.231128] ixgbevf 0000:00:02.0: granule 0x10000 larger than system page size 0x1000
>
> static int viommu_domain_finalise(struct viommu_endpoint *vdev,
> struct iommu_domain *domain)
> {
> ...
> viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap);
> if (viommu_page_size > PAGE_SIZE) {
> dev_err(vdev->dev,
> "granule 0x%lx larger than system page size 0x%lx\n",
> viommu_page_size, PAGE_SIZE);
> return -ENODEV;
> }
>
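Spelling out the arithmetic of that check with the values above
(illustrative standalone snippet; the kernel's __ffs() is a count of
trailing zeros, like __builtin_ctzl() on a non-zero word):

#include <assert.h>

int main(void)
{
    unsigned long pgsize_bitmap = 0x20010000UL;
    unsigned long viommu_page_size = 1UL << __builtin_ctzl(pgsize_bitmap);
    unsigned long guest_page_size = 0x1000UL;  /* 4kB guest */

    assert(viommu_page_size == 0x10000UL);     /* 64kB granule */
    /* 0x10000 > 0x1000 -> viommu_domain_finalise() returns -ENODEV,
     * i.e. the "granule 0x10000 larger than system page size 0x1000"
     * error quoted earlier */
    return viommu_page_size > guest_page_size;
}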
> Another question: presume 0x20010000 does mean that only 512MB and 64KB
> are supported. Is the host hva->hpa mapping ensured to be compatible
> with at least 64KB mappings? If the host MMU only supports 4KB mappings,
> or others incompatible with 0x20010000, will vfio_dma_map() fail?
The page size mask is retrieved with the VFIO_IOMMU_GET_INFO uapi, which
returns the host vfio_iommu_type1 vfio_iommu->pgsize_bitmap. The latter
is initialized to the host PAGE_MASK and later restricted in
vfio_iommu_type1_attach_group() through the vfio_update_pgsize_bitmap()
calls, so yes, both the IOMMU and CPU page sizes are guaranteed to be
compatible.
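For reference, this is roughly how user space reads that bitmap
(minimal sketch, error handling and container setup elided; 'container'
is assumed to be an already configured /dev/vfio/vfio fd):

#include <sys/ioctl.h>
#include <linux/vfio.h>

/* minimal sketch: fetch the host IOMMU page size bitmap via the
 * VFIO type1 uapi */
static int get_iova_pgsizes(int container, unsigned long long *pgsizes)
{
    struct vfio_iommu_type1_info info = { .argsz = sizeof(info) };

    if (ioctl(container, VFIO_IOMMU_GET_INFO, &info) < 0) {
        return -1;
    }
    if (!(info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
        return -1;  /* kernel did not report page sizes */
    }
    *pgsizes = info.iova_pgsizes;  /* e.g. 0x20010000 on a 64kB host */
    return 0;
}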
>
>> qemu: hardware error: vfio: DMA mapping failed, unable to continue
>>
>> This is due to the fact that the IOMMU MR corresponding to the VFIO
>> device is enabled very late, on domain attach, after the machine init.
>> The device reports a minimal 64kB page size but it is too late to be
>> applied. virtio_iommu_set_page_size_mask() fails and this causes
>> vfio_listener_region_add() to end up calling hw_error().
>>
>> To work around this issue, we transiently enable the IOMMU MR on
>> machine init to collect the page size requirements and then restore
>> the bypass state.
>>
>> Fixes: 90519b9053 ("virtio-iommu: Add bypass mode support to assigned device")
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>> include/hw/virtio/virtio-iommu.h | 2 ++
>> hw/virtio/virtio-iommu.c | 30 ++++++++++++++++++++++++++++--
>> hw/virtio/trace-events | 1 +
>> 3 files changed, 31 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
>> index 2ad5ee320b..a93fc5383e 100644
>> --- a/include/hw/virtio/virtio-iommu.h
>> +++ b/include/hw/virtio/virtio-iommu.h
>> @@ -61,6 +61,8 @@ struct VirtIOIOMMU {
>> QemuRecMutex mutex;
>> GTree *endpoints;
>> bool boot_bypass;
>> + Notifier machine_done;
>> + bool granule_frozen;
>> };
>>
>> #endif
>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>> index 1cd258135d..1eaf81bab5 100644
>> --- a/hw/virtio/virtio-iommu.c
>> +++ b/hw/virtio/virtio-iommu.c
>> @@ -24,6 +24,7 @@
>> #include "hw/virtio/virtio.h"
>> #include "sysemu/kvm.h"
>> #include "sysemu/reset.h"
>> +#include "sysemu/sysemu.h"
>> #include "qapi/error.h"
>> #include "qemu/error-report.h"
>> #include "trace.h"
>> @@ -1106,12 +1107,12 @@ static int virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
>> }
>>
>> /*
>> - * After the machine is finalized, we can't change the mask anymore. If by
>> + * Once the granule is frozen we can't change the mask anymore. If by
>> * chance the hotplugged device supports the same granule, we can still
>> * accept it. Having a different masks is possible but the guest will use
>> * sub-optimal block sizes, so warn about it.
>> */
>> - if (phase_check(PHASE_MACHINE_READY)) {
>> + if (s->granule_frozen) {
>> int new_granule = ctz64(new_mask);
>> int cur_granule = ctz64(cur_mask);
>>
>> @@ -1146,6 +1147,27 @@ static void virtio_iommu_system_reset(void *opaque)
>>
>> }
>>
>> +static void virtio_iommu_freeze_granule(Notifier *notifier, void *data)
>> +{
>> + VirtIOIOMMU *s = container_of(notifier, VirtIOIOMMU, machine_done);
>> + bool boot_bypass = s->config.bypass;
>> + int granule;
>> +
>> + /*
>> + * Transient IOMMU MR enable to collect page_size_mask requirement
>> + * through memory_region_iommu_set_page_size_mask() called by
>> + * VFIO region_add() callback
>> + */
>> + s->config.bypass = 0;
>> + virtio_iommu_switch_address_space_all(s);
>> + /* restore default */
>> + s->config.bypass = boot_bypass;
>> + virtio_iommu_switch_address_space_all(s);
>> + s->granule_frozen = true;
>> + granule = ctz64(s->config.page_size_mask);
>> + trace_virtio_iommu_freeze_granule(BIT(granule));
>> +}
> It looks a bit heavy just in order to get the page_size_mask from the
> host side. But maybe this is the only way with the current interface.
The problem comes from the fact that the regions are aliased due to the
bypass, so vfio_listener_region_add() does not get a chance to be called
until the actual domain attach. I do not see any other way than
transiently enabling the region.
At least I could check whether boot bypass is set before doing that
dance. I will respin with that.
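Something like this untested sketch on top of the patch, i.e. only do
the transient enable when bypass is set, since otherwise the IOMMU MRs
are already enabled and VFIO region_add() has run anyway:

/* untested sketch, on top of this patch */
static void virtio_iommu_freeze_granule(Notifier *notifier, void *data)
{
    VirtIOIOMMU *s = container_of(notifier, VirtIOIOMMU, machine_done);

    if (s->config.bypass) {
        /* transiently disable bypass to trigger vfio_listener_region_add() */
        s->config.bypass = false;
        virtio_iommu_switch_address_space_all(s);
        /* restore the boot bypass state */
        s->config.bypass = true;
        virtio_iommu_switch_address_space_all(s);
    }
    s->granule_frozen = true;
    trace_virtio_iommu_freeze_granule(BIT(ctz64(s->config.page_size_mask)));
}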
Thanks
Eric
>
> Thanks
> Zhenzhong
>
>> +
>> static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
>> {
>> VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>> @@ -1189,6 +1211,9 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
>> error_setg(errp, "VIRTIO-IOMMU is not attached to any PCI bus!");
>> }
>>
>> + s->machine_done.notify = virtio_iommu_freeze_granule;
>> + qemu_add_machine_init_done_notifier(&s->machine_done);
>> +
>> qemu_register_reset(virtio_iommu_system_reset, s);
>> }
>>
>> @@ -1198,6 +1223,7 @@ static void virtio_iommu_device_unrealize(DeviceState *dev)
>> VirtIOIOMMU *s = VIRTIO_IOMMU(dev);
>>
>> qemu_unregister_reset(virtio_iommu_system_reset, s);
>> + qemu_remove_machine_init_done_notifier(&s->machine_done);
>>
>> g_hash_table_destroy(s->as_by_busptr);
>> if (s->domains) {
>> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
>> index 8f8d05cf9b..68b752e304 100644
>> --- a/hw/virtio/trace-events
>> +++ b/hw/virtio/trace-events
>> @@ -131,6 +131,7 @@ virtio_iommu_set_page_size_mask(const char *name, uint64_t old, uint64_t new) "m
>> virtio_iommu_notify_flag_add(const char *name) "add notifier to mr %s"
>> virtio_iommu_notify_flag_del(const char *name) "del notifier from mr %s"
>> virtio_iommu_switch_address_space(uint8_t bus, uint8_t slot, uint8_t fn, bool on) "Device %02x:%02x.%x switching address space (iommu enabled=%d)"
>> +virtio_iommu_freeze_granule(uint64_t page_size_mask) "granule set to 0x%"PRIx64
>>
>> # virtio-mem.c
>> virtio_mem_send_response(uint16_t type) "type=%" PRIu16
>> --
>> 2.38.1