* [PATCH RFCv1] virtio: Inherit max bounce buffer size from bus parent if possible
@ 2026-06-08  0:18 Gavin Shan
  2026-06-08  8:55 ` Daniel P. Berrangé
  0 siblings, 1 reply; 4+ messages in thread
From: Gavin Shan @ 2026-06-08  0:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, mst, jugraham, shan.gavin

On a guest with an NVIDIA GH100 card passed through from the host, the
guest hangs when compiling 'cuda-samples', as reported by Julia.

   host$ lspci | grep GH100
   0009:01:00.0 3D controller: NVIDIA Corporation GH100 [GH200 120GB / 480GB] (rev a1)
   host$ /home/sandbox/gavin/qemu.main/build/qemu-system-aarch64 -accel kvm \
         -machine virt,gic-version=host,ras=on,highmem-mmio-size=4T         \
         -cpu host -smp cpus=32 -m size=8G                                  \
         -drive file=/home/gavin/sandbox/images/disk.qcow2,if=none,id=d0    \
         -device virtio-blk-pci,id=vb0,bus=pcie.0,drive=d0,num-queues=4     \
         -device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.1.0

   guest$ cd cuda-samples/build
   guest$ make -j 20 clean
   guest$ make -j 20
               :
   [ 54%] Linking CUDA executable graphMemoryNodes
   [ 54%] Built target graphMemoryNodes
   <no more output afterwards, guest becomes frozen here>

   guest$ qemu-system-aarch64: virtio: bogus descriptor or out of resources
   [  555.814025] virtio_blk virtio0: [vda] new size: 268435456 512-byte logical blocks (137 GB/128 GiB)

When the GPU driver (NVIDIA open driver) is loaded during guest boot,
the memory blocks residing in the PCI BAR can be presented to the guest
through memory hot-add. The page cache can then be allocated from the
hot-added memory blocks while cuda-samples is being built. When that page
cache is later sent to QEMU's virtio-blk device as part of a DMA request,
a bounce buffer is used to accommodate the request, because the
corresponding memory region (MemoryRegion) is a RAM device region in
QEMU. In this case, memory_access_is_direct() returns false in the path
where the DMA request is handled.

  QEMU
  ====
  virtio_blk_handle_output
    virtio_blk_handle_vq
      virtio_blk_get_request
        virtqueue_pop
          virtqueue_split_pop
            virtqueue_map_desc
              address_space_map
                memory_access_is_direct         # Return false
                  memory_region_supports_direct_access

  (qemu) info mtree
          :
  memory-region: pci_bridge_pci
    0000000000000000-ffffffffffffffff (prio 0, container): pci_bridge_pci
      0000042000000000-0000043fffffffff (prio 1, i/o): 0009:01:00.0 base BAR 4
        0000042000000000-0000043fffffffff (prio 0, i/o): 0009:01:00.0 BAR 4
          0000042000000000-000004379fffffff (prio 0, ramd): 0009:01:00.0 BAR 4 mmaps[0]
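
To illustrate the failure mode described above, here is a minimal,
self-contained C sketch of how a capped bounce buffer budget can starve a
mapping request. It is not QEMU's code: ToyAddressSpace, toy_map() and
toy_unmap() are hypothetical names and the accounting is deliberately
simplified. Only the 4096-byte default and the notion of a maximum bounce
buffer size come from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for per-address-space accounting. */
typedef struct {
    size_t max_bounce_buffer_size; /* cap on outstanding bounce bytes */
    size_t bounce_buffer_size;     /* bounce bytes currently in use */
} ToyAddressSpace;

/*
 * Return how many bytes of a 'len'-byte request can be mapped right now.
 * Directly accessible regions are mapped in full. Other regions are
 * limited by the remaining bounce budget; 0 means the mapping failed.
 */
static size_t toy_map(ToyAddressSpace *as, bool direct, size_t len)
{
    assert(as->bounce_buffer_size <= as->max_bounce_buffer_size);
    if (direct) {
        return len;
    }
    size_t avail = as->max_bounce_buffer_size - as->bounce_buffer_size;
    size_t granted = len < avail ? len : avail;
    as->bounce_buffer_size += granted;
    return granted;
}

/* Give bounce bytes back to the budget once the request completes. */
static void toy_unmap(ToyAddressSpace *as, bool direct, size_t len)
{
    if (!direct) {
        assert(len <= as->bounce_buffer_size);
        as->bounce_buffer_size -= len;
    }
}
```

In this model, with a 4096-byte cap, a 64KB request against a non-direct
region is granted only 4096 bytes, and a further request is granted 0
bytes until the first one is unmapped. A zero-byte grant is the model's
equivalent of an "out of resources" condition.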

By default, the max bounce buffer size is only 4096 bytes, even less
than one page when the guest page is 64KB. This tries to fix the issue
by inheriting the customized max bounce buffer size of the virtio bus's
parent through property 'x-max-bounce-buffer-size' when the customized
size is a larger one. With this applied, no guest system hang is seen
with '-device virtio-blk-pci,...,x-max-bounce-buffer-size=268435456'.

Reported-by: Julia Graham <jugraham@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 hw/virtio/virtio-bus.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index cef944e015..e0933823f3 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -42,6 +42,7 @@ do { printf("virtio_bus: " fmt , ## __VA_ARGS__); } while (0)
 /* A VirtIODevice is being plugged */
 void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
 {
+    AddressSpace *as;
     DeviceState *qdev = DEVICE(vdev);
     BusState *qbus = BUS(qdev_get_parent_bus(qdev));
     VirtioBusState *bus = VIRTIO_BUS(qbus);
@@ -100,6 +101,19 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
                 return;
             }
         }
+    } else {
+        /*
+         * The maximal bounce buffer size of the virtio bus's parent may
+         * have been customized by property 'x-max-bounce-buffer-size'.
+         * Let's inherit the customized size if it's larger than the
+         * current one.
+         */
+        as = klass->get_dma_as ? klass->get_dma_as(qbus->parent) : NULL;
+        if (as) {
+            vdev->dma_as->max_bounce_buffer_size = MAX(
+                    vdev->dma_as->max_bounce_buffer_size,
+                    as->max_bounce_buffer_size);
+        }
     }
 }
 
-- 
2.54.0
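
The inheritance rule in the hunk above, together with the size arithmetic
from the commit message, can be sketched as a standalone C snippet.
toy_inherit_max_bounce() and toy_chunks_per_page() are hypothetical
helpers, not QEMU functions. The numbers 4096, 64KB and 268435456 are
taken from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Keep the larger of the device's own limit and the parent's limit. */
static size_t toy_inherit_max_bounce(size_t device_limit, size_t parent_limit)
{
    return device_limit > parent_limit ? device_limit : parent_limit;
}

/*
 * Number of bounce-limit-sized pieces needed to cover one guest page
 * (rounded up). Illustrative arithmetic only.
 */
static size_t toy_chunks_per_page(size_t page_size, size_t limit)
{
    assert(limit > 0);
    return (page_size + limit - 1) / limit;
}
```

With the 4096-byte default, one 64KB guest page spans 16 such pieces.
With the limit raised to 268435456 bytes, it fits in one.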




* Re: [PATCH RFCv1] virtio: Inherit max bounce buffer size from bus parent if possible
  2026-06-08  0:18 [PATCH RFCv1] virtio: Inherit max bounce buffer size from bus parent if possible Gavin Shan
@ 2026-06-08  8:55 ` Daniel P. Berrangé
  2026-06-08 11:11   ` Gavin Shan
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel P. Berrangé @ 2026-06-08  8:55 UTC (permalink / raw)
  To: Gavin Shan; +Cc: qemu-devel, qemu-arm, mst, jugraham, shan.gavin

On Mon, Jun 08, 2026 at 10:18:21AM +1000, Gavin Shan wrote:
> On a guest with an NVIDIA GH100 card passed through from the host, the
> guest hangs when compiling 'cuda-samples', as reported by Julia.

snip

> By default, the max bounce buffer size is only 4096 bytes, even less
> than one page when the guest page is 64KB. This tries to fix the issue
> by inheriting the customized max bounce buffer size of the virtio bus's
> parent through property 'x-max-bounce-buffer-size' when the customized
> size is a larger one. With this applied, no guest system hang is seen
> with '-device virtio-blk-pci,...,x-max-bounce-buffer-size=268435456'.

"x-max-bounce-buffer-size"  is an experimental / unsupported property.

We really shouldn't be expecting users to have to set this in a production
deployment in order to stop a guest from hanging.  Even if we dropped the
experimental marker from this property, users would still need to know to
provide this magic setting, so it would still be broken out of the box.

How can we get a solution that "just works" out of the box, which is
fully supported, not relying on experimental properties?

> 
> Reported-by: Julia Graham <jugraham@redhat.com>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  hw/virtio/virtio-bus.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)

With regards,
Daniel
-- 
|: https://berrange.com       ~~        https://hachyderm.io/@berrange :|
|: https://libvirt.org          ~~          https://entangle-photo.org :|
|: https://pixelfed.art/berrange   ~~    https://fstop138.berrange.com :|




* Re: [PATCH RFCv1] virtio: Inherit max bounce buffer size from bus parent if possible
  2026-06-08  8:55 ` Daniel P. Berrangé
@ 2026-06-08 11:11   ` Gavin Shan
  2026-06-08 11:38     ` Daniel P. Berrangé
  0 siblings, 1 reply; 4+ messages in thread
From: Gavin Shan @ 2026-06-08 11:11 UTC (permalink / raw)
  To: Daniel P. Berrangé, Peter Xu
  Cc: qemu-devel, qemu-arm, mst, jugraham, shan.gavin

Hi Daniel,

On 6/8/26 6:55 PM, Daniel P. Berrangé wrote:
> On Mon, Jun 08, 2026 at 10:18:21AM +1000, Gavin Shan wrote:
>> On a guest with an NVIDIA GH100 card passed through from the host, the
>> guest hangs when compiling 'cuda-samples', as reported by Julia.
> 
> snip
> 

Thanks for looking into this.

>> By default, the max bounce buffer size is only 4096 bytes, even less
>> than one page when the guest page is 64KB. This tries to fix the issue
>> by inheriting the customized max bounce buffer size of the virtio bus's
>> parent through property 'x-max-bounce-buffer-size' when the customized
>> size is a larger one. With this applied, no guest system hang is seen
>> with '-device virtio-blk-pci,...,x-max-bounce-buffer-size=268435456'.
> 
> "x-max-bounce-buffer-size"  is an experimental / unsupported property.
> 
> We really shouldn't be expecting users to have to set this in a production
> deployment in order to stop a guest from hanging.  Even if we dropped the
> experimental marker from this property, users would still need to know to
> provide this magic setting, so it would still be broken out of the box.
> 
> How can we get a solution that "just works" out of the box, which is
> fully supported, not relying on experimental properties?
> 

How do we know that "x-max-bounce-buffer-size" is an experimental or unsupported
property? I guess the properties whose names start with "x-" are all treated as
experimental and unsupported?

For this case, the bounce buffer is unavoidable because the memory region can't be
directly accessed. The memory region is initialized by memory_region_init_ram_device_ptr()
in hw/vfio/region.c::vfio_region_mmap(). So the question is how users can specify the
allowed bounce buffer size, which is why the existing property
"x-max-bounce-buffer-size" is reused.

I even thought of a new property for MachineState (e.g. "limited-bounce-buffer"),
which is set to on by default, following the existing behavior. When it's set to
off by users, the max (allowed) buffer size won't be checked at all. However, I'm
not sure if this makes sense at all.
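
A rough C sketch of how such a switch might behave, purely as an
illustration: "limited-bounce-buffer" is only an idea floated in this
message and does not exist in QEMU, and ToyMachineDma and
toy_bounce_grant() are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state; none of these names are QEMU's. */
typedef struct {
    bool limited_bounce_buffer;    /* proposed option, on by default */
    size_t max_bounce_buffer_size; /* cap applied when the option is on */
    size_t bounce_buffer_size;     /* bounce bytes currently in use */
} ToyMachineDma;

/* Return how many bytes of a 'len'-byte bounce request would be granted. */
static size_t toy_bounce_grant(const ToyMachineDma *m, size_t len)
{
    assert(m != NULL);
    if (!m->limited_bounce_buffer) {
        return len; /* option off: the cap is not checked at all */
    }
    if (m->bounce_buffer_size >= m->max_bounce_buffer_size) {
        return 0;   /* budget exhausted */
    }
    size_t avail = m->max_bounce_buffer_size - m->bounce_buffer_size;
    return len < avail ? len : avail;
}
```

With the option on and a 4096-byte cap, a 64KB request is cut down to
4096 bytes. With the option off, the same request is granted in full.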

>>
>> Reported-by: Julia Graham <jugraham@redhat.com>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>>   hw/virtio/virtio-bus.c | 14 ++++++++++++++
>>   1 file changed, 14 insertions(+)
> 
> With regards,
> Daniel

Thanks,
Gavin




* Re: [PATCH RFCv1] virtio: Inherit max bounce buffer size from bus parent if possible
  2026-06-08 11:11   ` Gavin Shan
@ 2026-06-08 11:38     ` Daniel P. Berrangé
  0 siblings, 0 replies; 4+ messages in thread
From: Daniel P. Berrangé @ 2026-06-08 11:38 UTC (permalink / raw)
  To: Gavin Shan; +Cc: Peter Xu, qemu-devel, qemu-arm, mst, jugraham, shan.gavin

On Mon, Jun 08, 2026 at 09:11:50PM +1000, Gavin Shan wrote:
> Hi Daniel,
> 
> On 6/8/26 6:55 PM, Daniel P. Berrangé wrote:
> > On Mon, Jun 08, 2026 at 10:18:21AM +1000, Gavin Shan wrote:
> > > On a guest with an NVIDIA GH100 card passed through from the host, the
> > > guest hangs when compiling 'cuda-samples', as reported by Julia.
> > 
> > snip
> > 
> 
> Thanks for looking into this.

NB, I didn't really look into it beyond noticing the suggestion
that users set an "x-" property as a proposed solution to a guest
hang, which raised a red flag for me from a usability POV.

I don't really know anything about the underlying technical problems
here, so can't offer specific guidance in that area.

> 
> > > By default, the max bounce buffer size is only 4096 bytes, even less
> > > than one page when the guest page is 64KB. This tries to fix the issue
> > > by inheriting the customized max bounce buffer size of the virtio bus's
> > > parent through property 'x-max-bounce-buffer-size' when the customized
> > > size is a larger one. With this applied, no guest system hang is seen
> > > with '-device virtio-blk-pci,...,x-max-bounce-buffer-size=268435456'.
> > 
> > "x-max-bounce-buffer-size"  is an experimental / unsupported property.
> > 
> > We really shouldn't be expecting users to have to set this in a production
> > deployment in order to stop a guest from hanging.  Even if we dropped the
> > experimental marker from this property, users would still need to know to
> > provide this magic setting, so it would still be broken out of the box.
> > 
> > > How can we get a solution that "just works" out of the box, which is
> > > fully supported, not relying on experimental properties?
> > 
> 
> How do we know that "x-max-bounce-buffer-size" is an experimental or unsupported
> property? I guess the properties whose names start with "x-" are all treated as
> experimental and unsupported?

Yes, any QEMU property starting with 'x-' is experimental/unstable/
unsupported and can be changed/withdrawn at any time.  Libvirt will
not provide any way to configure 'x-' properties, as it requires a
supported/stable solution from QEMU.
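
As a small illustration of the naming convention described above, a
prefix check could look like the following C sketch.
toy_is_experimental_property() is a hypothetical helper written for this
example; it is not a QEMU or libvirt function.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if the property name carries the experimental 'x-' prefix. */
static bool toy_is_experimental_property(const char *name)
{
    assert(name != NULL);
    return strncmp(name, "x-", 2) == 0;
}
```

Under this check, "x-max-bounce-buffer-size" counts as experimental while
"num-queues" does not.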

With regards,
Daniel
-- 
|: https://berrange.com       ~~        https://hachyderm.io/@berrange :|
|: https://libvirt.org          ~~          https://entangle-photo.org :|
|: https://pixelfed.art/berrange   ~~    https://fstop138.berrange.com :|


