From: "Michael S. Tsirkin" <mst@redhat.com>
To: Gavin Shan <gshan@redhat.com>
Cc: qemu-devel@nongnu.org, qemu-arm@nongnu.org, jugraham@redhat.com,
shan.gavin@gmail.com, stefanha@redhat.com, qemu-block@nongnu.org
Subject: Re: [PATCH RFCv1] virtio: Inherit max bounce buffer size from bus parent if possible
Date: Wed, 10 Jun 2026 05:49:21 -0400
Message-ID: <20260610041036-mutt-send-email-mst@kernel.org>
In-Reply-To: <20260608001821.850921-1-gshan@redhat.com>
On Mon, Jun 08, 2026 at 10:18:21AM +1000, Gavin Shan wrote:
> On a guest where an NVIDIA GH100 card is passed through from the host,
> a guest system hang can be observed when attempting to compile
> 'cuda-samples', as reported by Julia.
>
> host$ lspci | grep GH100
> 0009:01:00.0 3D controller: NVIDIA Corporation GH100 [GH200 120GB / 480GB] (rev a1)
> host$ /home/sandbox/gavin/qemu.main/build/qemu-system-aarch64 -accel kvm \
> -machine virt,gic-version=host,ras=on,highmem-mmio-size=4T \
> -cpu host -smp cpus=32 -m size=8G \
> -drive file=/home/gavin/sandbox/images/disk.qcow2,if=none,id=d0 \
> -device virtio-blk-pci,id=vb0,bus=pcie.0,drive=d0,num-queues=4 \
> -device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.1.0
>
> guest$ cd cuda-samples/build
> guest$ make -j 20 clean
> guest$ make -j 20
> :
> [ 54%] Linking CUDA executable graphMemoryNodes
> [ 54%] Built target graphMemoryNodes
> <no more output afterwards, guest becomes frozen here>
>
> guest$ qemu-system-aarch64: virtio: bogus descriptor or out of resources
> [ 555.814025] virtio_blk virtio0: [vda] new size: 268435456 512-byte logical blocks (137 GB/128 GiB)
>
> When the GPU's driver (NVIDIA open driver) is loaded on guest bootup,
> the memory blocks residing in the PCI BAR can be presented to the guest
> through memory hot-add. The page cache can be allocated from the hot-added
> memory blocks when cuda-samples is being built. Afterwards, the page cache
> is sent to QEMU's virtio-blk device as part of the DMA request. The bounce
> buffer is used to accommodate the request because the corresponding memory
> region (MemoryRegion) is a RAM DEVICE region in QEMU. For this specific
> case, false is returned from memory_access_is_direct() in the path where
> the DMA request is handled.
>
> QEMU
> ====
> virtio_blk_handle_output
> virtio_blk_handle_vq
> virtio_blk_get_request
> virtqueue_pop
> virtqueue_split_pop
> virtqueue_map_desc
> address_space_map
> memory_access_is_direct # Return false
> memory_region_supports_direct_access
>
> (qemu) info mtree
> :
> memory-region: pci_bridge_pci
> 0000000000000000-ffffffffffffffff (prio 0, container): pci_bridge_pci
> 0000042000000000-0000043fffffffff (prio 1, i/o): 0009:01:00.0 base BAR 4
> 0000042000000000-0000043fffffffff (prio 0, i/o): 0009:01:00.0 BAR 4
> 0000042000000000-000004379fffffff (prio 0, ramd): 0009:01:00.0 BAR 4 mmaps[0]
>
> By default, the max bounce buffer size is only 4096 bytes, even less
> than one page when the guest page is 64KB. This tries to fix the issue
> by inheriting the customized max bounce buffer size of the virtio bus's
> parent through property 'x-max-bounce-buffer-size' when the customized
> size is a larger one. With this applied, no guest system hang is seen
> with '-device virtio-blk-pci,...,x-max-bounce-buffer-size=268435456'.
>
> Reported-by: Julia Graham <jugraham@redhat.com>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
> hw/virtio/virtio-bus.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
> index cef944e015..e0933823f3 100644
> --- a/hw/virtio/virtio-bus.c
> +++ b/hw/virtio/virtio-bus.c
> @@ -42,6 +42,7 @@ do { printf("virtio_bus: " fmt , ## __VA_ARGS__); } while (0)
> /* A VirtIODevice is being plugged */
> void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
> {
> + AddressSpace *as;
> DeviceState *qdev = DEVICE(vdev);
> BusState *qbus = BUS(qdev_get_parent_bus(qdev));
> VirtioBusState *bus = VIRTIO_BUS(qbus);
> @@ -100,6 +101,19 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
> return;
> }
> }
> + } else {
> + /*
> + * The maximal bounce buffer size of the virtio bus's parent may
> + * have been customized by property 'x-max-bounce-buffer-size'.
> + * Lets inherit the customized size if it's larger than the
> + * current one.
> + */
> + as = klass->get_dma_as ? klass->get_dma_as(qbus->parent) : NULL;
> + if (as) {
> + vdev->dma_as->max_bounce_buffer_size = MAX(
> + vdev->dma_as->max_bounce_buffer_size,
> + as->max_bounce_buffer_size);
> + }
> }
> }
>
> --
> 2.54.0
The problem with all this is that users would not know how to size this.
So fundamentally, isn't the issue that virtio-blk (and scsi!) maps
the whole buffer all the time?

It's not hard to add something like virtio_pop_unmapped that would not map,
then build QEMUSGLists out of addr/len pairs and submit these.

Stefan, do you think doing it like this would be bad for perf? Good for
perf?
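
To make the idea concrete, here is a minimal, self-contained C sketch of
"pop without mapping". It is not QEMU code and makes no claim about how the
real change would look: GuestDesc, ToySGList, toy_sglist_add() and
toy_pop_unmapped() are hypothetical names invented for illustration,
loosely standing in for a descriptor, QEMUSGList and the suggested
virtio_pop_unmapped. It only shows guest addr/len pairs being collected
into a list, with no mapping (and so no bounce buffer) at pop time.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical guest descriptor: a guest-physical address and a length. */
typedef struct {
    uint64_t addr;
    uint32_t len;
} GuestDesc;

/* Hypothetical scatter-gather list (a toy analogue, not QEMUSGList). */
typedef struct {
    GuestDesc *sg;   /* array of addr/len entries */
    size_t nsg;      /* entries in use */
    size_t nalloc;   /* entries allocated */
    uint64_t size;   /* total bytes described by the list */
} ToySGList;

static void toy_sglist_init(ToySGList *l)
{
    assert(l != NULL);
    l->sg = NULL;
    l->nsg = 0;
    l->nalloc = 0;
    l->size = 0;
}

static void toy_sglist_destroy(ToySGList *l)
{
    assert(l != NULL);
    free(l->sg);
    toy_sglist_init(l);
}

/* Append one addr/len pair, growing the array as needed. */
static int toy_sglist_add(ToySGList *l, uint64_t addr, uint32_t len)
{
    if (l->nsg == l->nalloc) {
        size_t n = l->nalloc ? l->nalloc * 2 : 4;
        GuestDesc *p = realloc(l->sg, n * sizeof(*p));
        if (p == NULL) {
            return -1;
        }
        l->sg = p;
        l->nalloc = n;
    }
    l->sg[l->nsg].addr = addr;
    l->sg[l->nsg].len = len;
    l->nsg++;
    l->size += len;
    return 0;
}

/*
 * Hypothetical "pop unmapped": record each descriptor's addr/len pair
 * without touching guest memory. Zero-length descriptors are rejected
 * (an assumption of this sketch). Returns 0 on success, -1 on error.
 */
static int toy_pop_unmapped(const GuestDesc *descs, size_t n, ToySGList *out)
{
    toy_sglist_init(out);
    for (size_t i = 0; i < n; i++) {
        if (descs[i].len == 0 ||
            toy_sglist_add(out, descs[i].addr, descs[i].len) < 0) {
            toy_sglist_destroy(out);
            return -1;
        }
    }
    return 0;
}
```

A caller would pass the resulting list to whatever performs the I/O, which
can then access guest memory piecewise instead of requiring the whole
request to be mapped up front.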
--
MST