From: "Michael S. Tsirkin" <mst@redhat.com>
To: Peter Maydell <peter.maydell@linaro.org>
Cc: "Gavin Shan" <gshan@redhat.com>, "Peter Xu" <peterx@redhat.com>,
"Pavel Hrdina" <phrdina@redhat.com>,
"Daniel P. Berrangé" <berrange@redhat.com>,
qemu-devel@nongnu.org, qemu-arm@nongnu.org, jugraham@redhat.com,
shan.gavin@gmail.com, "Alex Williamson" <alex@shazbot.org>,
"David Hildenbrand" <david@kernel.org>
Subject: Re: [PATCH RFCv1] virtio: Inherit max bounce buffer size from bus parent if possible
Date: Thu, 11 Jun 2026 12:42:32 -0400 [thread overview]
Message-ID: <20260611123952-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <CAFEAcA9HUGfDgHjvCemPPNkp6FfvQp+4y7qCnw+ROFpG3+ybMw@mail.gmail.com>
On Thu, Jun 11, 2026 at 05:16:09PM +0100, Peter Maydell wrote:
> On Thu, 11 Jun 2026 at 16:57, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Jun 11, 2026 at 04:29:49PM +0100, Peter Maydell wrote:
> > > On Thu, 11 Jun 2026 at 16:05, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > What is "OK"? If the BAR is RAM and I write into it, I will overwrite
> > > > data guest stored there. Is that "OK"?
> > >
> > > By "OK" I mean it behaves like RAM: any kind of load or store
> > > will work, at any width and any alignment.
> > > That is what makes
> > > it OK to return the pointer from address_space_map(), where
> > > the caller will treat it as any other C host pointer (e.g.
> > > assigning it to a C struct pointer and then writing to fields
> > > in that struct), like virtio does:
> > >
> > > iov[num_sg].iov_base = dma_memory_map(vdev->dma_as, pa, &len,
> > > is_write ?
> > > DMA_DIRECTION_FROM_DEVICE :
> > > DMA_DIRECTION_TO_DEVICE,
> > > MEMTXATTRS_UNSPECIFIED);
> > > if (!iov[num_sg].iov_base) {
> > > virtio_error(vdev, "virtio: bogus descriptor or out of resources");
> > > goto out;
> > > }
> > > [etc]
> > >
> > > This is only safe when the pointer is something we can
> > > handle as if it were a pointer to normal RAM (because the
> > > C compiler makes no promises about only doing correctly
> > > aligned and sized accesses: to it, memory is memory).
> >
> >
> >
> > Safe as in what? Work meaning what?
> >
> > mmap of a device register is a standard API in Linux, it gives you a
> > pointer and no, it does not imply that "any access" will not make your
> > box go up in smoke, and never did.
>
> Yes. So you need to be careful about what you do with the
> pointers you get into something you mmap()ed. In particular,
> if it's not actually RAM then you can't just hand it to the
> C compiler and say "this is a pointer to some data structure",
> because the C compiler is not going to restrict itself to
> only doing the things that the hardware says are in spec.
> That's why we don't want to mark these memory regions as
> "direct access OK".
>
> > > > If the BAR is RAM and I write into it, I will overwrite
> > > > data guest stored there. Is that "OK"?
> > >
> > > If the guest said "please DMA to this address" then they're
> > > expecting you to overwrite that data, yes.
> >
> > The "If the guest said" is critical here. It's not OK
> > at a random time and place of your choosing, and never will be.
> >
> > It's exactly what I said: if we are doing what guest does,
> > like fixed length memcpy does,
> > then we are good. If instead we are replacing guest accesses
> > with different ones like variable length memcpy does,
> > we'll get issues.
>
> But these accesses are not guest accesses. They're device
> accesses (by a device model doing DMA, including the virtio
> device). So "what access might we do" depends on the device spec.
> There's no "guest software did a 4 byte write" that we're
> converting into something else.
Device does it because guest told it to do it.
It is really the same.
> > > Merging in your other reply:
> > >
> > > >> The
> > > >> problem is that for some PCI devices (like the network card
> > > >> mentioned in 4a2e242bbb30's commit message) the BAR is *not*
> > > >> safe for arbitrary access (because the actual real host hardware
> > > >> inside it is not RAM).
> > > >
> > > > But we don't do arbitrary access. Why would we?
> > >
> > > If we mark the MR as "direct access OK", then we do, or might do.
> >
> > what does this might mean?
>
> It means that by marking the MR as "direct access" we say
> "these are permitted". Whether they happen or not depends
> on what the code in QEMU does and what the C compiler does
> with that. If you don't want them, don't say they're OK.
>
> > > The bug the commit is fixing is exactly that there is a codepath
> > > that uses the latitude that marking the MR as direct access permits.
> >
> >
> > No, the bug is that a 4 byte fixed length access was converted
> > to a variable length memcpy function call which started
> > doing single byte accesses.
> >
> >
> > >
> > > > There's no such thing as "not safe for direct access" in PCI.
> > > > All operations are memory operations.
> > > > What can be unsafe is accesses of specific width and length.
> > >
> > > I think we're talking at cross purposes here. By "safe for
> > > direct access" I mean exactly that arbitrary width and
> > > length and alignment are permitted.
> >
> > This is a strong demands that QEMU simply never needs.
>
> It does for address_space_map(). That's what that gives you.
yes no part of qem use that.
> > > This is what
> > > memory_region_supports_direct_access() is testing.
> > > If the PCI BAR can't handle arbitrary widths etc then that
> > > function mustn't return true for its MR.
> > >
> > > (Though if we ever get into trying to address_space_map()
> > > a BAR like that something is going to go wrong anyway, because
> > > the "not OK for direct access" path fills and empties the bounce
> > > buffer by calling flatview_read() and address_space_write(),
> > > which aren't going to honour any theoretical alignment requirements.
> > > Perhaps that kind of BAR should just flat out not be one we handle
> > > with a ram_device MR?)
> >
> >
> > mmap of a device is a standard thing that everyone did on unix, for
> > decades. just don't do things, with a device, that a driver does not do.
> >
> > > > > (I think there are places where we do need to be more careful about
> > > > > what we do with accesses to real RAM, for where we're emulating a
> > > > > device write to RAM that's updating a data structure shared with the
> > > > > guest, and things like writing multiple times can cause problems.
> > > > > https://lore.kernel.org/qemu-devel/CAFEAcA8dwHV8F48kb-013rxkG9kKcZhym9_qarKmoeUfeh0YWw@mail.gmail.com/
> > > > > is an unrelated example of that, which I haven't done detailed
> > > > > analysis of yet.)
> > >
> > > > If it does not work, then QEMU is broken:
> > > >
> > > >
> > > > /*
> > > > * Any compiler worth its salt will turn these memcpy into native unaligned
> > > > * operations. Thus we don't need to play games with packed attributes, or
> > > > * inline byte-by-byte stores.
> > > > * Some compilation environments (eg some fortify-source implementations)
> > > > * may intercept memcpy() in a way that defeats the compiler optimization,
> > > > * though, so we use __builtin_memcpy() to give ourselves the best chance
> > > > * of good performance.
> > > > */
> > >
> > > Good point. (Though if I were an optimizing compiler I might
> > > be tempted to optimize a switch() on the length where each
> > > case called memmove with a fixed length argument back into a
> > > single call to memmove...)
> > >
> > > I suspect your suggested flatview changes would fix that e1000
> > > bug. But I don't think they're the right thing for this problem.
>
> > Then I'd like to see an example where we have an actual good reason
> > to do execute arbitrary width accesses on device RAM when the
> > driver does not execute them, please.
>
> That's the case we started with, with this GPU where the
> virtio data structures are in this PCI BAR. We want the
> virtio backend to have the freedom to say "I'm just going
> to assume the ring buffer etc is in RAM, and I don't need to
> care about carefully ensuring that I only do word accesses".
virtio backend does not need to assume anything. any spec
compliant guest guarantees it. any address given to
a virtio device is ram.
> That's why it uses address_space_map()/dma_memory_map().
Nope. It could use QEMUSGList just as well. map is just easier.
>
> -- PMM
next prev parent reply other threads:[~2026-06-11 16:43 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-08 0:18 [PATCH RFCv1] virtio: Inherit max bounce buffer size from bus parent if possible Gavin Shan
2026-06-08 8:55 ` Daniel P. Berrangé
2026-06-08 11:11 ` Gavin Shan
2026-06-08 11:38 ` Daniel P. Berrangé
2026-06-09 2:08 ` Gavin Shan
2026-06-09 16:25 ` Peter Xu
2026-06-10 0:32 ` Gavin Shan
2026-06-10 9:54 ` Pavel Hrdina
2026-06-10 10:55 ` Gavin Shan
2026-06-10 12:12 ` Michael S. Tsirkin
2026-06-10 12:19 ` Gavin Shan
2026-06-10 12:27 ` Michael S. Tsirkin
2026-06-10 13:00 ` Gavin Shan
2026-06-10 13:54 ` Gavin Shan
2026-06-10 14:06 ` Michael S. Tsirkin
2026-06-10 15:36 ` Peter Xu
2026-06-10 16:11 ` Peter Maydell
2026-06-10 16:19 ` Michael S. Tsirkin
2026-06-10 19:10 ` Peter Xu
2026-06-10 21:03 ` Michael S. Tsirkin
2026-06-10 21:27 ` Peter Xu
2026-06-10 21:44 ` Michael S. Tsirkin
2026-06-10 16:18 ` Michael S. Tsirkin
2026-06-11 4:33 ` Gavin Shan
2026-06-11 5:31 ` Michael S. Tsirkin
2026-06-11 6:28 ` Gavin Shan
2026-06-11 6:34 ` Michael S. Tsirkin
2026-06-11 12:33 ` Gavin Shan
2026-06-11 12:48 ` Peter Maydell
2026-06-11 14:10 ` Michael S. Tsirkin
2026-06-11 14:55 ` Peter Maydell
2026-06-11 15:05 ` Michael S. Tsirkin
2026-06-11 15:25 ` Michael S. Tsirkin
2026-06-11 15:29 ` Peter Maydell
2026-06-11 15:57 ` Michael S. Tsirkin
2026-06-11 16:16 ` Peter Maydell
2026-06-11 16:42 ` Michael S. Tsirkin [this message]
2026-06-11 16:53 ` Peter Maydell
2026-06-11 17:02 ` Michael S. Tsirkin
2026-06-11 16:13 ` Michael S. Tsirkin
2026-06-11 6:51 ` Michael S. Tsirkin
2026-06-10 12:23 ` Pavel Hrdina
2026-06-10 14:04 ` Gavin Shan
2026-06-10 14:08 ` Michael S. Tsirkin
2026-06-10 9:49 ` Michael S. Tsirkin
2026-06-10 18:30 ` Stefan Hajnoczi
2026-06-10 21:00 ` Michael S. Tsirkin
2026-06-11 14:20 ` Stefan Hajnoczi
2026-06-11 14:45 ` Michael S. Tsirkin
2026-06-11 15:04 ` Peter Maydell
2026-06-11 15:09 ` Michael S. Tsirkin
2026-06-11 1:19 ` Gavin Shan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260611123952-mutt-send-email-mst@kernel.org \
--to=mst@redhat.com \
--cc=alex@shazbot.org \
--cc=berrange@redhat.com \
--cc=david@kernel.org \
--cc=gshan@redhat.com \
--cc=jugraham@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=peterx@redhat.com \
--cc=phrdina@redhat.com \
--cc=qemu-arm@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=shan.gavin@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.