* [PATCH, v2] physmem: avoid bounce buffer too small
@ 2024-02-28 12:59 Heinrich Schuchardt
2024-02-28 15:06 ` Philippe Mathieu-Daudé
0 siblings, 1 reply; 19+ messages in thread
From: Heinrich Schuchardt @ 2024-02-28 12:59 UTC (permalink / raw)
To: Paolo Bonzini, Peter Xu, David Hildenbrand
Cc: Philippe Mathieu-Daudé, qemu-devel, Heinrich Schuchardt
virtqueue_map_desc() is called with values of sz that may exceed
TARGET_PAGE_SIZE. sz = 0x2800 has been observed.
We only support a single bounce buffer. We have to avoid
virtqueue_map_desc() calling address_space_map() multiple times. Otherwise
we see an error
qemu: virtio: bogus descriptor or out of resources
Increase the upper limit of the bounce buffer size to 0x10000, which matches
the largest value of TARGET_PAGE_SIZE across all architectures.
Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
---
v2:
remove unrelated change
---
system/physmem.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/system/physmem.c b/system/physmem.c
index e3ebc19eef..3c82da1c86 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -3151,8 +3151,12 @@ void *address_space_map(AddressSpace *as,
             *plen = 0;
             return NULL;
         }
-        /* Avoid unbounded allocations */
-        l = MIN(l, TARGET_PAGE_SIZE);
+        /*
+         * There is only one bounce buffer. The largest occurring value of
+         * parameter sz of virtqueue_map_desc() must fit into the bounce
+         * buffer.
+         */
+        l = MIN(l, 0x10000);
         bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, l);
         bounce.addr = addr;
         bounce.len = l;
--
2.43.0
^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH, v2] physmem: avoid bounce buffer too small
  2024-02-28 12:59 [PATCH, v2] physmem: avoid bounce buffer too small Heinrich Schuchardt
@ 2024-02-28 15:06 ` Philippe Mathieu-Daudé
  2024-02-28 18:27   ` Heinrich Schuchardt
  0 siblings, 1 reply; 19+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-02-28 15:06 UTC (permalink / raw)
  To: Heinrich Schuchardt, Paolo Bonzini, Peter Xu, David Hildenbrand
  Cc: qemu-devel, Richard Henderson

Hi Heinrich,

On 28/2/24 13:59, Heinrich Schuchardt wrote:
> virtqueue_map_desc() is called with values of sz that may exceed
> TARGET_PAGE_SIZE. sz = 0x2800 has been observed.
>
> We only support a single bounce buffer. We have to avoid
> virtqueue_map_desc() calling address_space_map() multiple times. Otherwise
> we see an error
>
>     qemu: virtio: bogus descriptor or out of resources
>
> Increase the upper limit of the bounce buffer size to 0x10000, which matches
> the largest value of TARGET_PAGE_SIZE across all architectures.
>
> Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
> ---
> v2:
> 	remove unrelated change
> ---
>  system/physmem.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/system/physmem.c b/system/physmem.c
> index e3ebc19eef..3c82da1c86 100644
> --- a/system/physmem.c
> +++ b/system/physmem.c
> @@ -3151,8 +3151,12 @@ void *address_space_map(AddressSpace *as,
>              *plen = 0;
>              return NULL;
>          }
> -        /* Avoid unbounded allocations */
> -        l = MIN(l, TARGET_PAGE_SIZE);
> +        /*
> +         * There is only one bounce buffer. The largest occurring value of
> +         * parameter sz of virtqueue_map_desc() must fit into the bounce
> +         * buffer.
> +         */
> +        l = MIN(l, 0x10000);

Please define this magic value. Maybe ANY_TARGET_PAGE_SIZE or
TARGETS_BIGGEST_PAGE_SIZE?

Then along:
  QEMU_BUILD_BUG_ON(TARGET_PAGE_SIZE <= TARGETS_BIGGEST_PAGE_SIZE);

>          bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, l);
>          bounce.addr = addr;
>          bounce.len = l;

^ permalink raw reply	[flat|nested] 19+ messages in thread
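A minimal sketch of how the suggested named constant and build-time check could look, assuming the constant ends up in include/exec/memory.h (only a guess; Heinrich raises the same question in the follow-up below). Note that QEMU_BUILD_BUG_ON() breaks the build when its condition is true, so the intended assertion is presumably written with the comparison the other way around:

  /* include/exec/memory.h (hypothetical location and name) */
  #define TARGETS_BIGGEST_PAGE_SIZE 0x10000

  /* system/physmem.c, in address_space_map() */
  QEMU_BUILD_BUG_ON(TARGET_PAGE_SIZE > TARGETS_BIGGEST_PAGE_SIZE);
  l = MIN(l, TARGETS_BIGGEST_PAGE_SIZE);
  bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, l);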
* Re: [PATCH, v2] physmem: avoid bounce buffer too small 2024-02-28 15:06 ` Philippe Mathieu-Daudé @ 2024-02-28 18:27 ` Heinrich Schuchardt 2024-02-28 18:39 ` Peter Maydell 0 siblings, 1 reply; 19+ messages in thread From: Heinrich Schuchardt @ 2024-02-28 18:27 UTC (permalink / raw) To: Philippe Mathieu-Daudé Cc: qemu-devel, Richard Henderson, David Hildenbrand, Peter Xu, Paolo Bonzini On 28.02.24 16:06, Philippe Mathieu-Daudé wrote: > Hi Heinrich, > > On 28/2/24 13:59, Heinrich Schuchardt wrote: >> virtqueue_map_desc() is called with values of sz exceeding that may >> exceed >> TARGET_PAGE_SIZE. sz = 0x2800 has been observed. >> >> We only support a single bounce buffer. We have to avoid >> virtqueue_map_desc() calling address_space_map() multiple times. >> Otherwise >> we see an error >> >> qemu: virtio: bogus descriptor or out of resources >> >> Increase the minimum size of the bounce buffer to 0x10000 which matches >> the largest value of TARGET_PAGE_SIZE for all architectures. >> >> Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> >> --- >> v2: >> remove unrelated change >> --- >> system/physmem.c | 8 ++++++-- >> 1 file changed, 6 insertions(+), 2 deletions(-) >> >> diff --git a/system/physmem.c b/system/physmem.c >> index e3ebc19eef..3c82da1c86 100644 >> --- a/system/physmem.c >> +++ b/system/physmem.c >> @@ -3151,8 +3151,12 @@ void *address_space_map(AddressSpace *as, >> *plen = 0; >> return NULL; >> } >> - /* Avoid unbounded allocations */ >> - l = MIN(l, TARGET_PAGE_SIZE); >> + /* >> + * There is only one bounce buffer. The largest occuring >> value of >> + * parameter sz of virtqueue_map_desc() must fit into the bounce >> + * buffer. >> + */ >> + l = MIN(l, 0x10000); > > Please define this magic value. Maybe ANY_TARGET_PAGE_SIZE or > TARGETS_BIGGEST_PAGE_SIZE? > > Then along: > QEMU_BUILD_BUG_ON(TARGET_PAGE_SIZE <= TARGETS_BIGGEST_PAGE_SIZE); Thank you Philippe for reviewing. TARGETS_BIGGEST_PAGE_SIZE does not fit as the value is not driven by the page size. How about MIN_BOUNCE_BUFFER_SIZE? Is include/exec/memory.h the right include for the constant? I don't think that TARGET_PAGE_SIZE has any relevance for setting the bounce buffer size. I only mentioned it to say that we are not decreasing the value on any existing architecture. I don't know why TARGET_PAGE_SIZE ever got into this piece of code. e3127ae0cdcd ("exec: reorganize address_space_map") does not provide a reason for this choice. Maybe Paolo remembers. Best regards Heinrich > >> bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, l); >> bounce.addr = addr; >> bounce.len = l; > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH, v2] physmem: avoid bounce buffer too small 2024-02-28 18:27 ` Heinrich Schuchardt @ 2024-02-28 18:39 ` Peter Maydell 2024-02-28 19:07 ` Heinrich Schuchardt 0 siblings, 1 reply; 19+ messages in thread From: Peter Maydell @ 2024-02-28 18:39 UTC (permalink / raw) To: Heinrich Schuchardt Cc: Philippe Mathieu-Daudé, qemu-devel, Richard Henderson, David Hildenbrand, Peter Xu, Paolo Bonzini On Wed, 28 Feb 2024 at 18:28, Heinrich Schuchardt <heinrich.schuchardt@canonical.com> wrote: > > On 28.02.24 16:06, Philippe Mathieu-Daudé wrote: > > Hi Heinrich, > > > > On 28/2/24 13:59, Heinrich Schuchardt wrote: > >> virtqueue_map_desc() is called with values of sz exceeding that may > >> exceed > >> TARGET_PAGE_SIZE. sz = 0x2800 has been observed. > >> > >> We only support a single bounce buffer. We have to avoid > >> virtqueue_map_desc() calling address_space_map() multiple times. > >> Otherwise > >> we see an error > >> > >> qemu: virtio: bogus descriptor or out of resources > >> > >> Increase the minimum size of the bounce buffer to 0x10000 which matches > >> the largest value of TARGET_PAGE_SIZE for all architectures. > >> > >> Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> > >> --- > >> v2: > >> remove unrelated change > >> --- > >> system/physmem.c | 8 ++++++-- > >> 1 file changed, 6 insertions(+), 2 deletions(-) > >> > >> diff --git a/system/physmem.c b/system/physmem.c > >> index e3ebc19eef..3c82da1c86 100644 > >> --- a/system/physmem.c > >> +++ b/system/physmem.c > >> @@ -3151,8 +3151,12 @@ void *address_space_map(AddressSpace *as, > >> *plen = 0; > >> return NULL; > >> } > >> - /* Avoid unbounded allocations */ > >> - l = MIN(l, TARGET_PAGE_SIZE); > >> + /* > >> + * There is only one bounce buffer. The largest occuring > >> value of > >> + * parameter sz of virtqueue_map_desc() must fit into the bounce > >> + * buffer. > >> + */ > >> + l = MIN(l, 0x10000); > > > > Please define this magic value. Maybe ANY_TARGET_PAGE_SIZE or > > TARGETS_BIGGEST_PAGE_SIZE? > > > > Then along: > > QEMU_BUILD_BUG_ON(TARGET_PAGE_SIZE <= TARGETS_BIGGEST_PAGE_SIZE); > > Thank you Philippe for reviewing. > > TARGETS_BIGGEST_PAGE_SIZE does not fit as the value is not driven by the > page size. > How about MIN_BOUNCE_BUFFER_SIZE? > Is include/exec/memory.h the right include for the constant? > > I don't think that TARGET_PAGE_SIZE has any relevance for setting the > bounce buffer size. I only mentioned it to say that we are not > decreasing the value on any existing architecture. > > I don't know why TARGET_PAGE_SIZE ever got into this piece of code. > e3127ae0cdcd ("exec: reorganize address_space_map") does not provide a > reason for this choice. Maybe Paolo remembers. The limitation to a page dates back to commit 6d16c2f88f2a in 2009, which was the first implementation of this function. I don't think there's a particular reason for that value beyond that it was probably a convenient value that was assumed to be likely "big enough". I think the idea with this bounce-buffer has always been that this isn't really a code path we expected to end up in very often -- it's supposed to be for when devices are doing DMA, which they will typically be doing to memory (backed by host RAM), not devices (backed by MMIO and needing a bounce buffer). So the whole mechanism is a bit "last fallback to stop things breaking entirely". 
The address_space_map() API says that it's allowed to return a subset of the range you ask for, so if the virtio code doesn't cope with the minimum being set to TARGET_PAGE_SIZE then either we need to fix that virtio code or we need to change the API of this function. (But I think you will also get a reduced range if you try to use it across a boundary between normal host-memory-backed RAM and a device MemoryRegion.) thanks -- PMM ^ permalink raw reply [flat|nested] 19+ messages in thread
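For reference, a caller that honours this contract maps in a loop and accepts whatever length address_space_map() hands back. A rough sketch against the current API, where as/addr/len/is_write are placeholders rather than code taken from QEMU:

  while (len > 0) {
      hwaddr plen = len;
      void *p = address_space_map(as, addr, &plen, is_write,
                                  MEMTXATTRS_UNSPECIFIED);
      if (!p || plen == 0) {
          break;  /* nothing mapped, e.g. the bounce buffer is in use */
      }
      /* ... access plen bytes at p ... */
      address_space_unmap(as, p, plen, is_write, plen);
      addr += plen;
      len -= plen;
  }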
* Re: [PATCH, v2] physmem: avoid bounce buffer too small 2024-02-28 18:39 ` Peter Maydell @ 2024-02-28 19:07 ` Heinrich Schuchardt 2024-02-29 1:11 ` Peter Xu 2024-02-29 9:38 ` Peter Maydell 0 siblings, 2 replies; 19+ messages in thread From: Heinrich Schuchardt @ 2024-02-28 19:07 UTC (permalink / raw) To: Peter Maydell Cc: Philippe Mathieu-Daudé, qemu-devel, Richard Henderson, David Hildenbrand, Peter Xu, Paolo Bonzini On 28.02.24 19:39, Peter Maydell wrote: > On Wed, 28 Feb 2024 at 18:28, Heinrich Schuchardt > <heinrich.schuchardt@canonical.com> wrote: >> >> On 28.02.24 16:06, Philippe Mathieu-Daudé wrote: >>> Hi Heinrich, >>> >>> On 28/2/24 13:59, Heinrich Schuchardt wrote: >>>> virtqueue_map_desc() is called with values of sz exceeding that may >>>> exceed >>>> TARGET_PAGE_SIZE. sz = 0x2800 has been observed. >>>> >>>> We only support a single bounce buffer. We have to avoid >>>> virtqueue_map_desc() calling address_space_map() multiple times. >>>> Otherwise >>>> we see an error >>>> >>>> qemu: virtio: bogus descriptor or out of resources >>>> >>>> Increase the minimum size of the bounce buffer to 0x10000 which matches >>>> the largest value of TARGET_PAGE_SIZE for all architectures. >>>> >>>> Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> >>>> --- >>>> v2: >>>> remove unrelated change >>>> --- >>>> system/physmem.c | 8 ++++++-- >>>> 1 file changed, 6 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/system/physmem.c b/system/physmem.c >>>> index e3ebc19eef..3c82da1c86 100644 >>>> --- a/system/physmem.c >>>> +++ b/system/physmem.c >>>> @@ -3151,8 +3151,12 @@ void *address_space_map(AddressSpace *as, >>>> *plen = 0; >>>> return NULL; >>>> } >>>> - /* Avoid unbounded allocations */ >>>> - l = MIN(l, TARGET_PAGE_SIZE); >>>> + /* >>>> + * There is only one bounce buffer. The largest occuring >>>> value of >>>> + * parameter sz of virtqueue_map_desc() must fit into the bounce >>>> + * buffer. >>>> + */ >>>> + l = MIN(l, 0x10000); >>> >>> Please define this magic value. Maybe ANY_TARGET_PAGE_SIZE or >>> TARGETS_BIGGEST_PAGE_SIZE? >>> >>> Then along: >>> QEMU_BUILD_BUG_ON(TARGET_PAGE_SIZE <= TARGETS_BIGGEST_PAGE_SIZE); >> >> Thank you Philippe for reviewing. >> >> TARGETS_BIGGEST_PAGE_SIZE does not fit as the value is not driven by the >> page size. >> How about MIN_BOUNCE_BUFFER_SIZE? >> Is include/exec/memory.h the right include for the constant? >> >> I don't think that TARGET_PAGE_SIZE has any relevance for setting the >> bounce buffer size. I only mentioned it to say that we are not >> decreasing the value on any existing architecture. >> >> I don't know why TARGET_PAGE_SIZE ever got into this piece of code. >> e3127ae0cdcd ("exec: reorganize address_space_map") does not provide a >> reason for this choice. Maybe Paolo remembers. > > The limitation to a page dates back to commit 6d16c2f88f2a in 2009, > which was the first implementation of this function. I don't think > there's a particular reason for that value beyond that it was > probably a convenient value that was assumed to be likely "big enough". > > I think the idea with this bounce-buffer has always been that this > isn't really a code path we expected to end up in very often -- > it's supposed to be for when devices are doing DMA, which they > will typically be doing to memory (backed by host RAM), not > devices (backed by MMIO and needing a bounce buffer). So the > whole mechanism is a bit "last fallback to stop things breaking > entirely". 
> > The address_space_map() API says that it's allowed to return > a subset of the range you ask for, so if the virtio code doesn't > cope with the minimum being set to TARGET_PAGE_SIZE then either > we need to fix that virtio code or we need to change the API > of this function. (But I think you will also get a reduced > range if you try to use it across a boundary between normal > host-memory-backed RAM and a device MemoryRegion.) If we allow a bounce buffer only to be used once (via the in_use flag), why do we allow only a single bounce buffer? Could address_space_map() allocate a new bounce buffer on every call and address_space_unmap() deallocate it? Isn't the design with a single bounce buffer bound to fail with a multi-threaded client as collision can be expected? Best regards Heinrich ^ permalink raw reply [flat|nested] 19+ messages in thread
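As an illustration of the question only (a sketch of the idea, not QEMU code; the BounceBuffer struct and its fields are invented for the example), per-call allocation might look roughly like the following; the series referenced in the next message goes in a comparable direction:

  typedef struct BounceBuffer {
      hwaddr addr;
      hwaddr len;
      uint8_t buffer[];          /* payload follows the header */
  } BounceBuffer;

  /* in address_space_map(), when the region is not directly accessible */
  BounceBuffer *b = g_malloc(sizeof(*b) + l);
  b->addr = addr;
  b->len = l;
  if (!is_write) {
      flatview_read(fv, addr, MEMTXATTRS_UNSPECIFIED, b->buffer, l);
  }
  return b->buffer;

  /*
   * address_space_unmap() would then write back the data for is_write
   * mappings and g_free() the enclosing BounceBuffer, instead of
   * clearing a global in_use flag.
   */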
* Re: [PATCH, v2] physmem: avoid bounce buffer too small 2024-02-28 19:07 ` Heinrich Schuchardt @ 2024-02-29 1:11 ` Peter Xu 2024-02-29 10:22 ` Heinrich Schuchardt 2024-02-29 9:38 ` Peter Maydell 1 sibling, 1 reply; 19+ messages in thread From: Peter Xu @ 2024-02-29 1:11 UTC (permalink / raw) To: Heinrich Schuchardt Cc: Peter Maydell, Philippe Mathieu-Daudé, qemu-devel, Richard Henderson, David Hildenbrand, Paolo Bonzini, Mattias Nissler On Wed, Feb 28, 2024 at 08:07:47PM +0100, Heinrich Schuchardt wrote: > On 28.02.24 19:39, Peter Maydell wrote: > > On Wed, 28 Feb 2024 at 18:28, Heinrich Schuchardt > > <heinrich.schuchardt@canonical.com> wrote: > > > > > > On 28.02.24 16:06, Philippe Mathieu-Daudé wrote: > > > > Hi Heinrich, > > > > > > > > On 28/2/24 13:59, Heinrich Schuchardt wrote: > > > > > virtqueue_map_desc() is called with values of sz exceeding that may > > > > > exceed > > > > > TARGET_PAGE_SIZE. sz = 0x2800 has been observed. Pure (and can also be stupid) question: why virtqueue_map_desc() would map to !direct mem? Shouldn't those buffers normally allocated from guest RAM? > > > > > > > > > > We only support a single bounce buffer. We have to avoid > > > > > virtqueue_map_desc() calling address_space_map() multiple times. > > > > > Otherwise > > > > > we see an error > > > > > > > > > > qemu: virtio: bogus descriptor or out of resources > > > > > > > > > > Increase the minimum size of the bounce buffer to 0x10000 which matches > > > > > the largest value of TARGET_PAGE_SIZE for all architectures. > > > > > > > > > > Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> > > > > > --- > > > > > v2: > > > > > remove unrelated change > > > > > --- > > > > > system/physmem.c | 8 ++++++-- > > > > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/system/physmem.c b/system/physmem.c > > > > > index e3ebc19eef..3c82da1c86 100644 > > > > > --- a/system/physmem.c > > > > > +++ b/system/physmem.c > > > > > @@ -3151,8 +3151,12 @@ void *address_space_map(AddressSpace *as, > > > > > *plen = 0; > > > > > return NULL; > > > > > } > > > > > - /* Avoid unbounded allocations */ > > > > > - l = MIN(l, TARGET_PAGE_SIZE); > > > > > + /* > > > > > + * There is only one bounce buffer. The largest occuring > > > > > value of > > > > > + * parameter sz of virtqueue_map_desc() must fit into the bounce > > > > > + * buffer. > > > > > + */ > > > > > + l = MIN(l, 0x10000); > > > > > > > > Please define this magic value. Maybe ANY_TARGET_PAGE_SIZE or > > > > TARGETS_BIGGEST_PAGE_SIZE? > > > > > > > > Then along: > > > > QEMU_BUILD_BUG_ON(TARGET_PAGE_SIZE <= TARGETS_BIGGEST_PAGE_SIZE); > > > > > > Thank you Philippe for reviewing. > > > > > > TARGETS_BIGGEST_PAGE_SIZE does not fit as the value is not driven by the > > > page size. > > > How about MIN_BOUNCE_BUFFER_SIZE? > > > Is include/exec/memory.h the right include for the constant? > > > > > > I don't think that TARGET_PAGE_SIZE has any relevance for setting the > > > bounce buffer size. I only mentioned it to say that we are not > > > decreasing the value on any existing architecture. > > > > > > I don't know why TARGET_PAGE_SIZE ever got into this piece of code. > > > e3127ae0cdcd ("exec: reorganize address_space_map") does not provide a > > > reason for this choice. Maybe Paolo remembers. > > > > The limitation to a page dates back to commit 6d16c2f88f2a in 2009, > > which was the first implementation of this function. 
I don't think > > there's a particular reason for that value beyond that it was > > probably a convenient value that was assumed to be likely "big enough". > > > > I think the idea with this bounce-buffer has always been that this > > isn't really a code path we expected to end up in very often -- > > it's supposed to be for when devices are doing DMA, which they > > will typically be doing to memory (backed by host RAM), not > > devices (backed by MMIO and needing a bounce buffer). So the > > whole mechanism is a bit "last fallback to stop things breaking > > entirely". > > > > The address_space_map() API says that it's allowed to return > > a subset of the range you ask for, so if the virtio code doesn't > > cope with the minimum being set to TARGET_PAGE_SIZE then either > > we need to fix that virtio code or we need to change the API > > of this function. (But I think you will also get a reduced > > range if you try to use it across a boundary between normal > > host-memory-backed RAM and a device MemoryRegion.) > > If we allow a bounce buffer only to be used once (via the in_use flag), why > do we allow only a single bounce buffer? > > Could address_space_map() allocate a new bounce buffer on every call and > address_space_unmap() deallocate it? > > Isn't the design with a single bounce buffer bound to fail with a > multi-threaded client as collision can be expected? See: https://lore.kernel.org/r/20240212080617.2559498-1-mnissler@rivosinc.com For some reason that series didn't land, but it seems to be helpful in this case too if e.g. there can be multiple of such devices. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH, v2] physmem: avoid bounce buffer too small 2024-02-29 1:11 ` Peter Xu @ 2024-02-29 10:22 ` Heinrich Schuchardt 2024-02-29 10:36 ` Mattias Nissler 2024-02-29 10:46 ` Jonathan Cameron via 0 siblings, 2 replies; 19+ messages in thread From: Heinrich Schuchardt @ 2024-02-29 10:22 UTC (permalink / raw) To: Peter Xu Cc: Peter Maydell, Philippe Mathieu-Daudé, qemu-devel, Richard Henderson, David Hildenbrand, Paolo Bonzini, Mattias Nissler On 29.02.24 02:11, Peter Xu wrote: > On Wed, Feb 28, 2024 at 08:07:47PM +0100, Heinrich Schuchardt wrote: >> On 28.02.24 19:39, Peter Maydell wrote: >>> On Wed, 28 Feb 2024 at 18:28, Heinrich Schuchardt >>> <heinrich.schuchardt@canonical.com> wrote: >>>> >>>> On 28.02.24 16:06, Philippe Mathieu-Daudé wrote: >>>>> Hi Heinrich, >>>>> >>>>> On 28/2/24 13:59, Heinrich Schuchardt wrote: >>>>>> virtqueue_map_desc() is called with values of sz exceeding that may >>>>>> exceed >>>>>> TARGET_PAGE_SIZE. sz = 0x2800 has been observed. > > Pure (and can also be stupid) question: why virtqueue_map_desc() would map > to !direct mem? Shouldn't those buffers normally allocated from guest RAM? > >>>>>> >>>>>> We only support a single bounce buffer. We have to avoid >>>>>> virtqueue_map_desc() calling address_space_map() multiple times. >>>>>> Otherwise >>>>>> we see an error >>>>>> >>>>>> qemu: virtio: bogus descriptor or out of resources >>>>>> >>>>>> Increase the minimum size of the bounce buffer to 0x10000 which matches >>>>>> the largest value of TARGET_PAGE_SIZE for all architectures. >>>>>> >>>>>> Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> >>>>>> --- >>>>>> v2: >>>>>> remove unrelated change >>>>>> --- >>>>>> system/physmem.c | 8 ++++++-- >>>>>> 1 file changed, 6 insertions(+), 2 deletions(-) >>>>>> >>>>>> diff --git a/system/physmem.c b/system/physmem.c >>>>>> index e3ebc19eef..3c82da1c86 100644 >>>>>> --- a/system/physmem.c >>>>>> +++ b/system/physmem.c >>>>>> @@ -3151,8 +3151,12 @@ void *address_space_map(AddressSpace *as, >>>>>> *plen = 0; >>>>>> return NULL; >>>>>> } >>>>>> - /* Avoid unbounded allocations */ >>>>>> - l = MIN(l, TARGET_PAGE_SIZE); >>>>>> + /* >>>>>> + * There is only one bounce buffer. The largest occuring >>>>>> value of >>>>>> + * parameter sz of virtqueue_map_desc() must fit into the bounce >>>>>> + * buffer. >>>>>> + */ >>>>>> + l = MIN(l, 0x10000); >>>>> >>>>> Please define this magic value. Maybe ANY_TARGET_PAGE_SIZE or >>>>> TARGETS_BIGGEST_PAGE_SIZE? >>>>> >>>>> Then along: >>>>> QEMU_BUILD_BUG_ON(TARGET_PAGE_SIZE <= TARGETS_BIGGEST_PAGE_SIZE); >>>> >>>> Thank you Philippe for reviewing. >>>> >>>> TARGETS_BIGGEST_PAGE_SIZE does not fit as the value is not driven by the >>>> page size. >>>> How about MIN_BOUNCE_BUFFER_SIZE? >>>> Is include/exec/memory.h the right include for the constant? >>>> >>>> I don't think that TARGET_PAGE_SIZE has any relevance for setting the >>>> bounce buffer size. I only mentioned it to say that we are not >>>> decreasing the value on any existing architecture. >>>> >>>> I don't know why TARGET_PAGE_SIZE ever got into this piece of code. >>>> e3127ae0cdcd ("exec: reorganize address_space_map") does not provide a >>>> reason for this choice. Maybe Paolo remembers. >>> >>> The limitation to a page dates back to commit 6d16c2f88f2a in 2009, >>> which was the first implementation of this function. I don't think >>> there's a particular reason for that value beyond that it was >>> probably a convenient value that was assumed to be likely "big enough". 
>>> >>> I think the idea with this bounce-buffer has always been that this >>> isn't really a code path we expected to end up in very often -- >>> it's supposed to be for when devices are doing DMA, which they >>> will typically be doing to memory (backed by host RAM), not >>> devices (backed by MMIO and needing a bounce buffer). So the >>> whole mechanism is a bit "last fallback to stop things breaking >>> entirely". >>> >>> The address_space_map() API says that it's allowed to return >>> a subset of the range you ask for, so if the virtio code doesn't >>> cope with the minimum being set to TARGET_PAGE_SIZE then either >>> we need to fix that virtio code or we need to change the API >>> of this function. (But I think you will also get a reduced >>> range if you try to use it across a boundary between normal >>> host-memory-backed RAM and a device MemoryRegion.) >> >> If we allow a bounce buffer only to be used once (via the in_use flag), why >> do we allow only a single bounce buffer? >> >> Could address_space_map() allocate a new bounce buffer on every call and >> address_space_unmap() deallocate it? >> >> Isn't the design with a single bounce buffer bound to fail with a >> multi-threaded client as collision can be expected? > > See: > > https://lore.kernel.org/r/20240212080617.2559498-1-mnissler@rivosinc.com > > For some reason that series didn't land, but it seems to be helpful in this > case too if e.g. there can be multiple of such devices. > > Thanks, > Hello Peter Xu, thanks for pointing to your series. What I like about it is that it removes the limit of a single bounce buffer per AddressSpace. Unfortunately it does not solve my problem. You limit the sum of all of the allocations for a single AddressSpcace to DEFAULT_MAX_BOUNCE_BUFFER_SIZE = 4096 which is too small for my use case. Why do we need a limit? Why is it so tiny? Best regards Heinrich ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH, v2] physmem: avoid bounce buffer too small 2024-02-29 10:22 ` Heinrich Schuchardt @ 2024-02-29 10:36 ` Mattias Nissler 2024-02-29 10:46 ` Jonathan Cameron via 1 sibling, 0 replies; 19+ messages in thread From: Mattias Nissler @ 2024-02-29 10:36 UTC (permalink / raw) To: Heinrich Schuchardt Cc: Peter Xu, Peter Maydell, Philippe Mathieu-Daudé, qemu-devel, Richard Henderson, David Hildenbrand, Paolo Bonzini On Thu, Feb 29, 2024 at 11:22 AM Heinrich Schuchardt <heinrich.schuchardt@canonical.com> wrote: > > On 29.02.24 02:11, Peter Xu wrote: > > On Wed, Feb 28, 2024 at 08:07:47PM +0100, Heinrich Schuchardt wrote: > >> On 28.02.24 19:39, Peter Maydell wrote: > >>> On Wed, 28 Feb 2024 at 18:28, Heinrich Schuchardt > >>> <heinrich.schuchardt@canonical.com> wrote: > >>>> > >>>> On 28.02.24 16:06, Philippe Mathieu-Daudé wrote: > >>>>> Hi Heinrich, > >>>>> > >>>>> On 28/2/24 13:59, Heinrich Schuchardt wrote: > >>>>>> virtqueue_map_desc() is called with values of sz exceeding that may > >>>>>> exceed > >>>>>> TARGET_PAGE_SIZE. sz = 0x2800 has been observed. > > > > Pure (and can also be stupid) question: why virtqueue_map_desc() would map > > to !direct mem? Shouldn't those buffers normally allocated from guest RAM? > > > >>>>>> > >>>>>> We only support a single bounce buffer. We have to avoid > >>>>>> virtqueue_map_desc() calling address_space_map() multiple times. > >>>>>> Otherwise > >>>>>> we see an error > >>>>>> > >>>>>> qemu: virtio: bogus descriptor or out of resources > >>>>>> > >>>>>> Increase the minimum size of the bounce buffer to 0x10000 which matches > >>>>>> the largest value of TARGET_PAGE_SIZE for all architectures. > >>>>>> > >>>>>> Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> > >>>>>> --- > >>>>>> v2: > >>>>>> remove unrelated change > >>>>>> --- > >>>>>> system/physmem.c | 8 ++++++-- > >>>>>> 1 file changed, 6 insertions(+), 2 deletions(-) > >>>>>> > >>>>>> diff --git a/system/physmem.c b/system/physmem.c > >>>>>> index e3ebc19eef..3c82da1c86 100644 > >>>>>> --- a/system/physmem.c > >>>>>> +++ b/system/physmem.c > >>>>>> @@ -3151,8 +3151,12 @@ void *address_space_map(AddressSpace *as, > >>>>>> *plen = 0; > >>>>>> return NULL; > >>>>>> } > >>>>>> - /* Avoid unbounded allocations */ > >>>>>> - l = MIN(l, TARGET_PAGE_SIZE); > >>>>>> + /* > >>>>>> + * There is only one bounce buffer. The largest occuring > >>>>>> value of > >>>>>> + * parameter sz of virtqueue_map_desc() must fit into the bounce > >>>>>> + * buffer. > >>>>>> + */ > >>>>>> + l = MIN(l, 0x10000); > >>>>> > >>>>> Please define this magic value. Maybe ANY_TARGET_PAGE_SIZE or > >>>>> TARGETS_BIGGEST_PAGE_SIZE? > >>>>> > >>>>> Then along: > >>>>> QEMU_BUILD_BUG_ON(TARGET_PAGE_SIZE <= TARGETS_BIGGEST_PAGE_SIZE); > >>>> > >>>> Thank you Philippe for reviewing. > >>>> > >>>> TARGETS_BIGGEST_PAGE_SIZE does not fit as the value is not driven by the > >>>> page size. > >>>> How about MIN_BOUNCE_BUFFER_SIZE? > >>>> Is include/exec/memory.h the right include for the constant? > >>>> > >>>> I don't think that TARGET_PAGE_SIZE has any relevance for setting the > >>>> bounce buffer size. I only mentioned it to say that we are not > >>>> decreasing the value on any existing architecture. > >>>> > >>>> I don't know why TARGET_PAGE_SIZE ever got into this piece of code. > >>>> e3127ae0cdcd ("exec: reorganize address_space_map") does not provide a > >>>> reason for this choice. Maybe Paolo remembers. 
> >>> > >>> The limitation to a page dates back to commit 6d16c2f88f2a in 2009, > >>> which was the first implementation of this function. I don't think > >>> there's a particular reason for that value beyond that it was > >>> probably a convenient value that was assumed to be likely "big enough". > >>> > >>> I think the idea with this bounce-buffer has always been that this > >>> isn't really a code path we expected to end up in very often -- > >>> it's supposed to be for when devices are doing DMA, which they > >>> will typically be doing to memory (backed by host RAM), not > >>> devices (backed by MMIO and needing a bounce buffer). So the > >>> whole mechanism is a bit "last fallback to stop things breaking > >>> entirely". > >>> > >>> The address_space_map() API says that it's allowed to return > >>> a subset of the range you ask for, so if the virtio code doesn't > >>> cope with the minimum being set to TARGET_PAGE_SIZE then either > >>> we need to fix that virtio code or we need to change the API > >>> of this function. (But I think you will also get a reduced > >>> range if you try to use it across a boundary between normal > >>> host-memory-backed RAM and a device MemoryRegion.) > >> > >> If we allow a bounce buffer only to be used once (via the in_use flag), why > >> do we allow only a single bounce buffer? > >> > >> Could address_space_map() allocate a new bounce buffer on every call and > >> address_space_unmap() deallocate it? > >> > >> Isn't the design with a single bounce buffer bound to fail with a > >> multi-threaded client as collision can be expected? > > > > See: > > > > https://lore.kernel.org/r/20240212080617.2559498-1-mnissler@rivosinc.com > > > > For some reason that series didn't land, but it seems to be helpful in this > > case too if e.g. there can be multiple of such devices. > > > > Thanks, > > > > Hello Peter Xu, > > thanks for pointing to your series. What I like about it is that it > removes the limit of a single bounce buffer per AddressSpace. > > Unfortunately it does not solve my problem. You limit the sum of all of > the allocations for a single AddressSpcace to > DEFAULT_MAX_BOUNCE_BUFFER_SIZE = 4096 which is too small for my use case. Note that the limit is configured for address spaces attached to PCI devices with a parameter. > > Why do we need a limit? The rationale is to prevent a guest from allocating unlimited amounts of host memory. > Why is it so tiny? Nobody has come up with a good way to determine a "sufficient" amount that works with all use cases, while at the same time addressing the concern due to malicious guest behavior mentioned above. (Note that I'm merely reciting previous conversations as the author of the series Peter pointed you at) > > Best regards > > Heinrich > > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH, v2] physmem: avoid bounce buffer too small 2024-02-29 10:22 ` Heinrich Schuchardt 2024-02-29 10:36 ` Mattias Nissler @ 2024-02-29 10:46 ` Jonathan Cameron via 1 sibling, 0 replies; 19+ messages in thread From: Jonathan Cameron via @ 2024-02-29 10:46 UTC (permalink / raw) To: Heinrich Schuchardt Cc: Peter Xu, Peter Maydell, Philippe Mathieu-Daudé, qemu-devel, Richard Henderson, David Hildenbrand, Paolo Bonzini, Mattias Nissler On Thu, 29 Feb 2024 11:22:24 +0100 Heinrich Schuchardt <heinrich.schuchardt@canonical.com> wrote: > On 29.02.24 02:11, Peter Xu wrote: > > On Wed, Feb 28, 2024 at 08:07:47PM +0100, Heinrich Schuchardt wrote: > >> On 28.02.24 19:39, Peter Maydell wrote: > >>> On Wed, 28 Feb 2024 at 18:28, Heinrich Schuchardt > >>> <heinrich.schuchardt@canonical.com> wrote: > >>>> > >>>> On 28.02.24 16:06, Philippe Mathieu-Daudé wrote: > >>>>> Hi Heinrich, > >>>>> > >>>>> On 28/2/24 13:59, Heinrich Schuchardt wrote: > >>>>>> virtqueue_map_desc() is called with values of sz exceeding that may > >>>>>> exceed > >>>>>> TARGET_PAGE_SIZE. sz = 0x2800 has been observed. > > > > Pure (and can also be stupid) question: why virtqueue_map_desc() would map > > to !direct mem? Shouldn't those buffers normally allocated from guest RAM? > > > >>>>>> > >>>>>> We only support a single bounce buffer. We have to avoid > >>>>>> virtqueue_map_desc() calling address_space_map() multiple times. > >>>>>> Otherwise > >>>>>> we see an error > >>>>>> > >>>>>> qemu: virtio: bogus descriptor or out of resources > >>>>>> > >>>>>> Increase the minimum size of the bounce buffer to 0x10000 which matches > >>>>>> the largest value of TARGET_PAGE_SIZE for all architectures. > >>>>>> > >>>>>> Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> > >>>>>> --- > >>>>>> v2: > >>>>>> remove unrelated change > >>>>>> --- > >>>>>> system/physmem.c | 8 ++++++-- > >>>>>> 1 file changed, 6 insertions(+), 2 deletions(-) > >>>>>> > >>>>>> diff --git a/system/physmem.c b/system/physmem.c > >>>>>> index e3ebc19eef..3c82da1c86 100644 > >>>>>> --- a/system/physmem.c > >>>>>> +++ b/system/physmem.c > >>>>>> @@ -3151,8 +3151,12 @@ void *address_space_map(AddressSpace *as, > >>>>>> *plen = 0; > >>>>>> return NULL; > >>>>>> } > >>>>>> - /* Avoid unbounded allocations */ > >>>>>> - l = MIN(l, TARGET_PAGE_SIZE); > >>>>>> + /* > >>>>>> + * There is only one bounce buffer. The largest occuring > >>>>>> value of > >>>>>> + * parameter sz of virtqueue_map_desc() must fit into the bounce > >>>>>> + * buffer. > >>>>>> + */ > >>>>>> + l = MIN(l, 0x10000); > >>>>> > >>>>> Please define this magic value. Maybe ANY_TARGET_PAGE_SIZE or > >>>>> TARGETS_BIGGEST_PAGE_SIZE? > >>>>> > >>>>> Then along: > >>>>> QEMU_BUILD_BUG_ON(TARGET_PAGE_SIZE <= TARGETS_BIGGEST_PAGE_SIZE); > >>>> > >>>> Thank you Philippe for reviewing. > >>>> > >>>> TARGETS_BIGGEST_PAGE_SIZE does not fit as the value is not driven by the > >>>> page size. > >>>> How about MIN_BOUNCE_BUFFER_SIZE? > >>>> Is include/exec/memory.h the right include for the constant? > >>>> > >>>> I don't think that TARGET_PAGE_SIZE has any relevance for setting the > >>>> bounce buffer size. I only mentioned it to say that we are not > >>>> decreasing the value on any existing architecture. > >>>> > >>>> I don't know why TARGET_PAGE_SIZE ever got into this piece of code. > >>>> e3127ae0cdcd ("exec: reorganize address_space_map") does not provide a > >>>> reason for this choice. Maybe Paolo remembers. 
> >>> > >>> The limitation to a page dates back to commit 6d16c2f88f2a in 2009, > >>> which was the first implementation of this function. I don't think > >>> there's a particular reason for that value beyond that it was > >>> probably a convenient value that was assumed to be likely "big enough". > >>> > >>> I think the idea with this bounce-buffer has always been that this > >>> isn't really a code path we expected to end up in very often -- > >>> it's supposed to be for when devices are doing DMA, which they > >>> will typically be doing to memory (backed by host RAM), not > >>> devices (backed by MMIO and needing a bounce buffer). So the > >>> whole mechanism is a bit "last fallback to stop things breaking > >>> entirely". > >>> > >>> The address_space_map() API says that it's allowed to return > >>> a subset of the range you ask for, so if the virtio code doesn't > >>> cope with the minimum being set to TARGET_PAGE_SIZE then either > >>> we need to fix that virtio code or we need to change the API > >>> of this function. (But I think you will also get a reduced > >>> range if you try to use it across a boundary between normal > >>> host-memory-backed RAM and a device MemoryRegion.) > >> > >> If we allow a bounce buffer only to be used once (via the in_use flag), why > >> do we allow only a single bounce buffer? > >> > >> Could address_space_map() allocate a new bounce buffer on every call and > >> address_space_unmap() deallocate it? > >> > >> Isn't the design with a single bounce buffer bound to fail with a > >> multi-threaded client as collision can be expected? > > > > See: > > > > https://lore.kernel.org/r/20240212080617.2559498-1-mnissler@rivosinc.com > > > > For some reason that series didn't land, but it seems to be helpful in this > > case too if e.g. there can be multiple of such devices. > > > > Thanks, > > > > Hello Peter Xu, > > thanks for pointing to your series. What I like about it is that it > removes the limit of a single bounce buffer per AddressSpace. > > Unfortunately it does not solve my problem. You limit the sum of all of > the allocations for a single AddressSpcace to > DEFAULT_MAX_BOUNCE_BUFFER_SIZE = 4096 which is too small for my use case. Agreed. For CXL memory (which indeed hits this with virtio-blk-pci for example) I'm carrying a far from subtle patch [1] on top of the series linked above (which I would very much like to land asap!) Note I also need: https://lore.kernel.org/qemu-devel/20240215142817.1904-1-Jonathan.Cameron@huawei.com/#t [PATCH 0/3] physmem: Fix MemoryRegion for second access to cached MMIO Address Space otherwise the second 8 byte read comes from the wrong address space. That used to accidentally trigger very large reads hence the rather excessive size set below. I haven't checked yet if a smaller size is fine as obviously something as hacky as the below isn't going anywhere! Perhaps we need a rather more global version of x-max-bounce-buffers from Mattias' series? [1] diff --git a/system/physmem.c b/system/physmem.c index 0a838f1b33..10e9d8da86 100644 --- a/system/physmem.c +++ b/system/physmem.c @@ -2554,6 +2554,7 @@ static void memory_map_init(void) memory_region_init(system_memory, NULL, "system", UINT64_MAX); address_space_init(&address_space_memory, system_memory, "memory"); + address_space_memory.max_bounce_buffer_size = 1024 * 1024 * 1024; system_io = g_malloc(sizeof(*system_io)); memory_region_init_io(system_io, NULL, &unassigned_io_ops, NULL, "io", 65536); > > Why do we need a limit? > Why is it so tiny? 
> > Best regards > > Heinrich > > > > ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH, v2] physmem: avoid bounce buffer too small 2024-02-28 19:07 ` Heinrich Schuchardt 2024-02-29 1:11 ` Peter Xu @ 2024-02-29 9:38 ` Peter Maydell 2024-02-29 10:59 ` Jonathan Cameron via 1 sibling, 1 reply; 19+ messages in thread From: Peter Maydell @ 2024-02-29 9:38 UTC (permalink / raw) To: Heinrich Schuchardt Cc: Philippe Mathieu-Daudé, qemu-devel, Richard Henderson, David Hildenbrand, Peter Xu, Paolo Bonzini On Wed, 28 Feb 2024 at 19:07, Heinrich Schuchardt <heinrich.schuchardt@canonical.com> wrote: > > On 28.02.24 19:39, Peter Maydell wrote: > > The limitation to a page dates back to commit 6d16c2f88f2a in 2009, > > which was the first implementation of this function. I don't think > > there's a particular reason for that value beyond that it was > > probably a convenient value that was assumed to be likely "big enough". > > > > I think the idea with this bounce-buffer has always been that this > > isn't really a code path we expected to end up in very often -- > > it's supposed to be for when devices are doing DMA, which they > > will typically be doing to memory (backed by host RAM), not > > devices (backed by MMIO and needing a bounce buffer). So the > > whole mechanism is a bit "last fallback to stop things breaking > > entirely". > > > > The address_space_map() API says that it's allowed to return > > a subset of the range you ask for, so if the virtio code doesn't > > cope with the minimum being set to TARGET_PAGE_SIZE then either > > we need to fix that virtio code or we need to change the API > > of this function. (But I think you will also get a reduced > > range if you try to use it across a boundary between normal > > host-memory-backed RAM and a device MemoryRegion.) > > If we allow a bounce buffer only to be used once (via the in_use flag), > why do we allow only a single bounce buffer? > > Could address_space_map() allocate a new bounce buffer on every call and > address_space_unmap() deallocate it? > > Isn't the design with a single bounce buffer bound to fail with a > multi-threaded client as collision can be expected? Yeah, I don't suppose multi-threaded was particularly expected. Again, this is really a "handle the case where the guest does something silly" setup, which is why only one bounce buffer. Why is your guest ending up in the bounce-buffer path? -- PMM ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH, v2] physmem: avoid bounce buffer too small 2024-02-29 9:38 ` Peter Maydell @ 2024-02-29 10:59 ` Jonathan Cameron via 2024-02-29 11:11 ` Peter Maydell 0 siblings, 1 reply; 19+ messages in thread From: Jonathan Cameron via @ 2024-02-29 10:59 UTC (permalink / raw) To: Peter Maydell Cc: Heinrich Schuchardt, Philippe Mathieu-Daudé, qemu-devel, Richard Henderson, David Hildenbrand, Peter Xu, Paolo Bonzini, Mattias Nissler On Thu, 29 Feb 2024 09:38:29 +0000 Peter Maydell <peter.maydell@linaro.org> wrote: > On Wed, 28 Feb 2024 at 19:07, Heinrich Schuchardt > <heinrich.schuchardt@canonical.com> wrote: > > > > On 28.02.24 19:39, Peter Maydell wrote: > > > The limitation to a page dates back to commit 6d16c2f88f2a in 2009, > > > which was the first implementation of this function. I don't think > > > there's a particular reason for that value beyond that it was > > > probably a convenient value that was assumed to be likely "big enough". > > > > > > I think the idea with this bounce-buffer has always been that this > > > isn't really a code path we expected to end up in very often -- > > > it's supposed to be for when devices are doing DMA, which they > > > will typically be doing to memory (backed by host RAM), not > > > devices (backed by MMIO and needing a bounce buffer). So the > > > whole mechanism is a bit "last fallback to stop things breaking > > > entirely". > > > > > > The address_space_map() API says that it's allowed to return > > > a subset of the range you ask for, so if the virtio code doesn't > > > cope with the minimum being set to TARGET_PAGE_SIZE then either > > > we need to fix that virtio code or we need to change the API > > > of this function. (But I think you will also get a reduced > > > range if you try to use it across a boundary between normal > > > host-memory-backed RAM and a device MemoryRegion.) > > > > If we allow a bounce buffer only to be used once (via the in_use flag), > > why do we allow only a single bounce buffer? > > > > Could address_space_map() allocate a new bounce buffer on every call and > > address_space_unmap() deallocate it? > > > > Isn't the design with a single bounce buffer bound to fail with a > > multi-threaded client as collision can be expected? > > Yeah, I don't suppose multi-threaded was particularly expected. > Again, this is really a "handle the case where the guest does > something silly" setup, which is why only one bounce buffer. > > Why is your guest ending up in the bounce-buffer path? Happens for me with emulated CXL memory. I think the case I saw was split descriptors in virtio via address space caches https://elixir.bootlin.com/qemu/latest/source/hw/virtio/virtio.c#L4043 One bounce buffer is in use for the outer loop and another for the descriptors it is pointing to. Matthias' series makes this work fine. I need to circle back and check how big a cache this needs. I'm carrying a silly size because of the side effect of the address space bug here https://lore.kernel.org/qemu-devel/20240215142817.1904-1-Jonathan.Cameron@huawei.com/#t and can probably set it to much less than my currently 1GiB. Jonathan > > -- PMM > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH, v2] physmem: avoid bounce buffer too small 2024-02-29 10:59 ` Jonathan Cameron via @ 2024-02-29 11:11 ` Peter Maydell 2024-02-29 11:17 ` Heinrich Schuchardt 2024-02-29 11:18 ` Mattias Nissler 0 siblings, 2 replies; 19+ messages in thread From: Peter Maydell @ 2024-02-29 11:11 UTC (permalink / raw) To: Jonathan Cameron Cc: Heinrich Schuchardt, Philippe Mathieu-Daudé, qemu-devel, Richard Henderson, David Hildenbrand, Peter Xu, Paolo Bonzini, Mattias Nissler On Thu, 29 Feb 2024 at 10:59, Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: > > On Thu, 29 Feb 2024 09:38:29 +0000 > Peter Maydell <peter.maydell@linaro.org> wrote: > > > On Wed, 28 Feb 2024 at 19:07, Heinrich Schuchardt > > <heinrich.schuchardt@canonical.com> wrote: > > > > > > On 28.02.24 19:39, Peter Maydell wrote: > > > > The limitation to a page dates back to commit 6d16c2f88f2a in 2009, > > > > which was the first implementation of this function. I don't think > > > > there's a particular reason for that value beyond that it was > > > > probably a convenient value that was assumed to be likely "big enough". > > > > > > > > I think the idea with this bounce-buffer has always been that this > > > > isn't really a code path we expected to end up in very often -- > > > > it's supposed to be for when devices are doing DMA, which they > > > > will typically be doing to memory (backed by host RAM), not > > > > devices (backed by MMIO and needing a bounce buffer). So the > > > > whole mechanism is a bit "last fallback to stop things breaking > > > > entirely". > > > > > > > > The address_space_map() API says that it's allowed to return > > > > a subset of the range you ask for, so if the virtio code doesn't > > > > cope with the minimum being set to TARGET_PAGE_SIZE then either > > > > we need to fix that virtio code or we need to change the API > > > > of this function. (But I think you will also get a reduced > > > > range if you try to use it across a boundary between normal > > > > host-memory-backed RAM and a device MemoryRegion.) > > > > > > If we allow a bounce buffer only to be used once (via the in_use flag), > > > why do we allow only a single bounce buffer? > > > > > > Could address_space_map() allocate a new bounce buffer on every call and > > > address_space_unmap() deallocate it? > > > > > > Isn't the design with a single bounce buffer bound to fail with a > > > multi-threaded client as collision can be expected? > > > > Yeah, I don't suppose multi-threaded was particularly expected. > > Again, this is really a "handle the case where the guest does > > something silly" setup, which is why only one bounce buffer. > > > > Why is your guest ending up in the bounce-buffer path? > > Happens for me with emulated CXL memory. Can we put that in the "something silly" bucket? :-) But yes, I'm not surprised that CXL runs into this. Heinrich, are you doing CXL testing, or is this some other workload? > I think the case I saw > was split descriptors in virtio via address space caches > https://elixir.bootlin.com/qemu/latest/source/hw/virtio/virtio.c#L4043 > > One bounce buffer is in use for the outer loop and another for the descriptors > it is pointing to. Mmm. The other assumption made in the design of the address_space_map() API I think was that it was unlikely that a device would be trying to do two DMA operations simultaneously. This is clearly not true in practice. We definitely need to fix one end or other of this API. 
(I'm not sure why the bounce-buffer limit ought to be per-AddressSpace: is that just done in Matthias' series so that we can attach an x-thingy property to the individual PCI device?) -- PMM ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH, v2] physmem: avoid bounce buffer too small 2024-02-29 11:11 ` Peter Maydell @ 2024-02-29 11:17 ` Heinrich Schuchardt 2024-02-29 12:34 ` Peter Maydell 2024-02-29 11:18 ` Mattias Nissler 1 sibling, 1 reply; 19+ messages in thread From: Heinrich Schuchardt @ 2024-02-29 11:17 UTC (permalink / raw) To: Peter Maydell Cc: Philippe Mathieu-Daudé, qemu-devel, Richard Henderson, David Hildenbrand, Peter Xu, Paolo Bonzini, Mattias Nissler, Jonathan Cameron On 29.02.24 12:11, Peter Maydell wrote: > On Thu, 29 Feb 2024 at 10:59, Jonathan Cameron > <Jonathan.Cameron@huawei.com> wrote: >> >> On Thu, 29 Feb 2024 09:38:29 +0000 >> Peter Maydell <peter.maydell@linaro.org> wrote: >> >>> On Wed, 28 Feb 2024 at 19:07, Heinrich Schuchardt >>> <heinrich.schuchardt@canonical.com> wrote: >>>> >>>> On 28.02.24 19:39, Peter Maydell wrote: >>>>> The limitation to a page dates back to commit 6d16c2f88f2a in 2009, >>>>> which was the first implementation of this function. I don't think >>>>> there's a particular reason for that value beyond that it was >>>>> probably a convenient value that was assumed to be likely "big enough". >>>>> >>>>> I think the idea with this bounce-buffer has always been that this >>>>> isn't really a code path we expected to end up in very often -- >>>>> it's supposed to be for when devices are doing DMA, which they >>>>> will typically be doing to memory (backed by host RAM), not >>>>> devices (backed by MMIO and needing a bounce buffer). So the >>>>> whole mechanism is a bit "last fallback to stop things breaking >>>>> entirely". >>>>> >>>>> The address_space_map() API says that it's allowed to return >>>>> a subset of the range you ask for, so if the virtio code doesn't >>>>> cope with the minimum being set to TARGET_PAGE_SIZE then either >>>>> we need to fix that virtio code or we need to change the API >>>>> of this function. (But I think you will also get a reduced >>>>> range if you try to use it across a boundary between normal >>>>> host-memory-backed RAM and a device MemoryRegion.) >>>> >>>> If we allow a bounce buffer only to be used once (via the in_use flag), >>>> why do we allow only a single bounce buffer? >>>> >>>> Could address_space_map() allocate a new bounce buffer on every call and >>>> address_space_unmap() deallocate it? >>>> >>>> Isn't the design with a single bounce buffer bound to fail with a >>>> multi-threaded client as collision can be expected? >>> >>> Yeah, I don't suppose multi-threaded was particularly expected. >>> Again, this is really a "handle the case where the guest does >>> something silly" setup, which is why only one bounce buffer. >>> >>> Why is your guest ending up in the bounce-buffer path? >> >> Happens for me with emulated CXL memory. > > Can we put that in the "something silly" bucket? :-) > But yes, I'm not surprised that CXL runs into this. Heinrich, > are you doing CXL testing, or is this some other workload? I am running the UEFI Self-Certification Tests (SCT) on EDK 2 using: qemu-system-riscv64 \ -M virt,acpi=off -accel tcg -m 4096 \ -serial mon:stdio \ -device virtio-gpu-pci \ -device qemu-xhci \ -device usb-kbd \ -drive if=pflash,format=raw,unit=0,file=RISCV_VIRT_CODE.fd,readonly=on \ -drive if=pflash,format=raw,unit=1,file=RISCV_VIRT_VARS.fd \ -drive file=sct.img,format=raw,if=virtio \ -device virtio-net-device,netdev=net0 \ -netdev user,id=net0 This does not invoke any CXL related stuff. 
Best regards Heinrich > >> I think the case I saw >> was split descriptors in virtio via address space caches >> https://elixir.bootlin.com/qemu/latest/source/hw/virtio/virtio.c#L4043 >> >> One bounce buffer is in use for the outer loop and another for the descriptors >> it is pointing to. > > Mmm. The other assumption made in the design of the address_space_map() > API I think was that it was unlikely that a device would be trying > to do two DMA operations simultaneously. This is clearly not > true in practice. We definitely need to fix one end or other of > this API. > > (I'm not sure why the bounce-buffer limit ought to be per-AddressSpace: > is that just done in Matthias' series so that we can attach an > x-thingy property to the individual PCI device?) > > -- PMM ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH, v2] physmem: avoid bounce buffer too small 2024-02-29 11:17 ` Heinrich Schuchardt @ 2024-02-29 12:34 ` Peter Maydell 2024-02-29 12:52 ` Mattias Nissler 2024-02-29 14:17 ` Heinrich Schuchardt 0 siblings, 2 replies; 19+ messages in thread From: Peter Maydell @ 2024-02-29 12:34 UTC (permalink / raw) To: Heinrich Schuchardt Cc: Philippe Mathieu-Daudé, qemu-devel, Richard Henderson, David Hildenbrand, Peter Xu, Paolo Bonzini, Mattias Nissler, Jonathan Cameron On Thu, 29 Feb 2024 at 11:17, Heinrich Schuchardt <heinrich.schuchardt@canonical.com> wrote: > > But yes, I'm not surprised that CXL runs into this. Heinrich, > > are you doing CXL testing, or is this some other workload? > > I am running the UEFI Self-Certification Tests (SCT) on EDK 2 using: > > qemu-system-riscv64 \ > -M virt,acpi=off -accel tcg -m 4096 \ > -serial mon:stdio \ > -device virtio-gpu-pci \ > -device qemu-xhci \ > -device usb-kbd \ > -drive > if=pflash,format=raw,unit=0,file=RISCV_VIRT_CODE.fd,readonly=on \ > -drive if=pflash,format=raw,unit=1,file=RISCV_VIRT_VARS.fd \ > -drive file=sct.img,format=raw,if=virtio \ > -device virtio-net-device,netdev=net0 \ > -netdev user,id=net0 > > This does not invoke any CXL related stuff. Hmm, that doesn't seem like it ought to be running into this. What underlying memory region is the guest trying to do the virtio queue access to? -- PMM ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH, v2] physmem: avoid bounce buffer too small 2024-02-29 12:34 ` Peter Maydell @ 2024-02-29 12:52 ` Mattias Nissler 2024-02-29 13:19 ` Peter Maydell 2024-02-29 14:17 ` Heinrich Schuchardt 1 sibling, 1 reply; 19+ messages in thread From: Mattias Nissler @ 2024-02-29 12:52 UTC (permalink / raw) To: Peter Maydell Cc: Heinrich Schuchardt, Philippe Mathieu-Daudé, qemu-devel, Richard Henderson, David Hildenbrand, Peter Xu, Paolo Bonzini, Jonathan Cameron On Thu, Feb 29, 2024 at 1:35 PM Peter Maydell <peter.maydell@linaro.org> wrote: > > On Thu, 29 Feb 2024 at 11:17, Heinrich Schuchardt > <heinrich.schuchardt@canonical.com> wrote: > > > But yes, I'm not surprised that CXL runs into this. Heinrich, > > > are you doing CXL testing, or is this some other workload? > > > > I am running the UEFI Self-Certification Tests (SCT) on EDK 2 using: > > > > qemu-system-riscv64 \ > > -M virt,acpi=off -accel tcg -m 4096 \ > > -serial mon:stdio \ > > -device virtio-gpu-pci \ > > -device qemu-xhci \ > > -device usb-kbd \ > > -drive > > if=pflash,format=raw,unit=0,file=RISCV_VIRT_CODE.fd,readonly=on \ > > -drive if=pflash,format=raw,unit=1,file=RISCV_VIRT_VARS.fd \ > > -drive file=sct.img,format=raw,if=virtio \ > > -device virtio-net-device,netdev=net0 \ > > -netdev user,id=net0 > > > > This does not invoke any CXL related stuff. > > Hmm, that doesn't seem like it ought to be running into this. > What underlying memory region is the guest trying to do > the virtio queue access to? FWIW, I have seen multiple bounce buffer usage with the generic net TX path as well as the XHCI controller, so it might be either of these. Bounce buffering should only take place when the memory region can't be accessed directly though - I don't see why that's the case for the given command line. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH, v2] physmem: avoid bounce buffer too small 2024-02-29 12:52 ` Mattias Nissler @ 2024-02-29 13:19 ` Peter Maydell 0 siblings, 0 replies; 19+ messages in thread From: Peter Maydell @ 2024-02-29 13:19 UTC (permalink / raw) To: Mattias Nissler Cc: Heinrich Schuchardt, Philippe Mathieu-Daudé, qemu-devel, Richard Henderson, David Hildenbrand, Peter Xu, Paolo Bonzini, Jonathan Cameron On Thu, 29 Feb 2024 at 12:52, Mattias Nissler <mnissler@rivosinc.com> wrote: > > On Thu, Feb 29, 2024 at 1:35 PM Peter Maydell <peter.maydell@linaro.org> wrote: > > > > On Thu, 29 Feb 2024 at 11:17, Heinrich Schuchardt > > <heinrich.schuchardt@canonical.com> wrote: > > > > But yes, I'm not surprised that CXL runs into this. Heinrich, > > > > are you doing CXL testing, or is this some other workload? > > > > > > I am running the UEFI Self-Certification Tests (SCT) on EDK 2 using: > > > > > > qemu-system-riscv64 \ > > > -M virt,acpi=off -accel tcg -m 4096 \ > > > -serial mon:stdio \ > > > -device virtio-gpu-pci \ > > > -device qemu-xhci \ > > > -device usb-kbd \ > > > -drive > > > if=pflash,format=raw,unit=0,file=RISCV_VIRT_CODE.fd,readonly=on \ > > > -drive if=pflash,format=raw,unit=1,file=RISCV_VIRT_VARS.fd \ > > > -drive file=sct.img,format=raw,if=virtio \ > > > -device virtio-net-device,netdev=net0 \ > > > -netdev user,id=net0 > > > > > > This does not invoke any CXL related stuff. > > > > Hmm, that doesn't seem like it ought to be running into this. > > What underlying memory region is the guest trying to do > > the virtio queue access to? > > FWIW, I have seen multiple bounce buffer usage with the generic net TX > path as well as the XHCI controller, so it might be either of these. > Bounce buffering should only take place when the memory region can't > be accessed directly though - I don't see why that's the case for the > given command line. Yeah, exactly -- the only thing I can see there where a reasonable guest program ought to be trying to do DMA to/from non-host-RAM would be if (a) it's reading data from directly out of the pflash image and (b) the flash device has got itself into programming mode somehow. But that doesn't happen for the other architectures that have pflash for boot firmware as far as I know. The bounce buffering stuff does need to be fixed but it's also worth looking at why we're hitting the code path at all here, because even if it works the performance is terrible (doubly so if you try to execute code out of that MR). -- PMM ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH, v2] physmem: avoid bounce buffer too small
  2024-02-29 12:34             ` Peter Maydell
  2024-02-29 12:52               ` Mattias Nissler
@ 2024-02-29 14:17               ` Heinrich Schuchardt
  2024-02-29 14:52                 ` Peter Maydell
  1 sibling, 1 reply; 19+ messages in thread
From: Heinrich Schuchardt @ 2024-02-29 14:17 UTC (permalink / raw)
To: Peter Maydell
Cc: Philippe Mathieu-Daudé, qemu-devel, Richard Henderson,
    David Hildenbrand, Peter Xu, Paolo Bonzini, Mattias Nissler,
    Jonathan Cameron

On 29.02.24 13:34, Peter Maydell wrote:
> On Thu, 29 Feb 2024 at 11:17, Heinrich Schuchardt
> <heinrich.schuchardt@canonical.com> wrote:
>>> But yes, I'm not surprised that CXL runs into this. Heinrich,
>>> are you doing CXL testing, or is this some other workload?
>>
>> I am running the UEFI Self-Certification Tests (SCT) on EDK 2 using:
>>
>> qemu-system-riscv64 \
>>     -M virt,acpi=off -accel tcg -m 4096 \
>>     -serial mon:stdio \
>>     -device virtio-gpu-pci \
>>     -device qemu-xhci \
>>     -device usb-kbd \
>>     -drive
>> if=pflash,format=raw,unit=0,file=RISCV_VIRT_CODE.fd,readonly=on \
>>     -drive if=pflash,format=raw,unit=1,file=RISCV_VIRT_VARS.fd \
>>     -drive file=sct.img,format=raw,if=virtio \
>>     -device virtio-net-device,netdev=net0 \
>>     -netdev user,id=net0
>>
>> This does not invoke any CXL related stuff.
>
> Hmm, that doesn't seem like it ought to be running into this.
> What underlying memory region is the guest trying to do
> the virtio queue access to?
>
> -- PMM

The error occurs while the SCT is executing function
BBTestReadBlocksConformanceAutoTest
(https://github.com/tianocore/edk2-test/blob/cabb98d44be94e7547605435a0be7c4946d10f8b/uefi-sct/SctPkg/TestCase/UEFI/EFI/Protocol/BlockIo/BlackBoxTest/BlockIoBBTestConformance.c#L45)

This code is accessing the drive defined as
-drive file=sct.img,format=raw,if=virtio .

In the conformance test correct error handling for invalid parameters of
the UEFI block IO protocol is tested. This includes calling the UEFI API to

* read with incorrectly aligned buffers
* read with invalid LBA parameter
* read with buffer size not being a multiple of the sector size

In all these cases the UEFI API implemented by EDK II is expected to
return an error.

Best regards

Heinrich

^ permalink raw reply	[flat|nested] 19+ messages in thread
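To picture the calls the conformance test makes, here is an illustrative C
fragment against the standard EFI_BLOCK_IO_PROTOCOL interface; the concrete
buffer sizes and LBAs are made-up placeholders and are not taken from the SCT
source linked above:

    #include <Uefi.h>
    #include <Protocol/BlockIo.h>

    /* Illustrative negative-path calls; the expected status codes follow
     * the UEFI specification for ReadBlocks(). */
    EFI_STATUS
    ExerciseInvalidReads (EFI_BLOCK_IO_PROTOCOL *BlockIo)
    {
      EFI_BLOCK_IO_MEDIA *Media = BlockIo->Media;
      UINT8               Buffer[8192];   /* large enough for this illustration */
      EFI_STATUS          Status;

      /* Buffer size that is not a multiple of the sector size:
       * expected EFI_BAD_BUFFER_SIZE. */
      Status = BlockIo->ReadBlocks (BlockIo, Media->MediaId, 0,
                                    Media->BlockSize - 1, Buffer);

      /* LBA past the end of the medium: expected EFI_INVALID_PARAMETER. */
      Status = BlockIo->ReadBlocks (BlockIo, Media->MediaId,
                                    Media->LastBlock + 1,
                                    Media->BlockSize, Buffer);

      return Status;
    }

Each of these calls is expected to fail cleanly inside EDK II, which is part of
why the question of which memory region the guest is actually mapping remains
open in the next message.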
* Re: [PATCH, v2] physmem: avoid bounce buffer too small
  2024-02-29 14:17               ` Heinrich Schuchardt
@ 2024-02-29 14:52                 ` Peter Maydell
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Maydell @ 2024-02-29 14:52 UTC (permalink / raw)
To: Heinrich Schuchardt
Cc: Philippe Mathieu-Daudé, qemu-devel, Richard Henderson,
    David Hildenbrand, Peter Xu, Paolo Bonzini, Mattias Nissler,
    Jonathan Cameron

On Thu, 29 Feb 2024 at 14:17, Heinrich Schuchardt
<heinrich.schuchardt@canonical.com> wrote:
>
> On 29.02.24 13:34, Peter Maydell wrote:
> > On Thu, 29 Feb 2024 at 11:17, Heinrich Schuchardt
> > <heinrich.schuchardt@canonical.com> wrote:
> >>> But yes, I'm not surprised that CXL runs into this. Heinrich,
> >>> are you doing CXL testing, or is this some other workload?
> >>
> >> I am running the UEFI Self-Certification Tests (SCT) on EDK 2 using:
> >>
> >> qemu-system-riscv64 \
> >>     -M virt,acpi=off -accel tcg -m 4096 \
> >>     -serial mon:stdio \
> >>     -device virtio-gpu-pci \
> >>     -device qemu-xhci \
> >>     -device usb-kbd \
> >>     -drive
> >> if=pflash,format=raw,unit=0,file=RISCV_VIRT_CODE.fd,readonly=on \
> >>     -drive if=pflash,format=raw,unit=1,file=RISCV_VIRT_VARS.fd \
> >>     -drive file=sct.img,format=raw,if=virtio \
> >>     -device virtio-net-device,netdev=net0 \
> >>     -netdev user,id=net0
> >>
> >> This does not invoke any CXL related stuff.
> >
> > Hmm, that doesn't seem like it ought to be running into this.
> > What underlying memory region is the guest trying to do
> > the virtio queue access to?
>
> The error occurs while the SCT is executing function
> BBTestReadBlocksConformanceAutoTest
> (https://github.com/tianocore/edk2-test/blob/cabb98d44be94e7547605435a0be7c4946d10f8b/uefi-sct/SctPkg/TestCase/UEFI/EFI/Protocol/BlockIo/BlackBoxTest/BlockIoBBTestConformance.c#L45)
>
> This code is accessing the drive defined as
> -drive file=sct.img,format=raw,if=virtio .
>
> In the conformance test correct error handling for invalid parameters of
> the UEFI block IO protocol is tested. This includes calling the UEFI API to
>
> * read with incorrectly aligned buffers
> * read with invalid LBA parameter
> * read with buffer size not being a multiple of the sector size
>
> In all these cases the UEFI API implemented by EDK II is expected to
> return an error.

OK, but where is it actually reading/writing to ? Even if the buffer
isn't aligned right it ought not to be in a non-host-ram-backed
memory region, I would have thought.

-- PMM

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH, v2] physmem: avoid bounce buffer too small
  2024-02-29 11:11           ` Peter Maydell
  2024-02-29 11:17             ` Heinrich Schuchardt
@ 2024-02-29 11:18             ` Mattias Nissler
  1 sibling, 0 replies; 19+ messages in thread
From: Mattias Nissler @ 2024-02-29 11:18 UTC (permalink / raw)
To: Peter Maydell
Cc: Jonathan Cameron, Heinrich Schuchardt, Philippe Mathieu-Daudé,
    qemu-devel, Richard Henderson, David Hildenbrand, Peter Xu,
    Paolo Bonzini

On Thu, Feb 29, 2024 at 12:12 PM Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Thu, 29 Feb 2024 at 10:59, Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > On Thu, 29 Feb 2024 09:38:29 +0000
> > Peter Maydell <peter.maydell@linaro.org> wrote:
> >
> > > On Wed, 28 Feb 2024 at 19:07, Heinrich Schuchardt
> > > <heinrich.schuchardt@canonical.com> wrote:
> > > >
> > > > On 28.02.24 19:39, Peter Maydell wrote:
> > > > > The limitation to a page dates back to commit 6d16c2f88f2a in 2009,
> > > > > which was the first implementation of this function. I don't think
> > > > > there's a particular reason for that value beyond that it was
> > > > > probably a convenient value that was assumed to be likely "big enough".
> > > > >
> > > > > I think the idea with this bounce-buffer has always been that this
> > > > > isn't really a code path we expected to end up in very often --
> > > > > it's supposed to be for when devices are doing DMA, which they
> > > > > will typically be doing to memory (backed by host RAM), not
> > > > > devices (backed by MMIO and needing a bounce buffer). So the
> > > > > whole mechanism is a bit "last fallback to stop things breaking
> > > > > entirely".
> > > > >
> > > > > The address_space_map() API says that it's allowed to return
> > > > > a subset of the range you ask for, so if the virtio code doesn't
> > > > > cope with the minimum being set to TARGET_PAGE_SIZE then either
> > > > > we need to fix that virtio code or we need to change the API
> > > > > of this function. (But I think you will also get a reduced
> > > > > range if you try to use it across a boundary between normal
> > > > > host-memory-backed RAM and a device MemoryRegion.)
> > > >
> > > > If we allow a bounce buffer only to be used once (via the in_use flag),
> > > > why do we allow only a single bounce buffer?
> > > >
> > > > Could address_space_map() allocate a new bounce buffer on every call and
> > > > address_space_unmap() deallocate it?
> > > >
> > > > Isn't the design with a single bounce buffer bound to fail with a
> > > > multi-threaded client as collision can be expected?
> > >
> > > Yeah, I don't suppose multi-threaded was particularly expected.
> > > Again, this is really a "handle the case where the guest does
> > > something silly" setup, which is why only one bounce buffer.
> > >
> > > Why is your guest ending up in the bounce-buffer path?
> >
> > Happens for me with emulated CXL memory.
>
> Can we put that in the "something silly" bucket? :-)
> But yes, I'm not surprised that CXL runs into this. Heinrich,
> are you doing CXL testing, or is this some other workload?
>
> > I think the case I saw
> > was split descriptors in virtio via address space caches
> > https://elixir.bootlin.com/qemu/latest/source/hw/virtio/virtio.c#L4043
> >
> > One bounce buffer is in use for the outer loop and another for the descriptors
> > it is pointing to.
>
> Mmm. The other assumption made in the design of the address_space_map()
> API I think was that it was unlikely that a device would be trying
> to do two DMA operations simultaneously. This is clearly not
> true in practice. We definitely need to fix one end or other of
> this API.
>
> (I'm not sure why the bounce-buffer limit ought to be per-AddressSpace:
> is that just done in Matthias' series so that we can attach an
> x-thingy property to the individual PCI device?)

Yes, that's the result of review feedback to the early iterations of
my series. Specifically, (1) a limit is needed to prevent rogue guests
from hogging unlimited amounts of memory and (2) global parameters are
frowned upon. Setting a suitable limit is much more practical when
targeted at a given device/driver combination.

^ permalink raw reply	[flat|nested] 19+ messages in thread
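For readers who have not followed the series being referenced: the idea, as
described above, is to replace the single global buffer with one allocation per
mapping, accounted against a per-AddressSpace cap that can be tuned per device.
A rough, hand-written sketch of that shape follows; the field and identifier
names here are illustrative, not the actual names used in the series:

    /* Sketch only: one bounce buffer per outstanding mapping, with the
     * total size accounted against a per-AddressSpace limit. */
    typedef struct BounceBuffer {
        MemoryRegion *mr;       /* region being bounced */
        hwaddr addr;
        size_t len;
        uint8_t buffer[];       /* flexible array, allocated per mapping */
    } BounceBuffer;

    if (!memory_access_is_direct(mr, is_write)) {
        size_t in_flight = qatomic_fetch_add(&as->bounce_total, l);
        if (in_flight + l > as->max_bounce_size) {     /* cap exceeded */
            qatomic_sub(&as->bounce_total, l);
            *plen = 0;
            return NULL;
        }
        BounceBuffer *bounce = g_malloc(sizeof(*bounce) + l);
        bounce->mr = mr;
        bounce->addr = addr;
        bounce->len = l;
        if (!is_write) {
            address_space_read(as, addr, MEMTXATTRS_UNSPECIFIED,
                               bounce->buffer, l);
        }
        *plen = l;
        return bounce->buffer;  /* unmap frees it and undoes the accounting */
    }

With per-mapping buffers, the virtio split-descriptor case Jonathan describes
(one mapping for the descriptor ring, another for the data it points to) no
longer collides, and the cap bounds how much memory a misbehaving guest can
tie up.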
end of thread, other threads:[~2024-02-29 14:53 UTC | newest]

Thread overview: 19+ messages
2024-02-28 12:59 [PATCH, v2] physmem: avoid bounce buffer too small Heinrich Schuchardt
2024-02-28 15:06 ` Philippe Mathieu-Daudé
2024-02-28 18:27   ` Heinrich Schuchardt
2024-02-28 18:39     ` Peter Maydell
2024-02-28 19:07       ` Heinrich Schuchardt
2024-02-29  1:11         ` Peter Xu
2024-02-29 10:22           ` Heinrich Schuchardt
2024-02-29 10:36             ` Mattias Nissler
2024-02-29 10:46               ` Jonathan Cameron via
2024-02-29  9:38         ` Peter Maydell
2024-02-29 10:59           ` Jonathan Cameron via
2024-02-29 11:11             ` Peter Maydell
2024-02-29 11:17               ` Heinrich Schuchardt
2024-02-29 12:34                 ` Peter Maydell
2024-02-29 12:52                   ` Mattias Nissler
2024-02-29 13:19                     ` Peter Maydell
2024-02-29 14:17                     ` Heinrich Schuchardt
2024-02-29 14:52                       ` Peter Maydell
2024-02-29 11:18               ` Mattias Nissler