* [PATCH v1 01/15] libvhost-user: Fix msg_region->userspace_addr computation
2024-02-02 21:53 [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code David Hildenbrand
@ 2024-02-02 21:53 ` David Hildenbrand
2024-02-04 1:35 ` Raphael Norwitz
2024-02-13 17:32 ` Michael S. Tsirkin
2024-02-02 21:53 ` [PATCH v1 02/15] libvhost-user: Dynamically allocate memory for memory slots David Hildenbrand
` (15 subsequent siblings)
16 siblings, 2 replies; 46+ messages in thread
From: David Hildenbrand @ 2024-02-02 21:53 UTC (permalink / raw)
To: qemu-devel
Cc: David Hildenbrand, Michael S . Tsirkin, Jason Wang,
Stefan Hajnoczi, Stefano Garzarella, Germano Veit Michel,
Raphael Norwitz
We barely had mmap_offset set in the past. With virtio-mem and
dynamic-memslots that will change.
In vu_add_mem_reg() and vu_set_mem_table_exec_postcopy(), we are
performing pointer arithmetic, which is wrong. Let's simply
use dev_region->mmap_addr instead of "void *mmap_addr".
Fixes: ec94c8e621de ("Support adding individual regions in libvhost-user")
Fixes: 9bb38019942c ("vhost+postcopy: Send address back to qemu")
Cc: Raphael Norwitz <raphael.norwitz@nutanix.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
subprojects/libvhost-user/libvhost-user.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index a3b158c671..7e515ed15d 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -800,8 +800,8 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
* Return the address to QEMU so that it can translate the ufd
* fault addresses back.
*/
- msg_region->userspace_addr = (uintptr_t)(mmap_addr +
- dev_region->mmap_offset);
+ msg_region->userspace_addr = dev_region->mmap_addr +
+ dev_region->mmap_offset;
/* Send the message back to qemu with the addresses filled in. */
vmsg->fd_num = 0;
@@ -969,8 +969,8 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
/* Return the address to QEMU so that it can translate the ufd
* fault addresses back.
*/
- msg_region->userspace_addr = (uintptr_t)(mmap_addr +
- dev_region->mmap_offset);
+ msg_region->userspace_addr = dev_region->mmap_addr +
+ dev_region->mmap_offset;
close(vmsg->fds[i]);
}
--
2.43.0
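For illustration only (not part of the patch itself), a minimal standalone sketch of the computation being changed, assuming a region mapped the way libvhost-user maps it, with the mapped address kept both as the void * returned by mmap() and as a uint64_t as in VuDevRegion. Under GCC/Clang both forms produce the same value; the new form simply avoids arithmetic on void * and the cast:
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
int main(void)
{
    uint64_t mmap_offset = 0x1000;                  /* hypothetical offset */
    size_t len = 0x4000;
    void *mmap_addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mmap_addr == MAP_FAILED) {
        return 1;
    }
    /* What libvhost-user stores in dev_region->mmap_addr (a uint64_t). */
    uint64_t dev_mmap_addr = (uint64_t)(uintptr_t)mmap_addr;
    /* Old form: arithmetic on void * (a GNU C extension), then a cast. */
    uint64_t old_form = (uintptr_t)(mmap_addr + mmap_offset);
    /* New form: plain integer arithmetic on the stored uint64_t. */
    uint64_t new_form = dev_mmap_addr + mmap_offset;
    printf("equal: %d\n", old_form == new_form);    /* prints "equal: 1" */
    munmap(mmap_addr, len);
    return 0;
}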
* Re: [PATCH v1 01/15] libvhost-user: Fix msg_region->userspace_addr computation
2024-02-02 21:53 ` [PATCH v1 01/15] libvhost-user: Fix msg_region->userspace_addr computation David Hildenbrand
@ 2024-02-04 1:35 ` Raphael Norwitz
2024-02-04 14:36 ` David Hildenbrand
2024-02-13 17:32 ` Michael S. Tsirkin
1 sibling, 1 reply; 46+ messages in thread
From: Raphael Norwitz @ 2024-02-04 1:35 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Stefano Garzarella, Germano Veit Michel, Raphael Norwitz
As a heads up, I've left Nutanix and updated it in MAINTAINERS. Will
be updating it again shortly so tagging these with my new work email.
On Fri, Feb 2, 2024 at 4:54 PM David Hildenbrand <david@redhat.com> wrote:
>
> We barely had mmap_offset set in the past. With virtio-mem and
> dynamic-memslots that will change.
>
> In vu_add_mem_reg() and vu_set_mem_table_exec_postcopy(), we are
> performing pointer arithmetic, which is wrong. Let's simply
> use dev_region->mmap_addr instead of "void *mmap_addr".
>
> Fixes: ec94c8e621de ("Support adding individual regions in libvhost-user")
> Fixes: 9bb38019942c ("vhost+postcopy: Send address back to qemu")
> Cc: Raphael Norwitz <raphael.norwitz@nutanix.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
> ---
> subprojects/libvhost-user/libvhost-user.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> index a3b158c671..7e515ed15d 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -800,8 +800,8 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
> * Return the address to QEMU so that it can translate the ufd
> * fault addresses back.
> */
> - msg_region->userspace_addr = (uintptr_t)(mmap_addr +
> - dev_region->mmap_offset);
> + msg_region->userspace_addr = dev_region->mmap_addr +
> + dev_region->mmap_offset;
>
> /* Send the message back to qemu with the addresses filled in. */
> vmsg->fd_num = 0;
> @@ -969,8 +969,8 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
> /* Return the address to QEMU so that it can translate the ufd
> * fault addresses back.
> */
> - msg_region->userspace_addr = (uintptr_t)(mmap_addr +
> - dev_region->mmap_offset);
> + msg_region->userspace_addr = dev_region->mmap_addr +
> + dev_region->mmap_offset;
> close(vmsg->fds[i]);
> }
>
> --
> 2.43.0
>
>
* Re: [PATCH v1 01/15] libvhost-user: Fix msg_region->userspace_addr computation
2024-02-04 1:35 ` Raphael Norwitz
@ 2024-02-04 14:36 ` David Hildenbrand
2024-02-04 22:01 ` Raphael Norwitz
0 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-02-04 14:36 UTC (permalink / raw)
To: Raphael Norwitz
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Stefano Garzarella, Germano Veit Michel, Raphael Norwitz
On 04.02.24 02:35, Raphael Norwitz wrote:
> As a heads up, I've left Nutanix and updated it in MAINTAINERS. Will
> be updating it again shortly so tagging these with my new work email.
>
Thanks for the fast review! The mail server already complained to me :)
Maybe consider adding yourself as a reviewer for vhost as well (which
covers libvhost-user)? I took your mail address from git history, not
get_maintainers.pl.
> On Fri, Feb 2, 2024 at 4:54 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> We barely had mmap_offset set in the past. With virtio-mem and
>> dynamic-memslots that will change.
>>
>> In vu_add_mem_reg() and vu_set_mem_table_exec_postcopy(), we are
>> performing pointer arithmetic, which is wrong. Let's simply
>> use dev_region->mmap_addr instead of "void *mmap_addr".
>>
>> Fixes: ec94c8e621de ("Support adding individual regions in libvhost-user")
>> Fixes: 9bb38019942c ("vhost+postcopy: Send address back to qemu")
>> Cc: Raphael Norwitz <raphael.norwitz@nutanix.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>
> Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
--
Cheers,
David / dhildenb
* Re: [PATCH v1 01/15] libvhost-user: Fix msg_region->userspace_addr computation
2024-02-04 14:36 ` David Hildenbrand
@ 2024-02-04 22:01 ` Raphael Norwitz
2024-02-05 7:32 ` David Hildenbrand
0 siblings, 1 reply; 46+ messages in thread
From: Raphael Norwitz @ 2024-02-04 22:01 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Stefano Garzarella, Germano Veit Michel, Raphael Norwitz
On Sun, Feb 4, 2024 at 9:36 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 04.02.24 02:35, Raphael Norwitz wrote:
> > As a heads up, I've left Nutanix and updated it in MAINTAINERS. Will
> > be updating it again shortly so tagging these with my new work email.
> >
>
> Thanks for the fast review! The mail server already complained to me :)
>
> Maybe consider adding yourself as a reviewer for vhost as well (which
> covers libvhost-user)? I took your mail address from git history, not
> get_maintainers.pl.
I don't expect I'll have much time to review code outside of
vhost-user-blk/vhost-user-scsi, but happy to add an entry if it helps
folks tag me on relevant patches.
>
> > On Fri, Feb 2, 2024 at 4:54 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> We barely had mmap_offset set in the past. With virtio-mem and
> >> dynamic-memslots that will change.
> >>
> >> In vu_add_mem_reg() and vu_set_mem_table_exec_postcopy(), we are
> >> performing pointer arithmetic, which is wrong. Let's simply
> >> use dev_region->mmap_addr instead of "void *mmap_addr".
> >>
> >> Fixes: ec94c8e621de ("Support adding individual regions in libvhost-user")
> >> Fixes: 9bb38019942c ("vhost+postcopy: Send address back to qemu")
> >> Cc: Raphael Norwitz <raphael.norwitz@nutanix.com>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >
> > Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
>
>
> --
> Cheers,
>
> David / dhildenb
>
* Re: [PATCH v1 01/15] libvhost-user: Fix msg_region->userspace_addr computation
2024-02-04 22:01 ` Raphael Norwitz
@ 2024-02-05 7:32 ` David Hildenbrand
0 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2024-02-05 7:32 UTC (permalink / raw)
To: Raphael Norwitz
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Stefano Garzarella, Germano Veit Michel, Raphael Norwitz
On 04.02.24 23:01, Raphael Norwitz wrote:
> On Sun, Feb 4, 2024 at 9:36 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 04.02.24 02:35, Raphael Norwitz wrote:
>>> As a heads up, I've left Nutanix and updated it in MAINTAINERS. Will
>>> be updating it again shortly so tagging these with my new work email.
>>>
>>
>> Thanks for the fast review! The mail server already complained to me :)
>>
>> Maybe consider adding yourself as a reviewer for vhost as well (which
>> covers libvhost-user)? I took your mail address from git history, not
>> get_maintainers.pl.
>
> I don't expect I'll have much time to review code outside of
> vhost-user-blk/vhost-user-scsi, but happy to add an entry if it helps
> folks tag me on relevant patches.
If it helps, it might make sense to split out libvhost-user into a
separate MAINTAINERS section.
--
Cheers,
David / dhildenb
* Re: [PATCH v1 01/15] libvhost-user: Fix msg_region->userspace_addr computation
2024-02-02 21:53 ` [PATCH v1 01/15] libvhost-user: Fix msg_region->userspace_addr computation David Hildenbrand
2024-02-04 1:35 ` Raphael Norwitz
@ 2024-02-13 17:32 ` Michael S. Tsirkin
2024-02-13 18:25 ` David Hildenbrand
1 sibling, 1 reply; 46+ messages in thread
From: Michael S. Tsirkin @ 2024-02-13 17:32 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Jason Wang, Stefan Hajnoczi, Stefano Garzarella,
Germano Veit Michel, Raphael Norwitz
On Fri, Feb 02, 2024 at 10:53:18PM +0100, David Hildenbrand wrote:
> We barely had mmap_offset set in the past. With virtio-mem and
> dynamic-memslots that will change.
>
> In vu_add_mem_reg() and vu_set_mem_table_exec_postcopy(), we are
> performing pointer arithmetic, which is wrong.
Wrong how? I suspect you mean arithmetic on void * pointers is not portable?
> Let's simply
> use dev_region->mmap_addr instead of "void *mmap_addr".
>
> Fixes: ec94c8e621de ("Support adding individual regions in libvhost-user")
> Fixes: 9bb38019942c ("vhost+postcopy: Send address back to qemu")
> Cc: Raphael Norwitz <raphael.norwitz@nutanix.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> subprojects/libvhost-user/libvhost-user.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> index a3b158c671..7e515ed15d 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -800,8 +800,8 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
> * Return the address to QEMU so that it can translate the ufd
> * fault addresses back.
> */
> - msg_region->userspace_addr = (uintptr_t)(mmap_addr +
> - dev_region->mmap_offset);
> + msg_region->userspace_addr = dev_region->mmap_addr +
> + dev_region->mmap_offset;
>
> /* Send the message back to qemu with the addresses filled in. */
> vmsg->fd_num = 0;
> @@ -969,8 +969,8 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
> /* Return the address to QEMU so that it can translate the ufd
> * fault addresses back.
> */
> - msg_region->userspace_addr = (uintptr_t)(mmap_addr +
> - dev_region->mmap_offset);
> + msg_region->userspace_addr = dev_region->mmap_addr +
> + dev_region->mmap_offset;
> close(vmsg->fds[i]);
> }
>
> --
> 2.43.0
* Re: [PATCH v1 01/15] libvhost-user: Fix msg_region->userspace_addr computation
2024-02-13 17:32 ` Michael S. Tsirkin
@ 2024-02-13 18:25 ` David Hildenbrand
0 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2024-02-13 18:25 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: qemu-devel, Jason Wang, Stefan Hajnoczi, Stefano Garzarella,
Germano Veit Michel, Raphael Norwitz
On 13.02.24 18:32, Michael S. Tsirkin wrote:
> On Fri, Feb 02, 2024 at 10:53:18PM +0100, David Hildenbrand wrote:
>> We barely had mmap_offset set in the past. With virtio-mem and
>> dynamic-memslots that will change.
>>
>> In vu_add_mem_reg() and vu_set_mem_table_exec_postcopy(), we are
>> performing pointer arithmetic, which is wrong.
>
> Wrong how? I suspect you mean arithmetic on void * pointers is not portable?
You are absolutely right, no idea how I concluded that we might have a
different pointer size here.
I'll either convert this patch into a cleanup or drop it for v2, thanks!
--
Cheers,
David / dhildenb
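To make the point above concrete, a tiny standalone sketch (not from the series): GCC and Clang accept arithmetic on void * as a GNU extension and treat it like arithmetic on char *, so the existing cast-based computation yields the right value; it is simply not strict ISO C:
#include <assert.h>
int main(void)
{
    char buf[8];
    void *p = buf;
    /* GNU C: void * arithmetic behaves like char * arithmetic ... */
    assert(p + 3 == (void *)(buf + 3));
    /* ... whereas ISO C (e.g. gcc -pedantic-errors) rejects "p + 3". */
    return 0;
}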
* [PATCH v1 02/15] libvhost-user: Dynamically allocate memory for memory slots
2024-02-02 21:53 [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code David Hildenbrand
2024-02-02 21:53 ` [PATCH v1 01/15] libvhost-user: Fix msg_region->userspace_addr computation David Hildenbrand
@ 2024-02-02 21:53 ` David Hildenbrand
2024-02-04 1:36 ` Raphael Norwitz
2024-02-02 21:53 ` [PATCH v1 03/15] libvhost-user: Bump up VHOST_USER_MAX_RAM_SLOTS to 509 David Hildenbrand
` (14 subsequent siblings)
16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-02-02 21:53 UTC (permalink / raw)
To: qemu-devel
Cc: David Hildenbrand, Michael S . Tsirkin, Jason Wang,
Stefan Hajnoczi, Stefano Garzarella, Germano Veit Michel,
Raphael Norwitz
Let's prepare for increasing VHOST_USER_MAX_RAM_SLOTS by dynamically
allocating dev->regions. We don't have any ABI guarantees (not
dynamically linked), so we can simply change the layout of VuDev.
Let's zero out the memory, just as we used to do.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
subprojects/libvhost-user/libvhost-user.c | 11 +++++++++++
subprojects/libvhost-user/libvhost-user.h | 2 +-
2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 7e515ed15d..8a5a7a2295 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -2171,6 +2171,8 @@ vu_deinit(VuDev *dev)
free(dev->vq);
dev->vq = NULL;
+ free(dev->regions);
+ dev->regions = NULL;
}
bool
@@ -2205,9 +2207,18 @@ vu_init(VuDev *dev,
dev->backend_fd = -1;
dev->max_queues = max_queues;
+ dev->regions = malloc(VHOST_USER_MAX_RAM_SLOTS * sizeof(dev->regions[0]));
+ if (!dev->regions) {
+ DPRINT("%s: failed to malloc mem regions\n", __func__);
+ return false;
+ }
+ memset(dev->regions, 0, VHOST_USER_MAX_RAM_SLOTS * sizeof(dev->regions[0]));
+
dev->vq = malloc(max_queues * sizeof(dev->vq[0]));
if (!dev->vq) {
DPRINT("%s: failed to malloc virtqueues\n", __func__);
+ free(dev->regions);
+ dev->regions = NULL;
return false;
}
diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
index c2352904f0..c882b4e3a2 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -398,7 +398,7 @@ typedef struct VuDevInflightInfo {
struct VuDev {
int sock;
uint32_t nregions;
- VuDevRegion regions[VHOST_USER_MAX_RAM_SLOTS];
+ VuDevRegion *regions;
VuVirtq *vq;
VuDevInflightInfo inflight_info;
int log_call_fd;
--
2.43.0
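As an aside, the malloc() + memset() pair above is, for this purpose, equivalent to a single calloc() call; a minimal sketch, with VuDevRegion abbreviated to the fields the series actually uses:
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
/* Abbreviated stand-in for the real VuDevRegion in libvhost-user.h. */
typedef struct VuDevRegion {
    uint64_t gpa, size, qva, mmap_addr, mmap_offset;
} VuDevRegion;
#define VHOST_USER_MAX_RAM_SLOTS 509
int main(void)
{
    /* The pattern from the patch: allocate, then zero explicitly ... */
    VuDevRegion *a = malloc(VHOST_USER_MAX_RAM_SLOTS * sizeof(a[0]));
    if (!a) {
        return 1;
    }
    memset(a, 0, VHOST_USER_MAX_RAM_SLOTS * sizeof(a[0]));
    /* ... which yields the same zero-initialized array as calloc(). */
    VuDevRegion *b = calloc(VHOST_USER_MAX_RAM_SLOTS, sizeof(b[0]));
    free(a);
    free(b);
    return 0;
}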
* Re: [PATCH v1 02/15] libvhost-user: Dynamically allocate memory for memory slots
2024-02-02 21:53 ` [PATCH v1 02/15] libvhost-user: Dynamically allocate memory for memory slots David Hildenbrand
@ 2024-02-04 1:36 ` Raphael Norwitz
0 siblings, 0 replies; 46+ messages in thread
From: Raphael Norwitz @ 2024-02-04 1:36 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Stefano Garzarella, Germano Veit Michel, Raphael Norwitz
On Fri, Feb 2, 2024 at 4:54 PM David Hildenbrand <david@redhat.com> wrote:
>
> Let's prepare for increasing VHOST_USER_MAX_RAM_SLOTS by dynamically
> allocating dev->regions. We don't have any ABI guarantees (not
> dynamically linked), so we can simply change the layout of VuDev.
>
> Let's zero out the memory, just as we used to do.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
> ---
> subprojects/libvhost-user/libvhost-user.c | 11 +++++++++++
> subprojects/libvhost-user/libvhost-user.h | 2 +-
> 2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> index 7e515ed15d..8a5a7a2295 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -2171,6 +2171,8 @@ vu_deinit(VuDev *dev)
>
> free(dev->vq);
> dev->vq = NULL;
> + free(dev->regions);
> + dev->regions = NULL;
> }
>
> bool
> @@ -2205,9 +2207,18 @@ vu_init(VuDev *dev,
> dev->backend_fd = -1;
> dev->max_queues = max_queues;
>
> + dev->regions = malloc(VHOST_USER_MAX_RAM_SLOTS * sizeof(dev->regions[0]));
> + if (!dev->regions) {
> + DPRINT("%s: failed to malloc mem regions\n", __func__);
> + return false;
> + }
> + memset(dev->regions, 0, VHOST_USER_MAX_RAM_SLOTS * sizeof(dev->regions[0]));
> +
> dev->vq = malloc(max_queues * sizeof(dev->vq[0]));
> if (!dev->vq) {
> DPRINT("%s: failed to malloc virtqueues\n", __func__);
> + free(dev->regions);
> + dev->regions = NULL;
> return false;
> }
>
> diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
> index c2352904f0..c882b4e3a2 100644
> --- a/subprojects/libvhost-user/libvhost-user.h
> +++ b/subprojects/libvhost-user/libvhost-user.h
> @@ -398,7 +398,7 @@ typedef struct VuDevInflightInfo {
> struct VuDev {
> int sock;
> uint32_t nregions;
> - VuDevRegion regions[VHOST_USER_MAX_RAM_SLOTS];
> + VuDevRegion *regions;
> VuVirtq *vq;
> VuDevInflightInfo inflight_info;
> int log_call_fd;
> --
> 2.43.0
>
>
* [PATCH v1 03/15] libvhost-user: Bump up VHOST_USER_MAX_RAM_SLOTS to 509
2024-02-02 21:53 [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code David Hildenbrand
2024-02-02 21:53 ` [PATCH v1 01/15] libvhost-user: Fix msg_region->userspace_addr computation David Hildenbrand
2024-02-02 21:53 ` [PATCH v1 02/15] libvhost-user: Dynamically allocate memory for memory slots David Hildenbrand
@ 2024-02-02 21:53 ` David Hildenbrand
2024-02-04 1:42 ` Raphael Norwitz
2024-02-02 21:53 ` [PATCH v1 04/15] libvhost-user: Factor out removing all mem regions David Hildenbrand
` (13 subsequent siblings)
16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-02-02 21:53 UTC (permalink / raw)
To: qemu-devel
Cc: David Hildenbrand, Michael S . Tsirkin, Jason Wang,
Stefan Hajnoczi, Stefano Garzarella, Germano Veit Michel,
Raphael Norwitz
Let's support up to 509 mem slots, just like vhost in the kernel usually
does and the rust vhost-user implementation recently [1] started doing.
This is required to properly support memory hotplug, either using
multiple DIMMs (ACPI supports up to 256) or using virtio-mem.
The 509 used to be the KVM limit: it supported 512, but 3 were
used for internal purposes. Currently, KVM supports more than 512, but
it usually doesn't make use of more than ~260 (i.e., 256 DIMMs + boot
memory), except when other memory devices like PCI devices with BARs are
used. So, 509 seems to work well for vhost in the kernel.
Details can be found in the QEMU change that made virtio-mem consume
up to 256 mem slots across all virtio-mem devices. [2]
509 mem slots implies 509 VMAs/mappings in the worst case (even though,
in practice with virtio-mem we won't be seeing more than ~260 in most
setups).
With max_map_count under Linux defaulting to 64k, 509 mem slots
still correspond to less than 1% of the maximum number of mappings.
There are plenty left for the application to consume.
[1] https://github.com/rust-vmm/vhost/pull/224
[2] https://lore.kernel.org/all/20230926185738.277351-1-david@redhat.com/
Signed-off-by: David Hildenbrand <david@redhat.com>
---
subprojects/libvhost-user/libvhost-user.h | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
index c882b4e3a2..deb40e77b3 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -31,10 +31,12 @@
#define VHOST_MEMORY_BASELINE_NREGIONS 8
/*
- * Set a reasonable maximum number of ram slots, which will be supported by
- * any architecture.
+ * vhost in the kernel usually supports 509 mem slots. 509 used to be the
+ * KVM limit, it supported 512, but 3 were used for internal purposes. This
+ * limit is sufficient to support many DIMMs and virtio-mem in
+ * "dynamic-memslots" mode.
*/
-#define VHOST_USER_MAX_RAM_SLOTS 32
+#define VHOST_USER_MAX_RAM_SLOTS 509
#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
--
2.43.0
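For reference, spelling out the "less than 1%" figure from the description above: the Linux default for vm.max_map_count is 65530 (roughly 64k), and 509 / 65530 is about 0.8%, so even the worst case leaves well over 99% of the default mapping budget to the application.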
* Re: [PATCH v1 03/15] libvhost-user: Bump up VHOST_USER_MAX_RAM_SLOTS to 509
2024-02-02 21:53 ` [PATCH v1 03/15] libvhost-user: Bump up VHOST_USER_MAX_RAM_SLOTS to 509 David Hildenbrand
@ 2024-02-04 1:42 ` Raphael Norwitz
0 siblings, 0 replies; 46+ messages in thread
From: Raphael Norwitz @ 2024-02-04 1:42 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Stefano Garzarella, Germano Veit Michel, Raphael Norwitz
On Fri, Feb 2, 2024 at 4:54 PM David Hildenbrand <david@redhat.com> wrote:
>
> Let's support up to 509 mem slots, just like vhost in the kernel usually
> does and the rust vhost-user implementation recently [1] started doing.
> This is required to properly support memory hotplug, either using
> multiple DIMMs (ACPI supports up to 256) or using virtio-mem.
>
> The 509 used to be the KVM limit: it supported 512, but 3 were
> used for internal purposes. Currently, KVM supports more than 512, but
> it usually doesn't make use of more than ~260 (i.e., 256 DIMMs + boot
> memory), except when other memory devices like PCI devices with BARs are
> used. So, 509 seems to work well for vhost in the kernel.
>
> Details can be found in the QEMU change that made virtio-mem consume
> up to 256 mem slots across all virtio-mem devices. [2]
>
> 509 mem slots implies 509 VMAs/mappings in the worst case (even though,
> in practice with virtio-mem we won't be seeing more than ~260 in most
> setups).
>
> With max_map_count under Linux defaulting to 64k, 509 mem slots
> still correspond to less than 1% of the maximum number of mappings.
> There are plenty left for the application to consume.
>
> [1] https://github.com/rust-vmm/vhost/pull/224
> [2] https://lore.kernel.org/all/20230926185738.277351-1-david@redhat.com/
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
> ---
> subprojects/libvhost-user/libvhost-user.h | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
> index c882b4e3a2..deb40e77b3 100644
> --- a/subprojects/libvhost-user/libvhost-user.h
> +++ b/subprojects/libvhost-user/libvhost-user.h
> @@ -31,10 +31,12 @@
> #define VHOST_MEMORY_BASELINE_NREGIONS 8
>
> /*
> - * Set a reasonable maximum number of ram slots, which will be supported by
> - * any architecture.
> + * vhost in the kernel usually supports 509 mem slots. 509 used to be the
> + * KVM limit, it supported 512, but 3 were used for internal purposes. This
> + * limit is sufficient to support many DIMMs and virtio-mem in
> + * "dynamic-memslots" mode.
> */
> -#define VHOST_USER_MAX_RAM_SLOTS 32
> +#define VHOST_USER_MAX_RAM_SLOTS 509
>
> #define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
>
> --
> 2.43.0
>
>
* [PATCH v1 04/15] libvhost-user: Factor out removing all mem regions
2024-02-02 21:53 [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code David Hildenbrand
` (2 preceding siblings ...)
2024-02-02 21:53 ` [PATCH v1 03/15] libvhost-user: Bump up VHOST_USER_MAX_RAM_SLOTS to 509 David Hildenbrand
@ 2024-02-02 21:53 ` David Hildenbrand
2024-02-04 1:43 ` Raphael Norwitz
2024-02-02 21:53 ` [PATCH v1 05/15] libvhost-user: Merge vu_set_mem_table_exec_postcopy() into vu_set_mem_table_exec() David Hildenbrand
` (12 subsequent siblings)
16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-02-02 21:53 UTC (permalink / raw)
To: qemu-devel
Cc: David Hildenbrand, Michael S . Tsirkin, Jason Wang,
Stefan Hajnoczi, Stefano Garzarella, Germano Veit Michel,
Raphael Norwitz
Let's factor it out. Note that the check for MAP_FAILED was wrong as
we never set mmap_addr if mmap() failed. We'll remove the NULL check
separately.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
subprojects/libvhost-user/libvhost-user.c | 34 ++++++++++++-----------
1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 8a5a7a2295..d5b3468e43 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -240,6 +240,22 @@ qva_to_va(VuDev *dev, uint64_t qemu_addr)
return NULL;
}
+static void
+vu_remove_all_mem_regs(VuDev *dev)
+{
+ unsigned int i;
+
+ for (i = 0; i < dev->nregions; i++) {
+ VuDevRegion *r = &dev->regions[i];
+ void *ma = (void *)(uintptr_t)r->mmap_addr;
+
+ if (ma) {
+ munmap(ma, r->size + r->mmap_offset);
+ }
+ }
+ dev->nregions = 0;
+}
+
static void
vmsg_close_fds(VhostUserMsg *vmsg)
{
@@ -1003,14 +1019,7 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
unsigned int i;
VhostUserMemory m = vmsg->payload.memory, *memory = &m;
- for (i = 0; i < dev->nregions; i++) {
- VuDevRegion *r = &dev->regions[i];
- void *ma = (void *) (uintptr_t) r->mmap_addr;
-
- if (ma) {
- munmap(ma, r->size + r->mmap_offset);
- }
- }
+ vu_remove_all_mem_regs(dev);
dev->nregions = memory->nregions;
if (dev->postcopy_listening) {
@@ -2112,14 +2121,7 @@ vu_deinit(VuDev *dev)
{
unsigned int i;
- for (i = 0; i < dev->nregions; i++) {
- VuDevRegion *r = &dev->regions[i];
- void *m = (void *) (uintptr_t) r->mmap_addr;
- if (m != MAP_FAILED) {
- munmap(m, r->size + r->mmap_offset);
- }
- }
- dev->nregions = 0;
+ vu_remove_all_mem_regs(dev);
for (i = 0; i < dev->max_queues; i++) {
VuVirtq *vq = &dev->vq[i];
--
2.43.0
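A note on the munmap() length used by the new helper: libvhost-user maps each region's fd from file offset 0 for size + mmap_offset bytes and only skips mmap_offset when computing addresses, so teardown has to cover the full mapping. A minimal sketch with hypothetical values, using anonymous memory in place of the memfd QEMU would hand over:
#include <stdint.h>
#include <sys/mman.h>
int main(void)
{
    /* Hypothetical region: 16 KiB of guest memory starting 4 KiB into the fd. */
    uint64_t size = 0x4000;
    uint64_t mmap_offset = 0x1000;
    void *mmap_addr = mmap(NULL, size + mmap_offset, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mmap_addr == MAP_FAILED) {
        return 1;
    }
    /* Guest-visible memory starts past the offset ... */
    void *region_start = (char *)mmap_addr + mmap_offset;
    (void)region_start;
    /* ... but unmapping must cover the whole size + mmap_offset mapping. */
    munmap(mmap_addr, size + mmap_offset);
    return 0;
}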
* Re: [PATCH v1 04/15] libvhost-user: Factor out removing all mem regions
2024-02-02 21:53 ` [PATCH v1 04/15] libvhost-user: Factor out removing all mem regions David Hildenbrand
@ 2024-02-04 1:43 ` Raphael Norwitz
0 siblings, 0 replies; 46+ messages in thread
From: Raphael Norwitz @ 2024-02-04 1:43 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Stefano Garzarella, Germano Veit Michel, Raphael Norwitz
On Fri, Feb 2, 2024 at 4:54 PM David Hildenbrand <david@redhat.com> wrote:
>
> Let's factor it out. Note that the check for MAP_FAILED was wrong as
> we never set mmap_addr if mmap() failed. We'll remove the NULL check
> separately.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
> ---
> subprojects/libvhost-user/libvhost-user.c | 34 ++++++++++++-----------
> 1 file changed, 18 insertions(+), 16 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> index 8a5a7a2295..d5b3468e43 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -240,6 +240,22 @@ qva_to_va(VuDev *dev, uint64_t qemu_addr)
> return NULL;
> }
>
> +static void
> +vu_remove_all_mem_regs(VuDev *dev)
> +{
> + unsigned int i;
> +
> + for (i = 0; i < dev->nregions; i++) {
> + VuDevRegion *r = &dev->regions[i];
> + void *ma = (void *)(uintptr_t)r->mmap_addr;
> +
> + if (ma) {
> + munmap(ma, r->size + r->mmap_offset);
> + }
> + }
> + dev->nregions = 0;
> +}
> +
> static void
> vmsg_close_fds(VhostUserMsg *vmsg)
> {
> @@ -1003,14 +1019,7 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
> unsigned int i;
> VhostUserMemory m = vmsg->payload.memory, *memory = &m;
>
> - for (i = 0; i < dev->nregions; i++) {
> - VuDevRegion *r = &dev->regions[i];
> - void *ma = (void *) (uintptr_t) r->mmap_addr;
> -
> - if (ma) {
> - munmap(ma, r->size + r->mmap_offset);
> - }
> - }
> + vu_remove_all_mem_regs(dev);
> dev->nregions = memory->nregions;
>
> if (dev->postcopy_listening) {
> @@ -2112,14 +2121,7 @@ vu_deinit(VuDev *dev)
> {
> unsigned int i;
>
> - for (i = 0; i < dev->nregions; i++) {
> - VuDevRegion *r = &dev->regions[i];
> - void *m = (void *) (uintptr_t) r->mmap_addr;
> - if (m != MAP_FAILED) {
> - munmap(m, r->size + r->mmap_offset);
> - }
> - }
> - dev->nregions = 0;
> + vu_remove_all_mem_regs(dev);
>
> for (i = 0; i < dev->max_queues; i++) {
> VuVirtq *vq = &dev->vq[i];
> --
> 2.43.0
>
>
* [PATCH v1 05/15] libvhost-user: Merge vu_set_mem_table_exec_postcopy() into vu_set_mem_table_exec()
2024-02-02 21:53 [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code David Hildenbrand
` (3 preceding siblings ...)
2024-02-02 21:53 ` [PATCH v1 04/15] libvhost-user: Factor out removing all mem regions David Hildenbrand
@ 2024-02-02 21:53 ` David Hildenbrand
2024-02-04 1:44 ` Raphael Norwitz
2024-02-02 21:53 ` [PATCH v1 06/15] libvhost-user: Factor out adding a memory region David Hildenbrand
` (11 subsequent siblings)
16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-02-02 21:53 UTC (permalink / raw)
To: qemu-devel
Cc: David Hildenbrand, Michael S . Tsirkin, Jason Wang,
Stefan Hajnoczi, Stefano Garzarella, Germano Veit Michel,
Raphael Norwitz
Let's reduce some code duplication and prepare for further changes.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
subprojects/libvhost-user/libvhost-user.c | 119 +++++++---------------
1 file changed, 39 insertions(+), 80 deletions(-)
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index d5b3468e43..d9e2214ad2 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -937,95 +937,23 @@ vu_get_shared_object(VuDev *dev, VhostUserMsg *vmsg)
}
static bool
-vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
+vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
{
- unsigned int i;
VhostUserMemory m = vmsg->payload.memory, *memory = &m;
- dev->nregions = memory->nregions;
-
- DPRINT("Nregions: %u\n", memory->nregions);
- for (i = 0; i < dev->nregions; i++) {
- void *mmap_addr;
- VhostUserMemoryRegion *msg_region = &memory->regions[i];
- VuDevRegion *dev_region = &dev->regions[i];
-
- DPRINT("Region %d\n", i);
- DPRINT(" guest_phys_addr: 0x%016"PRIx64"\n",
- msg_region->guest_phys_addr);
- DPRINT(" memory_size: 0x%016"PRIx64"\n",
- msg_region->memory_size);
- DPRINT(" userspace_addr 0x%016"PRIx64"\n",
- msg_region->userspace_addr);
- DPRINT(" mmap_offset 0x%016"PRIx64"\n",
- msg_region->mmap_offset);
-
- dev_region->gpa = msg_region->guest_phys_addr;
- dev_region->size = msg_region->memory_size;
- dev_region->qva = msg_region->userspace_addr;
- dev_region->mmap_offset = msg_region->mmap_offset;
+ int prot = PROT_READ | PROT_WRITE;
+ unsigned int i;
- /* We don't use offset argument of mmap() since the
- * mapped address has to be page aligned, and we use huge
- * pages.
+ if (dev->postcopy_listening) {
+ /*
* In postcopy we're using PROT_NONE here to catch anyone
* accessing it before we userfault
*/
- mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
- PROT_NONE, MAP_SHARED | MAP_NORESERVE,
- vmsg->fds[i], 0);
-
- if (mmap_addr == MAP_FAILED) {
- vu_panic(dev, "region mmap error: %s", strerror(errno));
- } else {
- dev_region->mmap_addr = (uint64_t)(uintptr_t)mmap_addr;
- DPRINT(" mmap_addr: 0x%016"PRIx64"\n",
- dev_region->mmap_addr);
- }
-
- /* Return the address to QEMU so that it can translate the ufd
- * fault addresses back.
- */
- msg_region->userspace_addr = dev_region->mmap_addr +
- dev_region->mmap_offset;
- close(vmsg->fds[i]);
- }
-
- /* Send the message back to qemu with the addresses filled in */
- vmsg->fd_num = 0;
- if (!vu_send_reply(dev, dev->sock, vmsg)) {
- vu_panic(dev, "failed to respond to set-mem-table for postcopy");
- return false;
- }
-
- /* Wait for QEMU to confirm that it's registered the handler for the
- * faults.
- */
- if (!dev->read_msg(dev, dev->sock, vmsg) ||
- vmsg->size != sizeof(vmsg->payload.u64) ||
- vmsg->payload.u64 != 0) {
- vu_panic(dev, "failed to receive valid ack for postcopy set-mem-table");
- return false;
+ prot = PROT_NONE;
}
- /* OK, now we can go and register the memory and generate faults */
- (void)generate_faults(dev);
-
- return false;
-}
-
-static bool
-vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
-{
- unsigned int i;
- VhostUserMemory m = vmsg->payload.memory, *memory = &m;
-
vu_remove_all_mem_regs(dev);
dev->nregions = memory->nregions;
- if (dev->postcopy_listening) {
- return vu_set_mem_table_exec_postcopy(dev, vmsg);
- }
-
DPRINT("Nregions: %u\n", memory->nregions);
for (i = 0; i < dev->nregions; i++) {
void *mmap_addr;
@@ -1051,8 +979,7 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
* mapped address has to be page aligned, and we use huge
* pages. */
mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
- PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE,
- vmsg->fds[i], 0);
+ prot, MAP_SHARED | MAP_NORESERVE, vmsg->fds[i], 0);
if (mmap_addr == MAP_FAILED) {
vu_panic(dev, "region mmap error: %s", strerror(errno));
@@ -1062,9 +989,41 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
dev_region->mmap_addr);
}
+ if (dev->postcopy_listening) {
+ /*
+ * Return the address to QEMU so that it can translate the ufd
+ * fault addresses back.
+ */
+ msg_region->userspace_addr = dev_region->mmap_addr +
+ dev_region->mmap_offset;
+ }
close(vmsg->fds[i]);
}
+ if (dev->postcopy_listening) {
+ /* Send the message back to qemu with the addresses filled in */
+ vmsg->fd_num = 0;
+ if (!vu_send_reply(dev, dev->sock, vmsg)) {
+ vu_panic(dev, "failed to respond to set-mem-table for postcopy");
+ return false;
+ }
+
+ /*
+ * Wait for QEMU to confirm that it's registered the handler for the
+ * faults.
+ */
+ if (!dev->read_msg(dev, dev->sock, vmsg) ||
+ vmsg->size != sizeof(vmsg->payload.u64) ||
+ vmsg->payload.u64 != 0) {
+ vu_panic(dev, "failed to receive valid ack for postcopy set-mem-table");
+ return false;
+ }
+
+ /* OK, now we can go and register the memory and generate faults */
+ (void)generate_faults(dev);
+ return false;
+ }
+
for (i = 0; i < dev->max_queues; i++) {
if (dev->vq[i].vring.desc) {
if (map_ring(dev, &dev->vq[i])) {
--
2.43.0
* Re: [PATCH v1 05/15] libvhost-user: Merge vu_set_mem_table_exec_postcopy() into vu_set_mem_table_exec()
2024-02-02 21:53 ` [PATCH v1 05/15] libvhost-user: Merge vu_set_mem_table_exec_postcopy() into vu_set_mem_table_exec() David Hildenbrand
@ 2024-02-04 1:44 ` Raphael Norwitz
0 siblings, 0 replies; 46+ messages in thread
From: Raphael Norwitz @ 2024-02-04 1:44 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Stefano Garzarella, Germano Veit Michel, Raphael Norwitz
On Fri, Feb 2, 2024 at 4:55 PM David Hildenbrand <david@redhat.com> wrote:
>
> Let's reduce some code duplication and prepare for further changes.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
> ---
> subprojects/libvhost-user/libvhost-user.c | 119 +++++++---------------
> 1 file changed, 39 insertions(+), 80 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> index d5b3468e43..d9e2214ad2 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -937,95 +937,23 @@ vu_get_shared_object(VuDev *dev, VhostUserMsg *vmsg)
> }
>
> static bool
> -vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
> +vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
> {
> - unsigned int i;
> VhostUserMemory m = vmsg->payload.memory, *memory = &m;
> - dev->nregions = memory->nregions;
> -
> - DPRINT("Nregions: %u\n", memory->nregions);
> - for (i = 0; i < dev->nregions; i++) {
> - void *mmap_addr;
> - VhostUserMemoryRegion *msg_region = &memory->regions[i];
> - VuDevRegion *dev_region = &dev->regions[i];
> -
> - DPRINT("Region %d\n", i);
> - DPRINT(" guest_phys_addr: 0x%016"PRIx64"\n",
> - msg_region->guest_phys_addr);
> - DPRINT(" memory_size: 0x%016"PRIx64"\n",
> - msg_region->memory_size);
> - DPRINT(" userspace_addr 0x%016"PRIx64"\n",
> - msg_region->userspace_addr);
> - DPRINT(" mmap_offset 0x%016"PRIx64"\n",
> - msg_region->mmap_offset);
> -
> - dev_region->gpa = msg_region->guest_phys_addr;
> - dev_region->size = msg_region->memory_size;
> - dev_region->qva = msg_region->userspace_addr;
> - dev_region->mmap_offset = msg_region->mmap_offset;
> + int prot = PROT_READ | PROT_WRITE;
> + unsigned int i;
>
> - /* We don't use offset argument of mmap() since the
> - * mapped address has to be page aligned, and we use huge
> - * pages.
> + if (dev->postcopy_listening) {
> + /*
> * In postcopy we're using PROT_NONE here to catch anyone
> * accessing it before we userfault
> */
> - mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> - PROT_NONE, MAP_SHARED | MAP_NORESERVE,
> - vmsg->fds[i], 0);
> -
> - if (mmap_addr == MAP_FAILED) {
> - vu_panic(dev, "region mmap error: %s", strerror(errno));
> - } else {
> - dev_region->mmap_addr = (uint64_t)(uintptr_t)mmap_addr;
> - DPRINT(" mmap_addr: 0x%016"PRIx64"\n",
> - dev_region->mmap_addr);
> - }
> -
> - /* Return the address to QEMU so that it can translate the ufd
> - * fault addresses back.
> - */
> - msg_region->userspace_addr = dev_region->mmap_addr +
> - dev_region->mmap_offset;
> - close(vmsg->fds[i]);
> - }
> -
> - /* Send the message back to qemu with the addresses filled in */
> - vmsg->fd_num = 0;
> - if (!vu_send_reply(dev, dev->sock, vmsg)) {
> - vu_panic(dev, "failed to respond to set-mem-table for postcopy");
> - return false;
> - }
> -
> - /* Wait for QEMU to confirm that it's registered the handler for the
> - * faults.
> - */
> - if (!dev->read_msg(dev, dev->sock, vmsg) ||
> - vmsg->size != sizeof(vmsg->payload.u64) ||
> - vmsg->payload.u64 != 0) {
> - vu_panic(dev, "failed to receive valid ack for postcopy set-mem-table");
> - return false;
> + prot = PROT_NONE;
> }
>
> - /* OK, now we can go and register the memory and generate faults */
> - (void)generate_faults(dev);
> -
> - return false;
> -}
> -
> -static bool
> -vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
> -{
> - unsigned int i;
> - VhostUserMemory m = vmsg->payload.memory, *memory = &m;
> -
> vu_remove_all_mem_regs(dev);
> dev->nregions = memory->nregions;
>
> - if (dev->postcopy_listening) {
> - return vu_set_mem_table_exec_postcopy(dev, vmsg);
> - }
> -
> DPRINT("Nregions: %u\n", memory->nregions);
> for (i = 0; i < dev->nregions; i++) {
> void *mmap_addr;
> @@ -1051,8 +979,7 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
> * mapped address has to be page aligned, and we use huge
> * pages. */
> mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> - PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE,
> - vmsg->fds[i], 0);
> + prot, MAP_SHARED | MAP_NORESERVE, vmsg->fds[i], 0);
>
> if (mmap_addr == MAP_FAILED) {
> vu_panic(dev, "region mmap error: %s", strerror(errno));
> @@ -1062,9 +989,41 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
> dev_region->mmap_addr);
> }
>
> + if (dev->postcopy_listening) {
> + /*
> + * Return the address to QEMU so that it can translate the ufd
> + * fault addresses back.
> + */
> + msg_region->userspace_addr = dev_region->mmap_addr +
> + dev_region->mmap_offset;
> + }
> close(vmsg->fds[i]);
> }
>
> + if (dev->postcopy_listening) {
> + /* Send the message back to qemu with the addresses filled in */
> + vmsg->fd_num = 0;
> + if (!vu_send_reply(dev, dev->sock, vmsg)) {
> + vu_panic(dev, "failed to respond to set-mem-table for postcopy");
> + return false;
> + }
> +
> + /*
> + * Wait for QEMU to confirm that it's registered the handler for the
> + * faults.
> + */
> + if (!dev->read_msg(dev, dev->sock, vmsg) ||
> + vmsg->size != sizeof(vmsg->payload.u64) ||
> + vmsg->payload.u64 != 0) {
> + vu_panic(dev, "failed to receive valid ack for postcopy set-mem-table");
> + return false;
> + }
> +
> + /* OK, now we can go and register the memory and generate faults */
> + (void)generate_faults(dev);
> + return false;
> + }
> +
> for (i = 0; i < dev->max_queues; i++) {
> if (dev->vq[i].vring.desc) {
> if (map_ring(dev, &dev->vq[i])) {
> --
> 2.43.0
>
>
* [PATCH v1 06/15] libvhost-user: Factor out adding a memory region
2024-02-02 21:53 [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code David Hildenbrand
` (4 preceding siblings ...)
2024-02-02 21:53 ` [PATCH v1 05/15] libvhost-user: Merge vu_set_mem_table_exec_postcopy() into vu_set_mem_table_exec() David Hildenbrand
@ 2024-02-02 21:53 ` David Hildenbrand
2024-02-04 1:44 ` Raphael Norwitz
2024-02-02 21:53 ` [PATCH v1 07/15] libvhost-user: No need to check for NULL when unmapping David Hildenbrand
` (10 subsequent siblings)
16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-02-02 21:53 UTC (permalink / raw)
To: qemu-devel
Cc: David Hildenbrand, Michael S . Tsirkin, Jason Wang,
Stefan Hajnoczi, Stefano Garzarella, Germano Veit Michel,
Raphael Norwitz
Let's factor it out, reducing quite some code duplication and preparing
for further changes.
If we fail to mmap a region and panic, we now simply don't add that
(broken) region.
Note that we now increment dev->nregions as we successfully add
memory regions, and don't increment dev->nregions if anything goes
wrong.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
subprojects/libvhost-user/libvhost-user.c | 168 ++++++++--------------
1 file changed, 60 insertions(+), 108 deletions(-)
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index d9e2214ad2..a2baefe84b 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -256,6 +256,61 @@ vu_remove_all_mem_regs(VuDev *dev)
dev->nregions = 0;
}
+static void
+_vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
+{
+ int prot = PROT_READ | PROT_WRITE;
+ VuDevRegion *r;
+ void *mmap_addr;
+
+ DPRINT("Adding region %d\n", dev->nregions);
+ DPRINT(" guest_phys_addr: 0x%016"PRIx64"\n",
+ msg_region->guest_phys_addr);
+ DPRINT(" memory_size: 0x%016"PRIx64"\n",
+ msg_region->memory_size);
+ DPRINT(" userspace_addr 0x%016"PRIx64"\n",
+ msg_region->userspace_addr);
+ DPRINT(" mmap_offset 0x%016"PRIx64"\n",
+ msg_region->mmap_offset);
+
+ if (dev->postcopy_listening) {
+ /*
+ * In postcopy we're using PROT_NONE here to catch anyone
+ * accessing it before we userfault
+ */
+ prot = PROT_NONE;
+ }
+
+ /*
+ * We don't use offset argument of mmap() since the mapped address has
+ * to be page aligned, and we use huge pages.
+ */
+ mmap_addr = mmap(0, msg_region->memory_size + msg_region->mmap_offset,
+ prot, MAP_SHARED | MAP_NORESERVE, fd, 0);
+ if (mmap_addr == MAP_FAILED) {
+ vu_panic(dev, "region mmap error: %s", strerror(errno));
+ return;
+ }
+ DPRINT(" mmap_addr: 0x%016"PRIx64"\n",
+ (uint64_t)(uintptr_t)mmap_addr);
+
+ r = &dev->regions[dev->nregions];
+ r->gpa = msg_region->guest_phys_addr;
+ r->size = msg_region->memory_size;
+ r->qva = msg_region->userspace_addr;
+ r->mmap_addr = (uint64_t)(uintptr_t)mmap_addr;
+ r->mmap_offset = msg_region->mmap_offset;
+ dev->nregions++;
+
+ if (dev->postcopy_listening) {
+ /*
+ * Return the address to QEMU so that it can translate the ufd
+ * fault addresses back.
+ */
+ msg_region->userspace_addr = r->mmap_addr + r->mmap_offset;
+ }
+}
+
static void
vmsg_close_fds(VhostUserMsg *vmsg)
{
@@ -727,10 +782,7 @@ generate_faults(VuDev *dev) {
static bool
vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
int i;
- bool track_ramblocks = dev->postcopy_listening;
VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = &m;
- VuDevRegion *dev_region = &dev->regions[dev->nregions];
- void *mmap_addr;
if (vmsg->fd_num != 1) {
vmsg_close_fds(vmsg);
@@ -760,69 +812,20 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
* we know all the postcopy client bases have been received, and we
* should start generating faults.
*/
- if (track_ramblocks &&
+ if (dev->postcopy_listening &&
vmsg->size == sizeof(vmsg->payload.u64) &&
vmsg->payload.u64 == 0) {
(void)generate_faults(dev);
return false;
}
- DPRINT("Adding region: %u\n", dev->nregions);
- DPRINT(" guest_phys_addr: 0x%016"PRIx64"\n",
- msg_region->guest_phys_addr);
- DPRINT(" memory_size: 0x%016"PRIx64"\n",
- msg_region->memory_size);
- DPRINT(" userspace_addr 0x%016"PRIx64"\n",
- msg_region->userspace_addr);
- DPRINT(" mmap_offset 0x%016"PRIx64"\n",
- msg_region->mmap_offset);
-
- dev_region->gpa = msg_region->guest_phys_addr;
- dev_region->size = msg_region->memory_size;
- dev_region->qva = msg_region->userspace_addr;
- dev_region->mmap_offset = msg_region->mmap_offset;
-
- /*
- * We don't use offset argument of mmap() since the
- * mapped address has to be page aligned, and we use huge
- * pages.
- */
- if (track_ramblocks) {
- /*
- * In postcopy we're using PROT_NONE here to catch anyone
- * accessing it before we userfault.
- */
- mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
- PROT_NONE, MAP_SHARED | MAP_NORESERVE,
- vmsg->fds[0], 0);
- } else {
- mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
- PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE,
- vmsg->fds[0], 0);
- }
-
- if (mmap_addr == MAP_FAILED) {
- vu_panic(dev, "region mmap error: %s", strerror(errno));
- } else {
- dev_region->mmap_addr = (uint64_t)(uintptr_t)mmap_addr;
- DPRINT(" mmap_addr: 0x%016"PRIx64"\n",
- dev_region->mmap_addr);
- }
-
+ _vu_add_mem_reg(dev, msg_region, vmsg->fds[0]);
close(vmsg->fds[0]);
- if (track_ramblocks) {
- /*
- * Return the address to QEMU so that it can translate the ufd
- * fault addresses back.
- */
- msg_region->userspace_addr = dev_region->mmap_addr +
- dev_region->mmap_offset;
-
+ if (dev->postcopy_listening) {
/* Send the message back to qemu with the addresses filled in. */
vmsg->fd_num = 0;
DPRINT("Successfully added new region in postcopy\n");
- dev->nregions++;
return true;
} else {
for (i = 0; i < dev->max_queues; i++) {
@@ -835,7 +838,6 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
}
DPRINT("Successfully added new region\n");
- dev->nregions++;
return false;
}
}
@@ -940,63 +942,13 @@ static bool
vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
{
VhostUserMemory m = vmsg->payload.memory, *memory = &m;
- int prot = PROT_READ | PROT_WRITE;
unsigned int i;
- if (dev->postcopy_listening) {
- /*
- * In postcopy we're using PROT_NONE here to catch anyone
- * accessing it before we userfault
- */
- prot = PROT_NONE;
- }
-
vu_remove_all_mem_regs(dev);
- dev->nregions = memory->nregions;
DPRINT("Nregions: %u\n", memory->nregions);
- for (i = 0; i < dev->nregions; i++) {
- void *mmap_addr;
- VhostUserMemoryRegion *msg_region = &memory->regions[i];
- VuDevRegion *dev_region = &dev->regions[i];
-
- DPRINT("Region %d\n", i);
- DPRINT(" guest_phys_addr: 0x%016"PRIx64"\n",
- msg_region->guest_phys_addr);
- DPRINT(" memory_size: 0x%016"PRIx64"\n",
- msg_region->memory_size);
- DPRINT(" userspace_addr 0x%016"PRIx64"\n",
- msg_region->userspace_addr);
- DPRINT(" mmap_offset 0x%016"PRIx64"\n",
- msg_region->mmap_offset);
-
- dev_region->gpa = msg_region->guest_phys_addr;
- dev_region->size = msg_region->memory_size;
- dev_region->qva = msg_region->userspace_addr;
- dev_region->mmap_offset = msg_region->mmap_offset;
-
- /* We don't use offset argument of mmap() since the
- * mapped address has to be page aligned, and we use huge
- * pages. */
- mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
- prot, MAP_SHARED | MAP_NORESERVE, vmsg->fds[i], 0);
-
- if (mmap_addr == MAP_FAILED) {
- vu_panic(dev, "region mmap error: %s", strerror(errno));
- } else {
- dev_region->mmap_addr = (uint64_t)(uintptr_t)mmap_addr;
- DPRINT(" mmap_addr: 0x%016"PRIx64"\n",
- dev_region->mmap_addr);
- }
-
- if (dev->postcopy_listening) {
- /*
- * Return the address to QEMU so that it can translate the ufd
- * fault addresses back.
- */
- msg_region->userspace_addr = dev_region->mmap_addr +
- dev_region->mmap_offset;
- }
+ for (i = 0; i < memory->nregions; i++) {
+ _vu_add_mem_reg(dev, &memory->regions[i], vmsg->fds[i]);
close(vmsg->fds[i]);
}
--
2.43.0
* Re: [PATCH v1 06/15] libvhost-user: Factor out adding a memory region
2024-02-02 21:53 ` [PATCH v1 06/15] libvhost-user: Factor out adding a memory region David Hildenbrand
@ 2024-02-04 1:44 ` Raphael Norwitz
0 siblings, 0 replies; 46+ messages in thread
From: Raphael Norwitz @ 2024-02-04 1:44 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Stefano Garzarella, Germano Veit Michel, Raphael Norwitz
On Fri, Feb 2, 2024 at 4:55 PM David Hildenbrand <david@redhat.com> wrote:
>
> Let's factor it out, reducing quite some code duplication and preparing
> for further changes.
>
> If we fail to mmap a region and panic, we now simply don't add that
> (broken) region.
>
> Note that we now increment dev->nregions as we successfully add
> memory regions, and don't increment dev->nregions if anything goes
> wrong.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
> ---
> subprojects/libvhost-user/libvhost-user.c | 168 ++++++++--------------
> 1 file changed, 60 insertions(+), 108 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> index d9e2214ad2..a2baefe84b 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -256,6 +256,61 @@ vu_remove_all_mem_regs(VuDev *dev)
> dev->nregions = 0;
> }
>
> +static void
> +_vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
> +{
> + int prot = PROT_READ | PROT_WRITE;
> + VuDevRegion *r;
> + void *mmap_addr;
> +
> + DPRINT("Adding region %d\n", dev->nregions);
> + DPRINT(" guest_phys_addr: 0x%016"PRIx64"\n",
> + msg_region->guest_phys_addr);
> + DPRINT(" memory_size: 0x%016"PRIx64"\n",
> + msg_region->memory_size);
> + DPRINT(" userspace_addr 0x%016"PRIx64"\n",
> + msg_region->userspace_addr);
> + DPRINT(" mmap_offset 0x%016"PRIx64"\n",
> + msg_region->mmap_offset);
> +
> + if (dev->postcopy_listening) {
> + /*
> + * In postcopy we're using PROT_NONE here to catch anyone
> + * accessing it before we userfault
> + */
> + prot = PROT_NONE;
> + }
> +
> + /*
> + * We don't use offset argument of mmap() since the mapped address has
> + * to be page aligned, and we use huge pages.
> + */
> + mmap_addr = mmap(0, msg_region->memory_size + msg_region->mmap_offset,
> + prot, MAP_SHARED | MAP_NORESERVE, fd, 0);
> + if (mmap_addr == MAP_FAILED) {
> + vu_panic(dev, "region mmap error: %s", strerror(errno));
> + return;
> + }
> + DPRINT(" mmap_addr: 0x%016"PRIx64"\n",
> + (uint64_t)(uintptr_t)mmap_addr);
> +
> + r = &dev->regions[dev->nregions];
> + r->gpa = msg_region->guest_phys_addr;
> + r->size = msg_region->memory_size;
> + r->qva = msg_region->userspace_addr;
> + r->mmap_addr = (uint64_t)(uintptr_t)mmap_addr;
> + r->mmap_offset = msg_region->mmap_offset;
> + dev->nregions++;
> +
> + if (dev->postcopy_listening) {
> + /*
> + * Return the address to QEMU so that it can translate the ufd
> + * fault addresses back.
> + */
> + msg_region->userspace_addr = r->mmap_addr + r->mmap_offset;
> + }
> +}
> +
> static void
> vmsg_close_fds(VhostUserMsg *vmsg)
> {
> @@ -727,10 +782,7 @@ generate_faults(VuDev *dev) {
> static bool
> vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
> int i;
> - bool track_ramblocks = dev->postcopy_listening;
> VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = &m;
> - VuDevRegion *dev_region = &dev->regions[dev->nregions];
> - void *mmap_addr;
>
> if (vmsg->fd_num != 1) {
> vmsg_close_fds(vmsg);
> @@ -760,69 +812,20 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
> * we know all the postcopy client bases have been received, and we
> * should start generating faults.
> */
> - if (track_ramblocks &&
> + if (dev->postcopy_listening &&
> vmsg->size == sizeof(vmsg->payload.u64) &&
> vmsg->payload.u64 == 0) {
> (void)generate_faults(dev);
> return false;
> }
>
> - DPRINT("Adding region: %u\n", dev->nregions);
> - DPRINT(" guest_phys_addr: 0x%016"PRIx64"\n",
> - msg_region->guest_phys_addr);
> - DPRINT(" memory_size: 0x%016"PRIx64"\n",
> - msg_region->memory_size);
> - DPRINT(" userspace_addr 0x%016"PRIx64"\n",
> - msg_region->userspace_addr);
> - DPRINT(" mmap_offset 0x%016"PRIx64"\n",
> - msg_region->mmap_offset);
> -
> - dev_region->gpa = msg_region->guest_phys_addr;
> - dev_region->size = msg_region->memory_size;
> - dev_region->qva = msg_region->userspace_addr;
> - dev_region->mmap_offset = msg_region->mmap_offset;
> -
> - /*
> - * We don't use offset argument of mmap() since the
> - * mapped address has to be page aligned, and we use huge
> - * pages.
> - */
> - if (track_ramblocks) {
> - /*
> - * In postcopy we're using PROT_NONE here to catch anyone
> - * accessing it before we userfault.
> - */
> - mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> - PROT_NONE, MAP_SHARED | MAP_NORESERVE,
> - vmsg->fds[0], 0);
> - } else {
> - mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> - PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE,
> - vmsg->fds[0], 0);
> - }
> -
> - if (mmap_addr == MAP_FAILED) {
> - vu_panic(dev, "region mmap error: %s", strerror(errno));
> - } else {
> - dev_region->mmap_addr = (uint64_t)(uintptr_t)mmap_addr;
> - DPRINT(" mmap_addr: 0x%016"PRIx64"\n",
> - dev_region->mmap_addr);
> - }
> -
> + _vu_add_mem_reg(dev, msg_region, vmsg->fds[0]);
> close(vmsg->fds[0]);
>
> - if (track_ramblocks) {
> - /*
> - * Return the address to QEMU so that it can translate the ufd
> - * fault addresses back.
> - */
> - msg_region->userspace_addr = dev_region->mmap_addr +
> - dev_region->mmap_offset;
> -
> + if (dev->postcopy_listening) {
> /* Send the message back to qemu with the addresses filled in. */
> vmsg->fd_num = 0;
> DPRINT("Successfully added new region in postcopy\n");
> - dev->nregions++;
> return true;
> } else {
> for (i = 0; i < dev->max_queues; i++) {
> @@ -835,7 +838,6 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
> }
>
> DPRINT("Successfully added new region\n");
> - dev->nregions++;
> return false;
> }
> }
> @@ -940,63 +942,13 @@ static bool
> vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
> {
> VhostUserMemory m = vmsg->payload.memory, *memory = &m;
> - int prot = PROT_READ | PROT_WRITE;
> unsigned int i;
>
> - if (dev->postcopy_listening) {
> - /*
> - * In postcopy we're using PROT_NONE here to catch anyone
> - * accessing it before we userfault
> - */
> - prot = PROT_NONE;
> - }
> -
> vu_remove_all_mem_regs(dev);
> - dev->nregions = memory->nregions;
>
> DPRINT("Nregions: %u\n", memory->nregions);
> - for (i = 0; i < dev->nregions; i++) {
> - void *mmap_addr;
> - VhostUserMemoryRegion *msg_region = &memory->regions[i];
> - VuDevRegion *dev_region = &dev->regions[i];
> -
> - DPRINT("Region %d\n", i);
> - DPRINT(" guest_phys_addr: 0x%016"PRIx64"\n",
> - msg_region->guest_phys_addr);
> - DPRINT(" memory_size: 0x%016"PRIx64"\n",
> - msg_region->memory_size);
> - DPRINT(" userspace_addr 0x%016"PRIx64"\n",
> - msg_region->userspace_addr);
> - DPRINT(" mmap_offset 0x%016"PRIx64"\n",
> - msg_region->mmap_offset);
> -
> - dev_region->gpa = msg_region->guest_phys_addr;
> - dev_region->size = msg_region->memory_size;
> - dev_region->qva = msg_region->userspace_addr;
> - dev_region->mmap_offset = msg_region->mmap_offset;
> -
> - /* We don't use offset argument of mmap() since the
> - * mapped address has to be page aligned, and we use huge
> - * pages. */
> - mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> - prot, MAP_SHARED | MAP_NORESERVE, vmsg->fds[i], 0);
> -
> - if (mmap_addr == MAP_FAILED) {
> - vu_panic(dev, "region mmap error: %s", strerror(errno));
> - } else {
> - dev_region->mmap_addr = (uint64_t)(uintptr_t)mmap_addr;
> - DPRINT(" mmap_addr: 0x%016"PRIx64"\n",
> - dev_region->mmap_addr);
> - }
> -
> - if (dev->postcopy_listening) {
> - /*
> - * Return the address to QEMU so that it can translate the ufd
> - * fault addresses back.
> - */
> - msg_region->userspace_addr = dev_region->mmap_addr +
> - dev_region->mmap_offset;
> - }
> + for (i = 0; i < memory->nregions; i++) {
> + _vu_add_mem_reg(dev, &memory->regions[i], vmsg->fds[i]);
> close(vmsg->fds[i]);
> }
>
> --
> 2.43.0
>
>
* [PATCH v1 07/15] libvhost-user: No need to check for NULL when unmapping
2024-02-02 21:53 [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code David Hildenbrand
` (5 preceding siblings ...)
2024-02-02 21:53 ` [PATCH v1 06/15] libvhost-user: Factor out adding a memory region David Hildenbrand
@ 2024-02-02 21:53 ` David Hildenbrand
2024-02-04 1:45 ` Raphael Norwitz
2024-02-02 21:53 ` [PATCH v1 08/15] libvhost-user: Don't zero out memory for memory regions David Hildenbrand
` (9 subsequent siblings)
16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-02-02 21:53 UTC (permalink / raw)
To: qemu-devel
Cc: David Hildenbrand, Michael S . Tsirkin, Jason Wang,
Stefan Hajnoczi, Stefano Garzarella, Germano Veit Michel,
Raphael Norwitz
We never add a memory region if mmap() failed. Therefore, no need to check
for NULL.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
subprojects/libvhost-user/libvhost-user.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index a2baefe84b..f99c888b48 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -247,11 +247,8 @@ vu_remove_all_mem_regs(VuDev *dev)
for (i = 0; i < dev->nregions; i++) {
VuDevRegion *r = &dev->regions[i];
- void *ma = (void *)(uintptr_t)r->mmap_addr;
- if (ma) {
- munmap(ma, r->size + r->mmap_offset);
- }
+ munmap((void *)(uintptr_t)r->mmap_addr, r->size + r->mmap_offset);
}
dev->nregions = 0;
}
@@ -888,11 +885,8 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
for (i = 0; i < dev->nregions; i++) {
if (reg_equal(&dev->regions[i], msg_region)) {
VuDevRegion *r = &dev->regions[i];
- void *ma = (void *) (uintptr_t) r->mmap_addr;
- if (ma) {
- munmap(ma, r->size + r->mmap_offset);
- }
+ munmap((void *)(uintptr_t)r->mmap_addr, r->size + r->mmap_offset);
/*
* Shift all affected entries by 1 to close the hole at index i and
--
2.43.0
* Re: [PATCH v1 07/15] libvhost-user: No need to check for NULL when unmapping
2024-02-02 21:53 ` [PATCH v1 07/15] libvhost-user: No need to check for NULL when unmapping David Hildenbrand
@ 2024-02-04 1:45 ` Raphael Norwitz
0 siblings, 0 replies; 46+ messages in thread
From: Raphael Norwitz @ 2024-02-04 1:45 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Stefano Garzarella, Germano Veit Michel, Raphael Norwitz
On Fri, Feb 2, 2024 at 4:55 PM David Hildenbrand <david@redhat.com> wrote:
>
> We never add a memory region if mmap() failed. Therefore, no need to check
> for NULL.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
> ---
> subprojects/libvhost-user/libvhost-user.c | 10 ++--------
> 1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> index a2baefe84b..f99c888b48 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -247,11 +247,8 @@ vu_remove_all_mem_regs(VuDev *dev)
>
> for (i = 0; i < dev->nregions; i++) {
> VuDevRegion *r = &dev->regions[i];
> - void *ma = (void *)(uintptr_t)r->mmap_addr;
>
> - if (ma) {
> - munmap(ma, r->size + r->mmap_offset);
> - }
> + munmap((void *)(uintptr_t)r->mmap_addr, r->size + r->mmap_offset);
> }
> dev->nregions = 0;
> }
> @@ -888,11 +885,8 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
> for (i = 0; i < dev->nregions; i++) {
> if (reg_equal(&dev->regions[i], msg_region)) {
> VuDevRegion *r = &dev->regions[i];
> - void *ma = (void *) (uintptr_t) r->mmap_addr;
>
> - if (ma) {
> - munmap(ma, r->size + r->mmap_offset);
> - }
> + munmap((void *)(uintptr_t)r->mmap_addr, r->size + r->mmap_offset);
>
> /*
> * Shift all affected entries by 1 to close the hole at index i and
> --
> 2.43.0
>
>
* [PATCH v1 08/15] libvhost-user: Don't zero out memory for memory regions
2024-02-02 21:53 [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code David Hildenbrand
` (6 preceding siblings ...)
2024-02-02 21:53 ` [PATCH v1 07/15] libvhost-user: No need to check for NULL when unmapping David Hildenbrand
@ 2024-02-02 21:53 ` David Hildenbrand
2024-02-04 1:46 ` Raphael Norwitz
2024-02-02 21:53 ` [PATCH v1 09/15] libvhost-user: Don't search for duplicates when removing " David Hildenbrand
` (8 subsequent siblings)
16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-02-02 21:53 UTC (permalink / raw)
To: qemu-devel
Cc: David Hildenbrand, Michael S . Tsirkin, Jason Wang,
Stefan Hajnoczi, Stefano Garzarella, Germano Veit Michel,
Raphael Norwitz
dev->nregions always covers only valid entries. Stop zeroing out other
array elements that are unused.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
subprojects/libvhost-user/libvhost-user.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index f99c888b48..e1a1b9df88 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -888,13 +888,9 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
munmap((void *)(uintptr_t)r->mmap_addr, r->size + r->mmap_offset);
- /*
- * Shift all affected entries by 1 to close the hole at index i and
- * zero out the last entry.
- */
+ /* Shift all affected entries by 1 to close the hole at index. */
memmove(dev->regions + i, dev->regions + i + 1,
sizeof(VuDevRegion) * (dev->nregions - i - 1));
- memset(dev->regions + dev->nregions - 1, 0, sizeof(VuDevRegion));
DPRINT("Successfully removed a region\n");
dev->nregions--;
i--;
@@ -2119,7 +2115,6 @@ vu_init(VuDev *dev,
DPRINT("%s: failed to malloc mem regions\n", __func__);
return false;
}
- memset(dev->regions, 0, VHOST_USER_MAX_RAM_SLOTS * sizeof(dev->regions[0]));
dev->vq = malloc(max_queues * sizeof(dev->vq[0]));
if (!dev->vq) {
--
2.43.0
* Re: [PATCH v1 08/15] libvhost-user: Don't zero out memory for memory regions
2024-02-02 21:53 ` [PATCH v1 08/15] libvhost-user: Don't zero out memory for memory regions David Hildenbrand
@ 2024-02-04 1:46 ` Raphael Norwitz
0 siblings, 0 replies; 46+ messages in thread
From: Raphael Norwitz @ 2024-02-04 1:46 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Stefano Garzarella, Germano Veit Michel, Raphael Norwitz
On Fri, Feb 2, 2024 at 4:56 PM David Hildenbrand <david@redhat.com> wrote:
>
> dev->nregions always covers only valid entries. Stop zeroing out other
> array elements that are unused.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
> ---
> subprojects/libvhost-user/libvhost-user.c | 7 +------
> 1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> index f99c888b48..e1a1b9df88 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -888,13 +888,9 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
>
> munmap((void *)(uintptr_t)r->mmap_addr, r->size + r->mmap_offset);
>
> - /*
> - * Shift all affected entries by 1 to close the hole at index i and
> - * zero out the last entry.
> - */
> + /* Shift all affected entries by 1 to close the hole at index. */
> memmove(dev->regions + i, dev->regions + i + 1,
> sizeof(VuDevRegion) * (dev->nregions - i - 1));
> - memset(dev->regions + dev->nregions - 1, 0, sizeof(VuDevRegion));
> DPRINT("Successfully removed a region\n");
> dev->nregions--;
> i--;
> @@ -2119,7 +2115,6 @@ vu_init(VuDev *dev,
> DPRINT("%s: failed to malloc mem regions\n", __func__);
> return false;
> }
> - memset(dev->regions, 0, VHOST_USER_MAX_RAM_SLOTS * sizeof(dev->regions[0]));
>
> dev->vq = malloc(max_queues * sizeof(dev->vq[0]));
> if (!dev->vq) {
> --
> 2.43.0
>
>
* [PATCH v1 09/15] libvhost-user: Don't search for duplicates when removing memory regions
2024-02-02 21:53 [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code David Hildenbrand
` (7 preceding siblings ...)
2024-02-02 21:53 ` [PATCH v1 08/15] libvhost-user: Don't zero out memory for memory regions David Hildenbrand
@ 2024-02-02 21:53 ` David Hildenbrand
2024-02-04 1:47 ` Raphael Norwitz
2024-02-02 21:53 ` [PATCH v1 10/15] libvhost-user: Factor out search for memory region by GPA and simplify David Hildenbrand
` (7 subsequent siblings)
16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-02-02 21:53 UTC (permalink / raw)
To: qemu-devel
Cc: David Hildenbrand, Michael S . Tsirkin, Jason Wang,
Stefan Hajnoczi, Stefano Garzarella, Germano Veit Michel,
Raphael Norwitz
We cannot have duplicate memory regions; otherwise, something would be deeply
flawed elsewhere. Let's just stop the search once we find an entry.
We'll add more sanity checks when adding memory regions later.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
subprojects/libvhost-user/libvhost-user.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index e1a1b9df88..22154b217f 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -896,8 +896,7 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
i--;
found = true;
-
- /* Continue the search for eventual duplicates. */
+ break;
}
}
--
2.43.0
* Re: [PATCH v1 09/15] libvhost-user: Don't search for duplicates when removing memory regions
2024-02-02 21:53 ` [PATCH v1 09/15] libvhost-user: Don't search for duplicates when removing " David Hildenbrand
@ 2024-02-04 1:47 ` Raphael Norwitz
0 siblings, 0 replies; 46+ messages in thread
From: Raphael Norwitz @ 2024-02-04 1:47 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Stefano Garzarella, Germano Veit Michel, Raphael Norwitz
On Fri, Feb 2, 2024 at 4:56 PM David Hildenbrand <david@redhat.com> wrote:
>
> We cannot have duplicate memory regions; otherwise, something would be deeply
> flawed elsewhere. Let's just stop the search once we find an entry.
>
> We'll add more sanity checks when adding memory regions later.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
> ---
> subprojects/libvhost-user/libvhost-user.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> index e1a1b9df88..22154b217f 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -896,8 +896,7 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
> i--;
>
> found = true;
> -
> - /* Continue the search for eventual duplicates. */
> + break;
> }
> }
>
> --
> 2.43.0
>
>
* [PATCH v1 10/15] libvhost-user: Factor out search for memory region by GPA and simplify
2024-02-02 21:53 [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code David Hildenbrand
` (8 preceding siblings ...)
2024-02-02 21:53 ` [PATCH v1 09/15] libvhost-user: Don't search for duplicates when removing " David Hildenbrand
@ 2024-02-02 21:53 ` David Hildenbrand
2024-02-04 1:47 ` Raphael Norwitz
2024-02-02 21:53 ` [PATCH v1 11/15] libvhost-user: Speedup gpa_to_mem_region() and vu_gpa_to_va() David Hildenbrand
` (6 subsequent siblings)
16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-02-02 21:53 UTC (permalink / raw)
To: qemu-devel
Cc: David Hildenbrand, Michael S . Tsirkin, Jason Wang,
Stefan Hajnoczi, Stefano Garzarella, Germano Veit Michel,
Raphael Norwitz
Memory regions cannot overlap, and if we ever hit that case something
would be really flawed.
For example, when vhost code in QEMU decides to increase the size of memory
regions to cover full huge pages, it makes sure to never create overlaps,
and if there would be overlaps, it would bail out.
QEMU commits 48d7c9757749 ("vhost: Merge sections added to temporary
list"), c1ece84e7c93 ("vhost: Huge page align and merge") and
e7b94a84b6cb ("vhost: Allow adjoining regions") added and clarified that
handling and how overlaps are impossible.
Consequently, each GPA can belong to at most one memory region, and
everything else doesn't make sense. Let's factor out our search to prepare
for further changes.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
subprojects/libvhost-user/libvhost-user.c | 79 +++++++++++++----------
1 file changed, 45 insertions(+), 34 deletions(-)
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 22154b217f..d036b54ed0 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -195,30 +195,47 @@ vu_panic(VuDev *dev, const char *msg, ...)
*/
}
+/* Search for a memory region that covers this guest physical address. */
+static VuDevRegion *
+vu_gpa_to_mem_region(VuDev *dev, uint64_t guest_addr)
+{
+ unsigned int i;
+
+ /*
+ * Memory regions cannot overlap in guest physical address space. Each
+ * GPA belongs to exactly one memory region, so there can only be one
+ * match.
+ */
+ for (i = 0; i < dev->nregions; i++) {
+ VuDevRegion *cur = &dev->regions[i];
+
+ if (guest_addr >= cur->gpa && guest_addr < cur->gpa + cur->size) {
+ return cur;
+ }
+ }
+ return NULL;
+}
+
/* Translate guest physical address to our virtual address. */
void *
vu_gpa_to_va(VuDev *dev, uint64_t *plen, uint64_t guest_addr)
{
- unsigned int i;
+ VuDevRegion *r;
if (*plen == 0) {
return NULL;
}
- /* Find matching memory region. */
- for (i = 0; i < dev->nregions; i++) {
- VuDevRegion *r = &dev->regions[i];
-
- if ((guest_addr >= r->gpa) && (guest_addr < (r->gpa + r->size))) {
- if ((guest_addr + *plen) > (r->gpa + r->size)) {
- *plen = r->gpa + r->size - guest_addr;
- }
- return (void *)(uintptr_t)
- guest_addr - r->gpa + r->mmap_addr + r->mmap_offset;
- }
+ r = vu_gpa_to_mem_region(dev, guest_addr);
+ if (!r) {
+ return NULL;
}
- return NULL;
+ if ((guest_addr + *plen) > (r->gpa + r->size)) {
+ *plen = r->gpa + r->size - guest_addr;
+ }
+ return (void *)(uintptr_t)guest_addr - r->gpa + r->mmap_addr +
+ r->mmap_offset;
}
/* Translate qemu virtual address to our virtual address. */
@@ -854,8 +871,8 @@ static inline bool reg_equal(VuDevRegion *vudev_reg,
static bool
vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = &m;
- unsigned int i;
- bool found = false;
+ unsigned int idx;
+ VuDevRegion *r;
if (vmsg->fd_num > 1) {
vmsg_close_fds(vmsg);
@@ -882,28 +899,22 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
DPRINT(" mmap_offset 0x%016"PRIx64"\n",
msg_region->mmap_offset);
- for (i = 0; i < dev->nregions; i++) {
- if (reg_equal(&dev->regions[i], msg_region)) {
- VuDevRegion *r = &dev->regions[i];
-
- munmap((void *)(uintptr_t)r->mmap_addr, r->size + r->mmap_offset);
-
- /* Shift all affected entries by 1 to close the hole at index. */
- memmove(dev->regions + i, dev->regions + i + 1,
- sizeof(VuDevRegion) * (dev->nregions - i - 1));
- DPRINT("Successfully removed a region\n");
- dev->nregions--;
- i--;
-
- found = true;
- break;
- }
- }
-
- if (!found) {
+ r = vu_gpa_to_mem_region(dev, msg_region->guest_phys_addr);
+ if (!r || !reg_equal(r, msg_region)) {
+ vmsg_close_fds(vmsg);
vu_panic(dev, "Specified region not found\n");
+ return false;
}
+ munmap((void *)(uintptr_t)r->mmap_addr, r->size + r->mmap_offset);
+
+ idx = r - dev->regions;
+ assert(idx < dev->nregions);
+ /* Shift all affected entries by 1 to close the hole. */
+ memmove(r, r + 1, sizeof(VuDevRegion) * (dev->nregions - idx - 1));
+ DPRINT("Successfully removed a region\n");
+ dev->nregions--;
+
vmsg_close_fds(vmsg);
return false;
--
2.43.0
* Re: [PATCH v1 10/15] libvhost-user: Factor out search for memory region by GPA and simplify
2024-02-02 21:53 ` [PATCH v1 10/15] libvhost-user: Factor out search for memory region by GPA and simplify David Hildenbrand
@ 2024-02-04 1:47 ` Raphael Norwitz
0 siblings, 0 replies; 46+ messages in thread
From: Raphael Norwitz @ 2024-02-04 1:47 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Stefano Garzarella, Germano Veit Michel, Raphael Norwitz
On Fri, Feb 2, 2024 at 4:54 PM David Hildenbrand <david@redhat.com> wrote:
>
> Memory regions cannot overlap, and if we ever hit that case something
> would be really flawed.
>
> For example, when vhost code in QEMU decides to increase the size of memory
> regions to cover full huge pages, it makes sure to never create overlaps,
> and if there would be overlaps, it would bail out.
>
> QEMU commits 48d7c9757749 ("vhost: Merge sections added to temporary
> list"), c1ece84e7c93 ("vhost: Huge page align and merge") and
> e7b94a84b6cb ("vhost: Allow adjoining regions") added and clarified that
> handling and how overlaps are impossible.
>
> Consequently, each GPA can belong to at most one memory region, and
> everything else doesn't make sense. Let's factor out our search to prepare
> for further changes.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
> ---
> subprojects/libvhost-user/libvhost-user.c | 79 +++++++++++++----------
> 1 file changed, 45 insertions(+), 34 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> index 22154b217f..d036b54ed0 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -195,30 +195,47 @@ vu_panic(VuDev *dev, const char *msg, ...)
> */
> }
>
> +/* Search for a memory region that covers this guest physical address. */
> +static VuDevRegion *
> +vu_gpa_to_mem_region(VuDev *dev, uint64_t guest_addr)
> +{
> + unsigned int i;
> +
> + /*
> + * Memory regions cannot overlap in guest physical address space. Each
> + * GPA belongs to exactly one memory region, so there can only be one
> + * match.
> + */
> + for (i = 0; i < dev->nregions; i++) {
> + VuDevRegion *cur = &dev->regions[i];
> +
> + if (guest_addr >= cur->gpa && guest_addr < cur->gpa + cur->size) {
> + return cur;
> + }
> + }
> + return NULL;
> +}
> +
> /* Translate guest physical address to our virtual address. */
> void *
> vu_gpa_to_va(VuDev *dev, uint64_t *plen, uint64_t guest_addr)
> {
> - unsigned int i;
> + VuDevRegion *r;
>
> if (*plen == 0) {
> return NULL;
> }
>
> - /* Find matching memory region. */
> - for (i = 0; i < dev->nregions; i++) {
> - VuDevRegion *r = &dev->regions[i];
> -
> - if ((guest_addr >= r->gpa) && (guest_addr < (r->gpa + r->size))) {
> - if ((guest_addr + *plen) > (r->gpa + r->size)) {
> - *plen = r->gpa + r->size - guest_addr;
> - }
> - return (void *)(uintptr_t)
> - guest_addr - r->gpa + r->mmap_addr + r->mmap_offset;
> - }
> + r = vu_gpa_to_mem_region(dev, guest_addr);
> + if (!r) {
> + return NULL;
> }
>
> - return NULL;
> + if ((guest_addr + *plen) > (r->gpa + r->size)) {
> + *plen = r->gpa + r->size - guest_addr;
> + }
> + return (void *)(uintptr_t)guest_addr - r->gpa + r->mmap_addr +
> + r->mmap_offset;
> }
>
> /* Translate qemu virtual address to our virtual address. */
> @@ -854,8 +871,8 @@ static inline bool reg_equal(VuDevRegion *vudev_reg,
> static bool
> vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
> VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = &m;
> - unsigned int i;
> - bool found = false;
> + unsigned int idx;
> + VuDevRegion *r;
>
> if (vmsg->fd_num > 1) {
> vmsg_close_fds(vmsg);
> @@ -882,28 +899,22 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
> DPRINT(" mmap_offset 0x%016"PRIx64"\n",
> msg_region->mmap_offset);
>
> - for (i = 0; i < dev->nregions; i++) {
> - if (reg_equal(&dev->regions[i], msg_region)) {
> - VuDevRegion *r = &dev->regions[i];
> -
> - munmap((void *)(uintptr_t)r->mmap_addr, r->size + r->mmap_offset);
> -
> - /* Shift all affected entries by 1 to close the hole at index. */
> - memmove(dev->regions + i, dev->regions + i + 1,
> - sizeof(VuDevRegion) * (dev->nregions - i - 1));
> - DPRINT("Successfully removed a region\n");
> - dev->nregions--;
> - i--;
> -
> - found = true;
> - break;
> - }
> - }
> -
> - if (!found) {
> + r = vu_gpa_to_mem_region(dev, msg_region->guest_phys_addr);
> + if (!r || !reg_equal(r, msg_region)) {
> + vmsg_close_fds(vmsg);
> vu_panic(dev, "Specified region not found\n");
> + return false;
> }
>
> + munmap((void *)(uintptr_t)r->mmap_addr, r->size + r->mmap_offset);
> +
> + idx = r - dev->regions;
> + assert(idx < dev->nregions);
> + /* Shift all affected entries by 1 to close the hole. */
> + memmove(r, r + 1, sizeof(VuDevRegion) * (dev->nregions - idx - 1));
> + DPRINT("Successfully removed a region\n");
> + dev->nregions--;
> +
> vmsg_close_fds(vmsg);
>
> return false;
> --
> 2.43.0
>
>
* [PATCH v1 11/15] libvhost-user: Speedup gpa_to_mem_region() and vu_gpa_to_va()
2024-02-02 21:53 [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code David Hildenbrand
` (9 preceding siblings ...)
2024-02-02 21:53 ` [PATCH v1 10/15] libvhost-user: Factor out search for memory region by GPA and simplify David Hildenbrand
@ 2024-02-02 21:53 ` David Hildenbrand
2024-02-04 2:10 ` Raphael Norwitz
2024-02-02 21:53 ` [PATCH v1 12/15] libvhost-user: Use most of mmap_offset as fd_offset David Hildenbrand
` (5 subsequent siblings)
16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-02-02 21:53 UTC (permalink / raw)
To: qemu-devel
Cc: David Hildenbrand, Michael S . Tsirkin, Jason Wang,
Stefan Hajnoczi, Stefano Garzarella, Germano Veit Michel,
Raphael Norwitz
Let's speed up GPA to memory region / virtual address lookup. Store the
memory regions ordered by guest physical addresses, and use binary
search for address translation, as well as when adding/removing memory
regions.
Most importantly, this will speed up GPA->VA address translation when we
have many memslots.
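For illustration only (not part of the patch; names are made up and the
overlap check is left out), the insertion-point search boils down to a
lower-bound style binary search over the sorted array:

#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for VuDevRegion. */
typedef struct {
    uint64_t gpa;
    uint64_t size;
} Region;

/* Index at which a region starting at start_gpa would have to be
 * inserted to keep the array sorted by GPA. */
size_t find_insert_idx(const Region *regions, size_t nregions,
                       uint64_t start_gpa)
{
    size_t low = 0, high = nregions;    /* high is exclusive here */

    while (low < high) {
        size_t mid = low + (high - low) / 2;

        if (start_gpa >= regions[mid].gpa + regions[mid].size) {
            low = mid + 1;  /* new region lies entirely above mid */
        } else {
            high = mid;     /* new region lies below (or would overlap) */
        }
    }
    return low;
}

The patch itself keeps a single loop with an inclusive high index so that
the same walk doubles as the overlap check.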
Signed-off-by: David Hildenbrand <david@redhat.com>
---
subprojects/libvhost-user/libvhost-user.c | 49 +++++++++++++++++++++--
1 file changed, 45 insertions(+), 4 deletions(-)
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index d036b54ed0..75e47b7bb3 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -199,19 +199,30 @@ vu_panic(VuDev *dev, const char *msg, ...)
static VuDevRegion *
vu_gpa_to_mem_region(VuDev *dev, uint64_t guest_addr)
{
- unsigned int i;
+ int low = 0;
+ int high = dev->nregions - 1;
/*
* Memory regions cannot overlap in guest physical address space. Each
* GPA belongs to exactly one memory region, so there can only be one
* match.
+ *
+ * We store our memory regions ordered by GPA and can simply perform a
+ * binary search.
*/
- for (i = 0; i < dev->nregions; i++) {
- VuDevRegion *cur = &dev->regions[i];
+ while (low <= high) {
+ unsigned int mid = low + (high - low) / 2;
+ VuDevRegion *cur = &dev->regions[mid];
if (guest_addr >= cur->gpa && guest_addr < cur->gpa + cur->size) {
return cur;
}
+ if (guest_addr >= cur->gpa + cur->size) {
+ low = mid + 1;
+ }
+ if (guest_addr < cur->gpa) {
+ high = mid - 1;
+ }
}
return NULL;
}
@@ -273,9 +284,14 @@ vu_remove_all_mem_regs(VuDev *dev)
static void
_vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
{
+ const uint64_t start_gpa = msg_region->guest_phys_addr;
+ const uint64_t end_gpa = start_gpa + msg_region->memory_size;
int prot = PROT_READ | PROT_WRITE;
VuDevRegion *r;
void *mmap_addr;
+ int low = 0;
+ int high = dev->nregions - 1;
+ unsigned int idx;
DPRINT("Adding region %d\n", dev->nregions);
DPRINT(" guest_phys_addr: 0x%016"PRIx64"\n",
@@ -295,6 +311,29 @@ _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
prot = PROT_NONE;
}
+ /*
+ * We will add memory regions into the array sorted by GPA. Perform a
+ * binary search to locate the insertion point: it will be at the low
+ * index.
+ */
+ while (low <= high) {
+ unsigned int mid = low + (high - low) / 2;
+ VuDevRegion *cur = &dev->regions[mid];
+
+ /* Overlap of GPA addresses. */
+ if (start_gpa < cur->gpa + cur->size && cur->gpa < end_gpa) {
+ vu_panic(dev, "regions with overlapping guest physical addresses");
+ return;
+ }
+ if (start_gpa >= cur->gpa + cur->size) {
+ low = mid + 1;
+ }
+ if (start_gpa < cur->gpa) {
+ high = mid - 1;
+ }
+ }
+ idx = low;
+
/*
* We don't use offset argument of mmap() since the mapped address has
* to be page aligned, and we use huge pages.
@@ -308,7 +347,9 @@ _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
DPRINT(" mmap_addr: 0x%016"PRIx64"\n",
(uint64_t)(uintptr_t)mmap_addr);
- r = &dev->regions[dev->nregions];
+ /* Shift all affected entries by 1 to open a hole at idx. */
+ r = &dev->regions[idx];
+ memmove(r + 1, r, sizeof(VuDevRegion) * (dev->nregions - idx));
r->gpa = msg_region->guest_phys_addr;
r->size = msg_region->memory_size;
r->qva = msg_region->userspace_addr;
--
2.43.0
* Re: [PATCH v1 11/15] libvhost-user: Speedup gpa_to_mem_region() and vu_gpa_to_va()
2024-02-02 21:53 ` [PATCH v1 11/15] libvhost-user: Speedup gpa_to_mem_region() and vu_gpa_to_va() David Hildenbrand
@ 2024-02-04 2:10 ` Raphael Norwitz
2024-02-04 14:51 ` David Hildenbrand
0 siblings, 1 reply; 46+ messages in thread
From: Raphael Norwitz @ 2024-02-04 2:10 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Stefano Garzarella, Germano Veit Michel, Raphael Norwitz
One comment on this one.
On Fri, Feb 2, 2024 at 4:56 PM David Hildenbrand <david@redhat.com> wrote:
>
> Let's speed up GPA to memory region / virtual address lookup. Store the
> memory regions ordered by guest physical addresses, and use binary
> search for address translation, as well as when adding/removing memory
> regions.
>
> Most importantly, this will speed up GPA->VA address translation when we
> have many memslots.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> subprojects/libvhost-user/libvhost-user.c | 49 +++++++++++++++++++++--
> 1 file changed, 45 insertions(+), 4 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> index d036b54ed0..75e47b7bb3 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -199,19 +199,30 @@ vu_panic(VuDev *dev, const char *msg, ...)
> static VuDevRegion *
> vu_gpa_to_mem_region(VuDev *dev, uint64_t guest_addr)
> {
> - unsigned int i;
> + int low = 0;
> + int high = dev->nregions - 1;
>
> /*
> * Memory regions cannot overlap in guest physical address space. Each
> * GPA belongs to exactly one memory region, so there can only be one
> * match.
> + *
> + * We store our memory regions ordered by GPA and can simply perform a
> + * binary search.
> */
> - for (i = 0; i < dev->nregions; i++) {
> - VuDevRegion *cur = &dev->regions[i];
> + while (low <= high) {
> + unsigned int mid = low + (high - low) / 2;
> + VuDevRegion *cur = &dev->regions[mid];
>
> if (guest_addr >= cur->gpa && guest_addr < cur->gpa + cur->size) {
> return cur;
> }
> + if (guest_addr >= cur->gpa + cur->size) {
> + low = mid + 1;
> + }
> + if (guest_addr < cur->gpa) {
> + high = mid - 1;
> + }
> }
> return NULL;
> }
> @@ -273,9 +284,14 @@ vu_remove_all_mem_regs(VuDev *dev)
> static void
> _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
> {
> + const uint64_t start_gpa = msg_region->guest_phys_addr;
> + const uint64_t end_gpa = start_gpa + msg_region->memory_size;
> int prot = PROT_READ | PROT_WRITE;
> VuDevRegion *r;
> void *mmap_addr;
> + int low = 0;
> + int high = dev->nregions - 1;
> + unsigned int idx;
>
> DPRINT("Adding region %d\n", dev->nregions);
> DPRINT(" guest_phys_addr: 0x%016"PRIx64"\n",
> @@ -295,6 +311,29 @@ _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
> prot = PROT_NONE;
> }
>
> + /*
> + * We will add memory regions into the array sorted by GPA. Perform a
> + * binary search to locate the insertion point: it will be at the low
> + * index.
> + */
> + while (low <= high) {
> + unsigned int mid = low + (high - low) / 2;
> + VuDevRegion *cur = &dev->regions[mid];
> +
> + /* Overlap of GPA addresses. */
Looks like this check will only catch if the new region is fully
contained within an existing region. I think we need to check whether
either start or end region is in the range, i.e.:
if ((start_gpa > cur->gpa && start_gpa < cur->gpa + cur->size) ||
    (end_gpa > cur->gpa && end_gpa < cur->gpa + cur->size))
> + if (start_gpa < cur->gpa + cur->size && cur->gpa < end_gpa) {
> + vu_panic(dev, "regions with overlapping guest physical addresses");
> + return;
> + }
> + if (start_gpa >= cur->gpa + cur->size) {
> + low = mid + 1;
> + }
> + if (start_gpa < cur->gpa) {
> + high = mid - 1;
> + }
> + }
> + idx = low;
> +
> /*
> * We don't use offset argument of mmap() since the mapped address has
> * to be page aligned, and we use huge pages.
> @@ -308,7 +347,9 @@ _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
> DPRINT(" mmap_addr: 0x%016"PRIx64"\n",
> (uint64_t)(uintptr_t)mmap_addr);
>
> - r = &dev->regions[dev->nregions];
> + /* Shift all affected entries by 1 to open a hole at idx. */
> + r = &dev->regions[idx];
> + memmove(r + 1, r, sizeof(VuDevRegion) * (dev->nregions - idx));
> r->gpa = msg_region->guest_phys_addr;
> r->size = msg_region->memory_size;
> r->qva = msg_region->userspace_addr;
> --
> 2.43.0
>
>
* Re: [PATCH v1 11/15] libvhost-user: Speedup gpa_to_mem_region() and vu_gpa_to_va()
2024-02-04 2:10 ` Raphael Norwitz
@ 2024-02-04 14:51 ` David Hildenbrand
2024-02-04 22:07 ` Raphael Norwitz
0 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-02-04 14:51 UTC (permalink / raw)
To: Raphael Norwitz
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Stefano Garzarella, Germano Veit Michel, Raphael Norwitz
On 04.02.24 03:10, Raphael Norwitz wrote:
> One comment on this one.
>
> On Fri, Feb 2, 2024 at 4:56 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> Let's speed up GPA to memory region / virtual address lookup. Store the
>> memory regions ordered by guest physical addresses, and use binary
>> search for address translation, as well as when adding/removing memory
>> regions.
>>
>> Most importantly, this will speed up GPA->VA address translation when we
>> have many memslots.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> subprojects/libvhost-user/libvhost-user.c | 49 +++++++++++++++++++++--
>> 1 file changed, 45 insertions(+), 4 deletions(-)
>>
>> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
>> index d036b54ed0..75e47b7bb3 100644
>> --- a/subprojects/libvhost-user/libvhost-user.c
>> +++ b/subprojects/libvhost-user/libvhost-user.c
>> @@ -199,19 +199,30 @@ vu_panic(VuDev *dev, const char *msg, ...)
>> static VuDevRegion *
>> vu_gpa_to_mem_region(VuDev *dev, uint64_t guest_addr)
>> {
>> - unsigned int i;
>> + int low = 0;
>> + int high = dev->nregions - 1;
>>
>> /*
>> * Memory regions cannot overlap in guest physical address space. Each
>> * GPA belongs to exactly one memory region, so there can only be one
>> * match.
>> + *
>> + * We store our memory regions ordered by GPA and can simply perform a
>> + * binary search.
>> */
>> - for (i = 0; i < dev->nregions; i++) {
>> - VuDevRegion *cur = &dev->regions[i];
>> + while (low <= high) {
>> + unsigned int mid = low + (high - low) / 2;
>> + VuDevRegion *cur = &dev->regions[mid];
>>
>> if (guest_addr >= cur->gpa && guest_addr < cur->gpa + cur->size) {
>> return cur;
>> }
>> + if (guest_addr >= cur->gpa + cur->size) {
>> + low = mid + 1;
>> + }
>> + if (guest_addr < cur->gpa) {
>> + high = mid - 1;
>> + }
>> }
>> return NULL;
>> }
>> @@ -273,9 +284,14 @@ vu_remove_all_mem_regs(VuDev *dev)
>> static void
>> _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
>> {
>> + const uint64_t start_gpa = msg_region->guest_phys_addr;
>> + const uint64_t end_gpa = start_gpa + msg_region->memory_size;
>> int prot = PROT_READ | PROT_WRITE;
>> VuDevRegion *r;
>> void *mmap_addr;
>> + int low = 0;
>> + int high = dev->nregions - 1;
>> + unsigned int idx;
>>
>> DPRINT("Adding region %d\n", dev->nregions);
>> DPRINT(" guest_phys_addr: 0x%016"PRIx64"\n",
>> @@ -295,6 +311,29 @@ _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
>> prot = PROT_NONE;
>> }
>>
>> + /*
>> + * We will add memory regions into the array sorted by GPA. Perform a
>> + * binary search to locate the insertion point: it will be at the low
>> + * index.
>> + */
>> + while (low <= high) {
>> + unsigned int mid = low + (high - low) / 2;
>> + VuDevRegion *cur = &dev->regions[mid];
>> +
>> + /* Overlap of GPA addresses. */
>
> Looks like this check will only catch if the new region is fully
> contained within an existing region. I think we need to check whether
> either start or end region is in the range, i.e.:
That check should cover all cases of overlaps, not just fully contained.
See the QEMU implementation of range_overlaps_range(), which contains
similar logic:
return !(range2->upb < range1->lob || range1->upb < range2->lob);
!(range2->upb < range1->lob || range1->upb < range2->lob);
= !(range2->upb < range1->lob) && !(range1->upb < range2->lob)
= range2->upb >= range1->lob && range1->upb >= range2->lob
= range1->lob <= range2->upb && range2->lob <= range1->upb
In QEMU, upb is inclusive, if it were exclusive (like we have here):
= range1->lob < range2->upb && range2->lob < range1->upb
Which is what we have here with:
range1->lob = start_gpa
range1->upb = end_gpa
range2->lob = cur->gpa
range2->upb = cur->gpa + cur->size
Also if you are interested, see
https://stackoverflow.com/questions/3269434/whats-the-most-efficient-way-to-test-if-two-ranges-overlap
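If it helps, a tiny self-contained sketch of that predicate (exclusive
upper bounds, made-up helper name) together with the partially
overlapping, contained and adjacent cases:

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Overlap of [lob1, upb1) and [lob2, upb2), upper bounds exclusive --
 * the same form as the start_gpa/end_gpa check in the patch. */
static bool gpa_ranges_overlap(uint64_t lob1, uint64_t upb1,
                               uint64_t lob2, uint64_t upb2)
{
    return lob1 < upb2 && lob2 < upb1;
}

int main(void)
{
    assert(gpa_ranges_overlap(0x1000, 0x2000, 0x1800, 0x2800));  /* partial */
    assert(gpa_ranges_overlap(0x1000, 0x3000, 0x1800, 0x2000));  /* contained */
    assert(gpa_ranges_overlap(0x1800, 0x2000, 0x1000, 0x3000));  /* containing */
    assert(!gpa_ranges_overlap(0x1000, 0x2000, 0x2000, 0x3000)); /* adjacent */
    return 0;
}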
Thanks!
--
Cheers,
David / dhildenb
* Re: [PATCH v1 11/15] libvhost-user: Speedup gpa_to_mem_region() and vu_gpa_to_va()
2024-02-04 14:51 ` David Hildenbrand
@ 2024-02-04 22:07 ` Raphael Norwitz
2024-02-05 7:32 ` David Hildenbrand
0 siblings, 1 reply; 46+ messages in thread
From: Raphael Norwitz @ 2024-02-04 22:07 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Stefano Garzarella, Germano Veit Michel, Raphael Norwitz
On Sun, Feb 4, 2024 at 9:51 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 04.02.24 03:10, Raphael Norwitz wrote:
> > One comment on this one.
> >
> > On Fri, Feb 2, 2024 at 4:56 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> Let's speed up GPA to memory region / virtual address lookup. Store the
> >> memory regions ordered by guest physical addresses, and use binary
> >> search for address translation, as well as when adding/removing memory
> >> regions.
> >>
> >> Most importantly, this will speed up GPA->VA address translation when we
> >> have many memslots.
> >>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >> subprojects/libvhost-user/libvhost-user.c | 49 +++++++++++++++++++++--
> >> 1 file changed, 45 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> >> index d036b54ed0..75e47b7bb3 100644
> >> --- a/subprojects/libvhost-user/libvhost-user.c
> >> +++ b/subprojects/libvhost-user/libvhost-user.c
> >> @@ -199,19 +199,30 @@ vu_panic(VuDev *dev, const char *msg, ...)
> >> static VuDevRegion *
> >> vu_gpa_to_mem_region(VuDev *dev, uint64_t guest_addr)
> >> {
> >> - unsigned int i;
> >> + int low = 0;
> >> + int high = dev->nregions - 1;
> >>
> >> /*
> >> * Memory regions cannot overlap in guest physical address space. Each
> >> * GPA belongs to exactly one memory region, so there can only be one
> >> * match.
> >> + *
> >> + * We store our memory regions ordered by GPA and can simply perform a
> >> + * binary search.
> >> */
> >> - for (i = 0; i < dev->nregions; i++) {
> >> - VuDevRegion *cur = &dev->regions[i];
> >> + while (low <= high) {
> >> + unsigned int mid = low + (high - low) / 2;
> >> + VuDevRegion *cur = &dev->regions[mid];
> >>
> >> if (guest_addr >= cur->gpa && guest_addr < cur->gpa + cur->size) {
> >> return cur;
> >> }
> >> + if (guest_addr >= cur->gpa + cur->size) {
> >> + low = mid + 1;
> >> + }
> >> + if (guest_addr < cur->gpa) {
> >> + high = mid - 1;
> >> + }
> >> }
> >> return NULL;
> >> }
> >> @@ -273,9 +284,14 @@ vu_remove_all_mem_regs(VuDev *dev)
> >> static void
> >> _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
> >> {
> >> + const uint64_t start_gpa = msg_region->guest_phys_addr;
> >> + const uint64_t end_gpa = start_gpa + msg_region->memory_size;
> >> int prot = PROT_READ | PROT_WRITE;
> >> VuDevRegion *r;
> >> void *mmap_addr;
> >> + int low = 0;
> >> + int high = dev->nregions - 1;
> >> + unsigned int idx;
> >>
> >> DPRINT("Adding region %d\n", dev->nregions);
> >> DPRINT(" guest_phys_addr: 0x%016"PRIx64"\n",
> >> @@ -295,6 +311,29 @@ _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
> >> prot = PROT_NONE;
> >> }
> >>
> >> + /*
> >> + * We will add memory regions into the array sorted by GPA. Perform a
> >> + * binary search to locate the insertion point: it will be at the low
> >> + * index.
> >> + */
> >> + while (low <= high) {
> >> + unsigned int mid = low + (high - low) / 2;
> >> + VuDevRegion *cur = &dev->regions[mid];
> >> +
> >> + /* Overlap of GPA addresses. */
> >
> > Looks like this check will only catch if the new region is fully
> > contained within an existing region. I think we need to check whether
> > either start or end region is in the range, i.e.:
>
> That check should cover all cases of overlaps, not just fully contained.
>
> See the QEMU implementation of range_overlaps_range(), which contains
> similar logic:
>
> return !(range2->upb < range1->lob || range1->upb < range2->lob);
>
> !(range2->upb < range1->lob || range1->upb < range2->lob);
> = !(range2->upb < range1->lob) && !(range1->upb < range2->lob)
> = range2->upb >= range1->lob && range1->upb >= range2->lob
> = range1->lob <= range2->upb && range2->lob <= range1->upb
>
> In QEMU, upb is inclusive, if it were exclusive (like we have here):
>
> = range1->lob < range2->upb && range2->lob < range1->upb
>
> Which is what we have here with:
>
> range1->lob = start_gpa
> range1->upb = end_gpa
> range2->lob = cur->gpa
> range2->upb = cur->gpa + cur->size
>
> Also if you are interested, see
>
> https://stackoverflow.com/questions/3269434/whats-the-most-efficient-way-to-test-if-two-ranges-overlap
>
> Thanks!
Got it, thanks for the full explanation. With that:
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
>
> --
> Cheers,
>
> David / dhildenb
>
* Re: [PATCH v1 11/15] libvhost-user: Speedup gpa_to_mem_region() and vu_gpa_to_va()
2024-02-04 22:07 ` Raphael Norwitz
@ 2024-02-05 7:32 ` David Hildenbrand
0 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2024-02-05 7:32 UTC (permalink / raw)
To: Raphael Norwitz
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Stefano Garzarella, Germano Veit Michel, Raphael Norwitz
On 04.02.24 23:07, Raphael Norwitz wrote:
> On Sun, Feb 4, 2024 at 9:51 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 04.02.24 03:10, Raphael Norwitz wrote:
>>> One comment on this one.
>>>
>>> On Fri, Feb 2, 2024 at 4:56 PM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> Let's speed up GPA to memory region / virtual address lookup. Store the
>>>> memory regions ordered by guest physical addresses, and use binary
>>>> search for address translation, as well as when adding/removing memory
>>>> regions.
>>>>
>>>> Most importantly, this will speed up GPA->VA address translation when we
>>>> have many memslots.
>>>>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>> subprojects/libvhost-user/libvhost-user.c | 49 +++++++++++++++++++++--
>>>> 1 file changed, 45 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
>>>> index d036b54ed0..75e47b7bb3 100644
>>>> --- a/subprojects/libvhost-user/libvhost-user.c
>>>> +++ b/subprojects/libvhost-user/libvhost-user.c
>>>> @@ -199,19 +199,30 @@ vu_panic(VuDev *dev, const char *msg, ...)
>>>> static VuDevRegion *
>>>> vu_gpa_to_mem_region(VuDev *dev, uint64_t guest_addr)
>>>> {
>>>> - unsigned int i;
>>>> + int low = 0;
>>>> + int high = dev->nregions - 1;
>>>>
>>>> /*
>>>> * Memory regions cannot overlap in guest physical address space. Each
>>>> * GPA belongs to exactly one memory region, so there can only be one
>>>> * match.
>>>> + *
>>>> + * We store our memory regions ordered by GPA and can simply perform a
>>>> + * binary search.
>>>> */
>>>> - for (i = 0; i < dev->nregions; i++) {
>>>> - VuDevRegion *cur = &dev->regions[i];
>>>> + while (low <= high) {
>>>> + unsigned int mid = low + (high - low) / 2;
>>>> + VuDevRegion *cur = &dev->regions[mid];
>>>>
>>>> if (guest_addr >= cur->gpa && guest_addr < cur->gpa + cur->size) {
>>>> return cur;
>>>> }
>>>> + if (guest_addr >= cur->gpa + cur->size) {
>>>> + low = mid + 1;
>>>> + }
>>>> + if (guest_addr < cur->gpa) {
>>>> + high = mid - 1;
>>>> + }
>>>> }
>>>> return NULL;
>>>> }
>>>> @@ -273,9 +284,14 @@ vu_remove_all_mem_regs(VuDev *dev)
>>>> static void
>>>> _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
>>>> {
>>>> + const uint64_t start_gpa = msg_region->guest_phys_addr;
>>>> + const uint64_t end_gpa = start_gpa + msg_region->memory_size;
>>>> int prot = PROT_READ | PROT_WRITE;
>>>> VuDevRegion *r;
>>>> void *mmap_addr;
>>>> + int low = 0;
>>>> + int high = dev->nregions - 1;
>>>> + unsigned int idx;
>>>>
>>>> DPRINT("Adding region %d\n", dev->nregions);
>>>> DPRINT(" guest_phys_addr: 0x%016"PRIx64"\n",
>>>> @@ -295,6 +311,29 @@ _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
>>>> prot = PROT_NONE;
>>>> }
>>>>
>>>> + /*
>>>> + * We will add memory regions into the array sorted by GPA. Perform a
>>>> + * binary search to locate the insertion point: it will be at the low
>>>> + * index.
>>>> + */
>>>> + while (low <= high) {
>>>> + unsigned int mid = low + (high - low) / 2;
>>>> + VuDevRegion *cur = &dev->regions[mid];
>>>> +
>>>> + /* Overlap of GPA addresses. */
>>>
>>> Looks like this check will only catch if the new region is fully
>>> contained within an existing region. I think we need to check whether
>>> either start or end region is in the range, i.e.:
>>
>> That check should cover all cases of overlaps, not just fully contained.
>>
>> See the QEMU implementation of range_overlaps_range(), which contains
>> similar logic:
>>
>> return !(range2->upb < range1->lob || range1->upb < range2->lob);
>>
>> !(range2->upb < range1->lob || range1->upb < range2->lob);
>> = !(range2->upb < range1->lob) && !(range1->upb < range2->lob)
>> = range2->upb >= range1->lob && range1->upb >= range2->lob
>> = range1->lob <= range2->upb && range2->lob <= range1->upb
>>
>> In QEMU, upb is inclusive, if it were exclusive (like we have here):
>>
>> = range1->lob < range2->upb && range2->lob < range1->upb
>>
>> Which is what we have here with:
>>
>> range1->lob = start_gpa
>> range1->upb = end_gpa
>> range2->lob = cur->gpa
>> range2->upb = cur->gpa + cur->size
>>
>> Also if you are interested, see
>>
>> https://stackoverflow.com/questions/3269434/whats-the-most-efficient-way-to-test-if-two-ranges-overlap
>>
>> Thanks!
>
> Got it, thanks for the full explanation. With that:
>
> Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
Thanks!
--
Cheers,
David / dhildenb
* [PATCH v1 12/15] libvhost-user: Use most of mmap_offset as fd_offset
2024-02-02 21:53 [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code David Hildenbrand
` (10 preceding siblings ...)
2024-02-02 21:53 ` [PATCH v1 11/15] libvhost-user: Speedup gpa_to_mem_region() and vu_gpa_to_va() David Hildenbrand
@ 2024-02-02 21:53 ` David Hildenbrand
2024-02-04 2:11 ` Raphael Norwitz
2024-02-02 21:53 ` [PATCH v1 13/15] libvhost-user: Factor out vq usability check David Hildenbrand
` (4 subsequent siblings)
16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-02-02 21:53 UTC (permalink / raw)
To: qemu-devel
Cc: David Hildenbrand, Michael S . Tsirkin, Jason Wang,
Stefan Hajnoczi, Stefano Garzarella, Germano Veit Michel,
Raphael Norwitz
In the past, QEMU would create memory regions that could partially cover
hugetlb pages, making mmap() fail if we used the mmap_offset as an
fd_offset. For that reason, we never used the mmap_offset as an offset into
the fd and instead always mapped the fd from the very start.
However, that can easily result in us mmap'ing a lot of unnecessary
parts of an fd, possibly repeatedly.
QEMU nowadays does not create memory regions that partially cover huge
pages -- it never really worked with postcopy. QEMU handles merging of
regions that partially cover huge pages (due to holes in boot memory) since
2018 in c1ece84e7c93 ("vhost: Huge page align and merge").
Let's be a bit careful and not unconditionally convert the
mmap_offset into an fd_offset. Instead, let's simply detect the hugetlb
size and pass as much as we can as fd_offset, making sure that we call
mmap() with a properly aligned offset.
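For illustration only (made-up example values; ALIGN_DOWN below is a
simplified stand-in for the QEMU macro), splitting mmap_offset into an
aligned fd_offset plus a small remaining in-mapping offset looks like
this:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define ALIGN_DOWN(n, m) ((n) / (m) * (m))

int main(void)
{
    const uint64_t huge_pagesize = 2 * 1024 * 1024;  /* assume 2 MiB hugetlb */
    const uint64_t region_mmap_offset = 0x40100000;  /* not 2 MiB aligned */

    uint64_t fd_offset = ALIGN_DOWN(region_mmap_offset, huge_pagesize);
    uint64_t mmap_offset = region_mmap_offset - fd_offset;

    /* fd_offset = 0x40000000, mmap_offset = 0x100000: mmap() the fd at
     * fd_offset and add mmap_offset to reach the start of the region. */
    printf("fd_offset=0x%" PRIx64 " mmap_offset=0x%" PRIx64 "\n",
           fd_offset, mmap_offset);
    return 0;
}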
With QEMU and a virtio-mem device that is fully plugged (50GiB using 50
memslots) the qemu-storage daemon process consumes in the VA space
1281GiB before this change and 58GiB after this change.
Example debug output:
================ Vhost user message ================
Request: VHOST_USER_ADD_MEM_REG (37)
Flags: 0x9
Size: 40
Fds: 59
Adding region 50
guest_phys_addr: 0x0000000d80000000
memory_size: 0x0000000040000000
userspace_addr 0x00007f54ebffe000
mmap_offset 0x0000000c00000000
fd_offset: 0x0000000c00000000
new mmap_offset: 0x0000000000000000
mmap_addr: 0x00007f7ecc000000
Successfully added new region
================ Vhost user message ================
Request: VHOST_USER_ADD_MEM_REG (37)
Flags: 0x9
Size: 40
Fds: 59
Adding region 51
guest_phys_addr: 0x0000000dc0000000
memory_size: 0x0000000040000000
userspace_addr 0x00007f552bffe000
mmap_offset 0x0000000c40000000
fd_offset: 0x0000000c40000000
new mmap_offset: 0x0000000000000000
mmap_addr: 0x00007f7e8c000000
Successfully added new region
Signed-off-by: David Hildenbrand <david@redhat.com>
---
subprojects/libvhost-user/libvhost-user.c | 50 ++++++++++++++++++++---
1 file changed, 45 insertions(+), 5 deletions(-)
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 75e47b7bb3..7d8293dc84 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -43,6 +43,8 @@
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>
+#include <sys/vfs.h>
+#include <linux/magic.h>
#ifdef __NR_userfaultfd
#include <linux/userfaultfd.h>
@@ -281,12 +283,36 @@ vu_remove_all_mem_regs(VuDev *dev)
dev->nregions = 0;
}
+static size_t
+get_fd_pagesize(int fd)
+{
+ static size_t pagesize;
+#if defined(__linux__)
+ struct statfs fs;
+ int ret;
+
+ do {
+ ret = fstatfs(fd, &fs);
+ } while (ret != 0 && errno == EINTR);
+
+ if (!ret && fs.f_type == HUGETLBFS_MAGIC) {
+ return fs.f_bsize;
+ }
+#endif
+
+ if (!pagesize) {
+ pagesize = getpagesize();
+ }
+ return pagesize;
+}
+
static void
_vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
{
const uint64_t start_gpa = msg_region->guest_phys_addr;
const uint64_t end_gpa = start_gpa + msg_region->memory_size;
int prot = PROT_READ | PROT_WRITE;
+ uint64_t mmap_offset, fd_offset;
VuDevRegion *r;
void *mmap_addr;
int low = 0;
@@ -335,11 +361,25 @@ _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
idx = low;
/*
- * We don't use offset argument of mmap() since the mapped address has
- * to be page aligned, and we use huge pages.
+ * Convert most of msg_region->mmap_offset to fd_offset. In almost all
+ * cases, this will leave us with mmap_offset == 0, mmap()'ing only
+ * what we really need. Only if a memory region would partially cover
+ * hugetlb pages, we'd get mmap_offset != 0, which usually doesn't happen
+ * anymore (i.e., modern QEMU).
+ *
+ * Note that mmap() with hugetlb would fail if the offset into the file
+ * is not aligned to the huge page size.
*/
- mmap_addr = mmap(0, msg_region->memory_size + msg_region->mmap_offset,
- prot, MAP_SHARED | MAP_NORESERVE, fd, 0);
+ fd_offset = ALIGN_DOWN(msg_region->mmap_offset, get_fd_pagesize(fd));
+ mmap_offset = msg_region->mmap_offset - fd_offset;
+
+ DPRINT(" fd_offset: 0x%016"PRIx64"\n",
+ fd_offset);
+ DPRINT(" adj mmap_offset: 0x%016"PRIx64"\n",
+ mmap_offset);
+
+ mmap_addr = mmap(0, msg_region->memory_size + mmap_offset,
+ prot, MAP_SHARED | MAP_NORESERVE, fd, fd_offset);
if (mmap_addr == MAP_FAILED) {
vu_panic(dev, "region mmap error: %s", strerror(errno));
return;
@@ -354,7 +394,7 @@ _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
r->size = msg_region->memory_size;
r->qva = msg_region->userspace_addr;
r->mmap_addr = (uint64_t)(uintptr_t)mmap_addr;
- r->mmap_offset = msg_region->mmap_offset;
+ r->mmap_offset = mmap_offset;
dev->nregions++;
if (dev->postcopy_listening) {
--
2.43.0
* Re: [PATCH v1 12/15] libvhost-user: Use most of mmap_offset as fd_offset
2024-02-02 21:53 ` [PATCH v1 12/15] libvhost-user: Use most of mmap_offset as fd_offset David Hildenbrand
@ 2024-02-04 2:11 ` Raphael Norwitz
0 siblings, 0 replies; 46+ messages in thread
From: Raphael Norwitz @ 2024-02-04 2:11 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Stefano Garzarella, Germano Veit Michel, Raphael Norwitz
On Fri, Feb 2, 2024 at 4:55 PM David Hildenbrand <david@redhat.com> wrote:
>
> In the past, QEMU would create memory regions that could partially cover
> hugetlb pages, making mmap() fail if we used the mmap_offset as an
> fd_offset. For that reason, we never used the mmap_offset as an offset into
> the fd and instead always mapped the fd from the very start.
>
> However, that can easily result in us mmap'ing a lot of unnecessary
> parts of an fd, possibly repeatedly.
>
> QEMU nowadays does not create memory regions that partially cover huge
> pages -- it never really worked with postcopy. QEMU handles merging of
> regions that partially cover huge pages (due to holes in boot memory) since
> 2018 in c1ece84e7c93 ("vhost: Huge page align and merge").
>
> Let's be a bit careful and not unconditionally convert the
> mmap_offset into an fd_offset. Instead, let's simply detect the hugetlb
> size and pass as much as we can as fd_offset, making sure that we call
> mmap() with a properly aligned offset.
>
> With QEMU and a virtio-mem device that is fully plugged (50GiB using 50
> memslots) the qemu-storage daemon process consumes in the VA space
> 1281GiB before this change and 58GiB after this change.
>
> Example debug output:
> ================ Vhost user message ================
> Request: VHOST_USER_ADD_MEM_REG (37)
> Flags: 0x9
> Size: 40
> Fds: 59
> Adding region 50
> guest_phys_addr: 0x0000000d80000000
> memory_size: 0x0000000040000000
> userspace_addr 0x00007f54ebffe000
> mmap_offset 0x0000000c00000000
> fd_offset: 0x0000000c00000000
> new mmap_offset: 0x0000000000000000
> mmap_addr: 0x00007f7ecc000000
> Successfully added new region
> ================ Vhost user message ================
> Request: VHOST_USER_ADD_MEM_REG (37)
> Flags: 0x9
> Size: 40
> Fds: 59
> Adding region 51
> guest_phys_addr: 0x0000000dc0000000
> memory_size: 0x0000000040000000
> userspace_addr 0x00007f552bffe000
> mmap_offset 0x0000000c40000000
> fd_offset: 0x0000000c40000000
> new mmap_offset: 0x0000000000000000
> mmap_addr: 0x00007f7e8c000000
> Successfully added new region
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
> ---
> subprojects/libvhost-user/libvhost-user.c | 50 ++++++++++++++++++++---
> 1 file changed, 45 insertions(+), 5 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> index 75e47b7bb3..7d8293dc84 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -43,6 +43,8 @@
> #include <fcntl.h>
> #include <sys/ioctl.h>
> #include <linux/vhost.h>
> +#include <sys/vfs.h>
> +#include <linux/magic.h>
>
> #ifdef __NR_userfaultfd
> #include <linux/userfaultfd.h>
> @@ -281,12 +283,36 @@ vu_remove_all_mem_regs(VuDev *dev)
> dev->nregions = 0;
> }
>
> +static size_t
> +get_fd_pagesize(int fd)
> +{
> + static size_t pagesize;
> +#if defined(__linux__)
> + struct statfs fs;
> + int ret;
> +
> + do {
> + ret = fstatfs(fd, &fs);
> + } while (ret != 0 && errno == EINTR);
> +
> + if (!ret && fs.f_type == HUGETLBFS_MAGIC) {
> + return fs.f_bsize;
> + }
> +#endif
> +
> + if (!pagesize) {
> + pagesize = getpagesize();
> + }
> + return pagesize;
> +}
> +
> static void
> _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
> {
> const uint64_t start_gpa = msg_region->guest_phys_addr;
> const uint64_t end_gpa = start_gpa + msg_region->memory_size;
> int prot = PROT_READ | PROT_WRITE;
> + uint64_t mmap_offset, fd_offset;
> VuDevRegion *r;
> void *mmap_addr;
> int low = 0;
> @@ -335,11 +361,25 @@ _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
> idx = low;
>
> /*
> - * We don't use offset argument of mmap() since the mapped address has
> - * to be page aligned, and we use huge pages.
> + * Convert most of msg_region->mmap_offset to fd_offset. In almost all
> + * cases, this will leave us with mmap_offset == 0, mmap()'ing only
> + * what we really need. Only if a memory region would partially cover
> + * hugetlb pages, we'd get mmap_offset != 0, which usually doesn't happen
> + * anymore (i.e., modern QEMU).
> + *
> + * Note that mmap() with hugetlb would fail if the offset into the file
> + * is not aligned to the huge page size.
> */
> - mmap_addr = mmap(0, msg_region->memory_size + msg_region->mmap_offset,
> - prot, MAP_SHARED | MAP_NORESERVE, fd, 0);
> + fd_offset = ALIGN_DOWN(msg_region->mmap_offset, get_fd_pagesize(fd));
> + mmap_offset = msg_region->mmap_offset - fd_offset;
> +
> + DPRINT(" fd_offset: 0x%016"PRIx64"\n",
> + fd_offset);
> + DPRINT(" adj mmap_offset: 0x%016"PRIx64"\n",
> + mmap_offset);
> +
> + mmap_addr = mmap(0, msg_region->memory_size + mmap_offset,
> + prot, MAP_SHARED | MAP_NORESERVE, fd, fd_offset);
> if (mmap_addr == MAP_FAILED) {
> vu_panic(dev, "region mmap error: %s", strerror(errno));
> return;
> @@ -354,7 +394,7 @@ _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
> r->size = msg_region->memory_size;
> r->qva = msg_region->userspace_addr;
> r->mmap_addr = (uint64_t)(uintptr_t)mmap_addr;
> - r->mmap_offset = msg_region->mmap_offset;
> + r->mmap_offset = mmap_offset;
> dev->nregions++;
>
> if (dev->postcopy_listening) {
> --
> 2.43.0
>
>
^ permalink raw reply [flat|nested] 46+ messages in thread
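For illustration, here is a minimal standalone sketch of the fd_offset/mmap_offset
split discussed in this patch, plugging in the values from the debug output for
region 50 and assuming 1 GiB hugetlb pages; align_down() is only a stand-in for the
ALIGN_DOWN macro used in the patch, not part of the library:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for ALIGN_DOWN; assumes a power-of-two alignment, which holds
 * for hugetlb page sizes. */
static uint64_t align_down(uint64_t value, uint64_t alignment)
{
    return value & ~(alignment - 1);
}

int main(void)
{
    /* Values from the debug output for region 50 (1 GiB hugetlb pages). */
    const uint64_t huge_pagesize = 1ULL << 30;
    const uint64_t msg_mmap_offset = 0x0000000c00000000ULL;

    const uint64_t fd_offset = align_down(msg_mmap_offset, huge_pagesize);
    const uint64_t mmap_offset = msg_mmap_offset - fd_offset;

    /* fd_offset goes into mmap(..., fd, fd_offset); with mmap_offset == 0,
     * only memory_size bytes are mapped instead of mmap_offset + memory_size. */
    printf("fd_offset:       0x%016" PRIx64 "\n", fd_offset);
    printf("new mmap_offset: 0x%016" PRIx64 "\n", mmap_offset);
    return 0;
}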
* [PATCH v1 13/15] libvhost-user: Factor out vq usability check
2024-02-02 21:53 [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code David Hildenbrand
` (11 preceding siblings ...)
2024-02-02 21:53 ` [PATCH v1 12/15] libvhost-user: Use most of mmap_offset as fd_offset David Hildenbrand
@ 2024-02-02 21:53 ` David Hildenbrand
2024-02-04 2:11 ` Raphael Norwitz
2024-02-02 21:53 ` [PATCH v1 14/15] libvhost-user: Dynamically remap rings after (temporarily?) removing memory regions David Hildenbrand
` (3 subsequent siblings)
16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-02-02 21:53 UTC (permalink / raw)
To: qemu-devel
Cc: David Hildenbrand, Michael S . Tsirkin, Jason Wang,
Stefan Hajnoczi, Stefano Garzarella, Germano Veit Michel,
Raphael Norwitz
Let's factor it out to prepare for further changes.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
subprojects/libvhost-user/libvhost-user.c | 24 +++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 7d8293dc84..febeb2eb89 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -283,6 +283,12 @@ vu_remove_all_mem_regs(VuDev *dev)
dev->nregions = 0;
}
+static bool
+vu_is_vq_usable(VuDev *dev, VuVirtq *vq)
+{
+ return likely(!dev->broken) && likely(vq->vring.avail);
+}
+
static size_t
get_fd_pagesize(int fd)
{
@@ -2378,8 +2384,7 @@ vu_queue_get_avail_bytes(VuDev *dev, VuVirtq *vq, unsigned int *in_bytes,
idx = vq->last_avail_idx;
total_bufs = in_total = out_total = 0;
- if (unlikely(dev->broken) ||
- unlikely(!vq->vring.avail)) {
+ if (!vu_is_vq_usable(dev, vq)) {
goto done;
}
@@ -2494,8 +2499,7 @@ vu_queue_avail_bytes(VuDev *dev, VuVirtq *vq, unsigned int in_bytes,
bool
vu_queue_empty(VuDev *dev, VuVirtq *vq)
{
- if (unlikely(dev->broken) ||
- unlikely(!vq->vring.avail)) {
+ if (!vu_is_vq_usable(dev, vq)) {
return true;
}
@@ -2534,8 +2538,7 @@ vring_notify(VuDev *dev, VuVirtq *vq)
static void _vu_queue_notify(VuDev *dev, VuVirtq *vq, bool sync)
{
- if (unlikely(dev->broken) ||
- unlikely(!vq->vring.avail)) {
+ if (!vu_is_vq_usable(dev, vq)) {
return;
}
@@ -2860,8 +2863,7 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
unsigned int head;
VuVirtqElement *elem;
- if (unlikely(dev->broken) ||
- unlikely(!vq->vring.avail)) {
+ if (!vu_is_vq_usable(dev, vq)) {
return NULL;
}
@@ -3018,8 +3020,7 @@ vu_queue_fill(VuDev *dev, VuVirtq *vq,
{
struct vring_used_elem uelem;
- if (unlikely(dev->broken) ||
- unlikely(!vq->vring.avail)) {
+ if (!vu_is_vq_usable(dev, vq)) {
return;
}
@@ -3048,8 +3049,7 @@ vu_queue_flush(VuDev *dev, VuVirtq *vq, unsigned int count)
{
uint16_t old, new;
- if (unlikely(dev->broken) ||
- unlikely(!vq->vring.avail)) {
+ if (!vu_is_vq_usable(dev, vq)) {
return;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH v1 13/15] libvhost-user: Factor out vq usability check
2024-02-02 21:53 ` [PATCH v1 13/15] libvhost-user: Factor out vq usability check David Hildenbrand
@ 2024-02-04 2:11 ` Raphael Norwitz
0 siblings, 0 replies; 46+ messages in thread
From: Raphael Norwitz @ 2024-02-04 2:11 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Stefano Garzarella, Germano Veit Michel, Raphael Norwitz
On Fri, Feb 2, 2024 at 4:55 PM David Hildenbrand <david@redhat.com> wrote:
>
> Let's factor it out to prepare for further changes.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
> ---
> subprojects/libvhost-user/libvhost-user.c | 24 +++++++++++------------
> 1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> index 7d8293dc84..febeb2eb89 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -283,6 +283,12 @@ vu_remove_all_mem_regs(VuDev *dev)
> dev->nregions = 0;
> }
>
> +static bool
> +vu_is_vq_usable(VuDev *dev, VuVirtq *vq)
> +{
> + return likely(!dev->broken) && likely(vq->vring.avail);
> +}
> +
> static size_t
> get_fd_pagesize(int fd)
> {
> @@ -2378,8 +2384,7 @@ vu_queue_get_avail_bytes(VuDev *dev, VuVirtq *vq, unsigned int *in_bytes,
> idx = vq->last_avail_idx;
>
> total_bufs = in_total = out_total = 0;
> - if (unlikely(dev->broken) ||
> - unlikely(!vq->vring.avail)) {
> + if (!vu_is_vq_usable(dev, vq)) {
> goto done;
> }
>
> @@ -2494,8 +2499,7 @@ vu_queue_avail_bytes(VuDev *dev, VuVirtq *vq, unsigned int in_bytes,
> bool
> vu_queue_empty(VuDev *dev, VuVirtq *vq)
> {
> - if (unlikely(dev->broken) ||
> - unlikely(!vq->vring.avail)) {
> + if (!vu_is_vq_usable(dev, vq)) {
> return true;
> }
>
> @@ -2534,8 +2538,7 @@ vring_notify(VuDev *dev, VuVirtq *vq)
>
> static void _vu_queue_notify(VuDev *dev, VuVirtq *vq, bool sync)
> {
> - if (unlikely(dev->broken) ||
> - unlikely(!vq->vring.avail)) {
> + if (!vu_is_vq_usable(dev, vq)) {
> return;
> }
>
> @@ -2860,8 +2863,7 @@ vu_queue_pop(VuDev *dev, VuVirtq *vq, size_t sz)
> unsigned int head;
> VuVirtqElement *elem;
>
> - if (unlikely(dev->broken) ||
> - unlikely(!vq->vring.avail)) {
> + if (!vu_is_vq_usable(dev, vq)) {
> return NULL;
> }
>
> @@ -3018,8 +3020,7 @@ vu_queue_fill(VuDev *dev, VuVirtq *vq,
> {
> struct vring_used_elem uelem;
>
> - if (unlikely(dev->broken) ||
> - unlikely(!vq->vring.avail)) {
> + if (!vu_is_vq_usable(dev, vq)) {
> return;
> }
>
> @@ -3048,8 +3049,7 @@ vu_queue_flush(VuDev *dev, VuVirtq *vq, unsigned int count)
> {
> uint16_t old, new;
>
> - if (unlikely(dev->broken) ||
> - unlikely(!vq->vring.avail)) {
> + if (!vu_is_vq_usable(dev, vq)) {
> return;
> }
>
> --
> 2.43.0
>
>
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v1 14/15] libvhost-user: Dynamically remap rings after (temporarily?) removing memory regions
2024-02-02 21:53 [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code David Hildenbrand
` (12 preceding siblings ...)
2024-02-02 21:53 ` [PATCH v1 13/15] libvhost-user: Factor out vq usability check David Hildenbrand
@ 2024-02-02 21:53 ` David Hildenbrand
2024-02-04 2:15 ` Raphael Norwitz
2024-02-02 21:53 ` [PATCH v1 15/15] libvhost-user: Mark mmap'ed region memory as MADV_DONTDUMP David Hildenbrand
` (2 subsequent siblings)
16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-02-02 21:53 UTC (permalink / raw)
To: qemu-devel
Cc: David Hildenbrand, Michael S . Tsirkin, Jason Wang,
Stefan Hajnoczi, Stefano Garzarella, Germano Veit Michel,
Raphael Norwitz
Currently, we try to remap all rings whenever we add a single new memory
region. That doesn't quite make sense, because we already map rings when
setting the ring address, and panic if that goes wrong. Likely, that
handling was simply copied from set_mem_table code, where we actually
have to remap all rings.
Remapping all rings might require us to walk quite a lot of memory
regions to perform the address translations. Ideally, we'd simply remove
that remapping.
However, let's be a bit careful. There might be some weird corner cases
where we temporarily remove a single memory region (e.g., to resize it)
that would have kept working so far. Further, a ring might be located on
hotplugged memory, and as the VM reboots, we might unplug that memory and
hotplug memory again before resetting the ring addresses.
So let's unmap affected rings as we remove a memory region, and try
dynamically mapping the ring again when required.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
subprojects/libvhost-user/libvhost-user.c | 107 ++++++++++++++++------
1 file changed, 78 insertions(+), 29 deletions(-)
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index febeb2eb89..738e84ab63 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -283,10 +283,75 @@ vu_remove_all_mem_regs(VuDev *dev)
dev->nregions = 0;
}
+static bool
+map_ring(VuDev *dev, VuVirtq *vq)
+{
+ vq->vring.desc = qva_to_va(dev, vq->vra.desc_user_addr);
+ vq->vring.used = qva_to_va(dev, vq->vra.used_user_addr);
+ vq->vring.avail = qva_to_va(dev, vq->vra.avail_user_addr);
+
+ DPRINT("Setting virtq addresses:\n");
+ DPRINT(" vring_desc at %p\n", vq->vring.desc);
+ DPRINT(" vring_used at %p\n", vq->vring.used);
+ DPRINT(" vring_avail at %p\n", vq->vring.avail);
+
+ return !(vq->vring.desc && vq->vring.used && vq->vring.avail);
+}
+
static bool
vu_is_vq_usable(VuDev *dev, VuVirtq *vq)
{
- return likely(!dev->broken) && likely(vq->vring.avail);
+ if (unlikely(dev->broken)) {
+ return false;
+ }
+
+ if (likely(vq->vring.avail)) {
+ return true;
+ }
+
+ /*
+ * In corner cases, we might temporarily remove a memory region that
+ * mapped a ring. When removing a memory region we make sure to
+ * unmap any rings that would be impacted. Let's try to remap if we
+ * already succeeded mapping this ring once.
+ */
+ if (!vq->vra.desc_user_addr || !vq->vra.used_user_addr ||
+ !vq->vra.avail_user_addr) {
+ return false;
+ }
+ if (map_ring(dev, vq)) {
+ vu_panic(dev, "remapping queue on access");
+ return false;
+ }
+ return true;
+}
+
+static void
+unmap_rings(VuDev *dev, VuDevRegion *r)
+{
+ int i;
+
+ for (i = 0; i < dev->max_queues; i++) {
+ VuVirtq *vq = &dev->vq[i];
+ const uintptr_t desc = (uintptr_t)vq->vring.desc;
+ const uintptr_t used = (uintptr_t)vq->vring.used;
+ const uintptr_t avail = (uintptr_t)vq->vring.avail;
+
+ if (desc < r->mmap_addr || desc >= r->mmap_addr + r->size) {
+ continue;
+ }
+ if (used < r->mmap_addr || used >= r->mmap_addr + r->size) {
+ continue;
+ }
+ if (avail < r->mmap_addr || avail >= r->mmap_addr + r->size) {
+ continue;
+ }
+
+ DPRINT("Unmapping rings of queue %d\n", i);
+ vq->vring.desc = NULL;
+ vq->vring.used = NULL;
+ vq->vring.avail = NULL;
+ }
}
static size_t
@@ -784,21 +849,6 @@ vu_reset_device_exec(VuDev *dev, VhostUserMsg *vmsg)
return false;
}
-static bool
-map_ring(VuDev *dev, VuVirtq *vq)
-{
- vq->vring.desc = qva_to_va(dev, vq->vra.desc_user_addr);
- vq->vring.used = qva_to_va(dev, vq->vra.used_user_addr);
- vq->vring.avail = qva_to_va(dev, vq->vra.avail_user_addr);
-
- DPRINT("Setting virtq addresses:\n");
- DPRINT(" vring_desc at %p\n", vq->vring.desc);
- DPRINT(" vring_used at %p\n", vq->vring.used);
- DPRINT(" vring_avail at %p\n", vq->vring.avail);
-
- return !(vq->vring.desc && vq->vring.used && vq->vring.avail);
-}
-
static bool
generate_faults(VuDev *dev) {
unsigned int i;
@@ -882,7 +932,6 @@ generate_faults(VuDev *dev) {
static bool
vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
- int i;
VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = &m;
if (vmsg->fd_num != 1) {
@@ -928,19 +977,9 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
vmsg->fd_num = 0;
DPRINT("Successfully added new region in postcopy\n");
return true;
- } else {
- for (i = 0; i < dev->max_queues; i++) {
- if (dev->vq[i].vring.desc) {
- if (map_ring(dev, &dev->vq[i])) {
- vu_panic(dev, "remapping queue %d for new memory region",
- i);
- }
- }
- }
-
- DPRINT("Successfully added new region\n");
- return false;
}
+ DPRINT("Successfully added new region\n");
+ return false;
}
static inline bool reg_equal(VuDevRegion *vudev_reg,
@@ -993,6 +1032,16 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
return false;
}
+ /*
+ * There might be valid cases where we temporarily remove memory regions
+ * to readd them again, or remove memory regions and don't use the rings
+ * anymore before we set the ring addresses and restart the device.
+ *
+ * Unmap all affected rings, remapping them on demand later. This should
+ * be a corner case.
+ */
+ unmap_rings(dev, r);
+
munmap((void *)(uintptr_t)r->mmap_addr, r->size + r->mmap_offset);
idx = r - dev->regions;
--
2.43.0
^ permalink raw reply related [flat|nested] 46+ messages in thread
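To make the remap-on-demand idea above easier to follow in isolation, here is a
small self-contained toy model of it; the fake region table, the toy_* names and
the single 4 KiB backing buffer are purely illustrative and not the library's API
(the real code also panics when a remap fails, whereas the toy just reports the
ring as unusable):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static uint8_t backing[4096];        /* stands in for one mmap'ed region */
static bool region_present = true;   /* toggled by add/remove region */

struct toy_vq {
    void    *avail;                  /* cached mapping, NULL when unmapped */
    uint64_t avail_user_addr;        /* address remembered from SET_VRING_ADDR */
};

/* Stand-in for qva_to_va(): walk the (single-entry) region table. */
static void *toy_qva_to_va(uint64_t qva)
{
    if (region_present && qva >= 0x1000 && qva < 0x1000 + sizeof(backing)) {
        return backing + (qva - 0x1000);
    }
    return NULL;
}

static bool toy_vq_usable(struct toy_vq *vq)
{
    if (vq->avail) {
        return true;                                /* fast path: still mapped */
    }
    if (!vq->avail_user_addr) {
        return false;                               /* ring never set up */
    }
    vq->avail = toy_qva_to_va(vq->avail_user_addr); /* slow path: remap on demand */
    return vq->avail != NULL;
}

int main(void)
{
    struct toy_vq vq = { .avail_user_addr = 0x1000 };

    printf("usable: %d\n", toy_vq_usable(&vq));    /* 1: mapped lazily */
    region_present = false;                        /* region removed ... */
    vq.avail = NULL;                               /* ... so the ring is unmapped */
    printf("usable: %d\n", toy_vq_usable(&vq));    /* 0: nothing to map it to */
    region_present = true;                         /* region re-added */
    printf("usable: %d\n", toy_vq_usable(&vq));    /* 1: remapped on access */
    return 0;
}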
* Re: [PATCH v1 14/15] libvhost-user: Dynamically remap rings after (temporarily?) removing memory regions
2024-02-02 21:53 ` [PATCH v1 14/15] libvhost-user: Dynamically remap rings after (temporarily?) removing memory regions David Hildenbrand
@ 2024-02-04 2:15 ` Raphael Norwitz
2024-02-04 14:58 ` David Hildenbrand
0 siblings, 1 reply; 46+ messages in thread
From: Raphael Norwitz @ 2024-02-04 2:15 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Stefano Garzarella, Germano Veit Michel, Raphael Norwitz
Someone else with more knowledge of the VQ mapping code should also review.
On Fri, Feb 2, 2024 at 4:55 PM David Hildenbrand <david@redhat.com> wrote:
>
> Currently, we try to remap all rings whenever we add a single new memory
> region. That doesn't quite make sense, because we already map rings when
> setting the ring address, and panic if that goes wrong. Likely, that
> handling was simply copied from set_mem_table code, where we actually
> have to remap all rings.
>
> Remapping all rings might require us to walk quite a lot of memory
> regions to perform the address translations. Ideally, we'd simply remove
> that remapping.
>
> However, let's be a bit careful. There might be some weird corner cases
> where we temporarily remove a single memory region (e.g., to resize it)
> that would have kept working so far. Further, a ring might be located on
> hotplugged memory, and as the VM reboots, we might unplug that memory and
> hotplug memory again before resetting the ring addresses.
>
> So let's unmap affected rings as we remove a memory region, and try
> dynamically mapping the ring again when required.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Raphael Norwitz <raphael@enfabrica.net>
> ---
> subprojects/libvhost-user/libvhost-user.c | 107 ++++++++++++++++------
> 1 file changed, 78 insertions(+), 29 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> index febeb2eb89..738e84ab63 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -283,10 +283,75 @@ vu_remove_all_mem_regs(VuDev *dev)
> dev->nregions = 0;
> }
>
> +static bool
> +map_ring(VuDev *dev, VuVirtq *vq)
> +{
> + vq->vring.desc = qva_to_va(dev, vq->vra.desc_user_addr);
> + vq->vring.used = qva_to_va(dev, vq->vra.used_user_addr);
> + vq->vring.avail = qva_to_va(dev, vq->vra.avail_user_addr);
> +
> + DPRINT("Setting virtq addresses:\n");
> + DPRINT(" vring_desc at %p\n", vq->vring.desc);
> + DPRINT(" vring_used at %p\n", vq->vring.used);
> + DPRINT(" vring_avail at %p\n", vq->vring.avail);
> +
> + return !(vq->vring.desc && vq->vring.used && vq->vring.avail);
> +}
> +
> static bool
Consider changing the function name to indicate that it may actually map a vq?
Maybe vu_maybe_map_vq()?
> vu_is_vq_usable(VuDev *dev, VuVirtq *vq)
> {
> - return likely(!dev->broken) && likely(vq->vring.avail);
> + if (unlikely(dev->broken)) {
> + return false;
> + }
> +
> + if (likely(vq->vring.avail)) {
> + return true;
> + }
> +
> + /*
> + * In corner cases, we might temporarily remove a memory region that
> + * mapped a ring. When removing a memory region we make sure to
> + * unmap any rings that would be impacted. Let's try to remap if we
> + * already succeeded mapping this ring once.
> + */
> + if (!vq->vra.desc_user_addr || !vq->vra.used_user_addr ||
> + !vq->vra.avail_user_addr) {
> + return false;
> + }
> + if (map_ring(dev, vq)) {
> + vu_panic(dev, "remapping queue on access");
> + return false;
> + }
> + return true;
> +}
> +
> +static void
> +unmap_rings(VuDev *dev, VuDevRegion *r)
> +{
> + int i;
> +
> + for (i = 0; i < dev->max_queues; i++) {
> + VuVirtq *vq = &dev->vq[i];
> + const uintptr_t desc = (uintptr_t)vq->vring.desc;
> + const uintptr_t used = (uintptr_t)vq->vring.used;
> + const uintptr_t avail = (uintptr_t)vq->vring.avail;
> +
> + if (desc < r->mmap_addr || desc >= r->mmap_addr + r->size) {
> + continue;
> + }
> + if (used < r->mmap_addr || used >= r->mmap_addr + r->size) {
> + continue;
> + }
> + if (avail < r->mmap_addr || avail >= r->mmap_addr + r->size) {
> + continue;
> + }
> +
> + DPRINT("Unmapping rings of queue %d\n", i);
> + vq->vring.desc = NULL;
> + vq->vring.used = NULL;
> + vq->vring.avail = NULL;
> + }
> }
>
> static size_t
> @@ -784,21 +849,6 @@ vu_reset_device_exec(VuDev *dev, VhostUserMsg *vmsg)
> return false;
> }
>
> -static bool
> -map_ring(VuDev *dev, VuVirtq *vq)
> -{
> - vq->vring.desc = qva_to_va(dev, vq->vra.desc_user_addr);
> - vq->vring.used = qva_to_va(dev, vq->vra.used_user_addr);
> - vq->vring.avail = qva_to_va(dev, vq->vra.avail_user_addr);
> -
> - DPRINT("Setting virtq addresses:\n");
> - DPRINT(" vring_desc at %p\n", vq->vring.desc);
> - DPRINT(" vring_used at %p\n", vq->vring.used);
> - DPRINT(" vring_avail at %p\n", vq->vring.avail);
> -
> - return !(vq->vring.desc && vq->vring.used && vq->vring.avail);
> -}
> -
> static bool
> generate_faults(VuDev *dev) {
> unsigned int i;
> @@ -882,7 +932,6 @@ generate_faults(VuDev *dev) {
>
> static bool
> vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
> - int i;
> VhostUserMemoryRegion m = vmsg->payload.memreg.region, *msg_region = &m;
>
> if (vmsg->fd_num != 1) {
> @@ -928,19 +977,9 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
> vmsg->fd_num = 0;
> DPRINT("Successfully added new region in postcopy\n");
> return true;
> - } else {
> - for (i = 0; i < dev->max_queues; i++) {
> - if (dev->vq[i].vring.desc) {
> - if (map_ring(dev, &dev->vq[i])) {
> - vu_panic(dev, "remapping queue %d for new memory region",
> - i);
> - }
> - }
> - }
> -
> - DPRINT("Successfully added new region\n");
> - return false;
> }
> + DPRINT("Successfully added new region\n");
> + return false;
> }
>
> static inline bool reg_equal(VuDevRegion *vudev_reg,
> @@ -993,6 +1032,16 @@ vu_rem_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
> return false;
> }
>
> + /*
> + * There might be valid cases where we temporarily remove memory regions
> + * to readd them again, or remove memory regions and don't use the rings
> + * anymore before we set the ring addresses and restart the device.
> + *
> + * Unmap all affected rings, remapping them on demand later. This should
> + * be a corner case.
> + */
> + unmap_rings(dev, r);
> +
> munmap((void *)(uintptr_t)r->mmap_addr, r->size + r->mmap_offset);
>
> idx = r - dev->regions;
> --
> 2.43.0
>
>
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 14/15] libvhost-user: Dynamically remap rings after (temporarily?) removing memory regions
2024-02-04 2:15 ` Raphael Norwitz
@ 2024-02-04 14:58 ` David Hildenbrand
0 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2024-02-04 14:58 UTC (permalink / raw)
To: Raphael Norwitz
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Stefano Garzarella, Germano Veit Michel, Raphael Norwitz
On 04.02.24 03:15, Raphael Norwitz wrote:
> Someone else with more knowledge of the VQ mapping code should also review.
>
> On Fri, Feb 2, 2024 at 4:55 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> Currently, we try to remap all rings whenever we add a single new memory
>> region. That doesn't quite make sense, because we already map rings when
>> setting the ring address, and panic if that goes wrong. Likely, that
>> handling was simply copied from set_mem_table code, where we actually
>> have to remap all rings.
>>
>> Remapping all rings might require us to walk quite a lot of memory
>> regions to perform the address translations. Ideally, we'd simply remove
>> that remapping.
>>
>> However, let's be a bit careful. There might be some weird corner cases
>> where we temporarily remove a single memory region (e.g., to resize it)
>> that would have kept working so far. Further, a ring might be located on
>> hotplugged memory, and as the VM reboots, we might unplug that memory and
>> hotplug memory again before resetting the ring addresses.
>>
>> So let's unmap affected rings as we remove a memory region, and try
>> dynamically mapping the ring again when required.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>
> Acked-by: Raphael Norwitz <raphael@enfabrica.net>
Thanks!
>
>> ---
>> subprojects/libvhost-user/libvhost-user.c | 107 ++++++++++++++++------
>> 1 file changed, 78 insertions(+), 29 deletions(-)
>>
>> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
>> index febeb2eb89..738e84ab63 100644
>> --- a/subprojects/libvhost-user/libvhost-user.c
>> +++ b/subprojects/libvhost-user/libvhost-user.c
>> @@ -283,10 +283,75 @@ vu_remove_all_mem_regs(VuDev *dev)
>> dev->nregions = 0;
>> }
>>
>> +static bool
>> +map_ring(VuDev *dev, VuVirtq *vq)
>> +{
>> + vq->vring.desc = qva_to_va(dev, vq->vra.desc_user_addr);
>> + vq->vring.used = qva_to_va(dev, vq->vra.used_user_addr);
>> + vq->vring.avail = qva_to_va(dev, vq->vra.avail_user_addr);
>> +
>> + DPRINT("Setting virtq addresses:\n");
>> + DPRINT(" vring_desc at %p\n", vq->vring.desc);
>> + DPRINT(" vring_used at %p\n", vq->vring.used);
>> + DPRINT(" vring_avail at %p\n", vq->vring.avail);
>> +
>> + return !(vq->vring.desc && vq->vring.used && vq->vring.avail);
>> +}
>> +
>> static bool
>
> Consider changing the function name to indicate that it may actually map a vq?
>
> Maybe vu_maybe_map_vq()?
I don't think that would really be better. It's an implementation detail
that we try to recover in these corner cases by remapping the rings.
In the majority of cases this function will simply check whether the
device is broken and the vring was set up properly (which usually
implies mapping the rings).
So I think in the caller:

    if (!vu_is_vq_usable()) {
        return;
    }

is easier to get than:

    if (!vu_maybe_map_vq()) {
        return;
    }
Thanks!
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v1 15/15] libvhost-user: Mark mmap'ed region memory as MADV_DONTDUMP
2024-02-02 21:53 [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code David Hildenbrand
` (13 preceding siblings ...)
2024-02-02 21:53 ` [PATCH v1 14/15] libvhost-user: Dynamically remap rings after (temporarily?) removing memory regions David Hildenbrand
@ 2024-02-02 21:53 ` David Hildenbrand
2024-02-04 2:16 ` Raphael Norwitz
2024-02-07 11:40 ` [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code Stefano Garzarella
2024-02-13 17:33 ` Michael S. Tsirkin
16 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-02-02 21:53 UTC (permalink / raw)
To: qemu-devel
Cc: David Hildenbrand, Michael S . Tsirkin, Jason Wang,
Stefan Hajnoczi, Stefano Garzarella, Germano Veit Michel,
Raphael Norwitz
We already use MAP_NORESERVE to deal with sparse memory regions. Let's
also set madvise(MADV_DONTDUMP), otherwise a crash of the process can
result in us allocating all memory in the mmap'ed region for dumping
purposes.
This change implies that the mmap'ed rings won't be included in a
coredump. If ever required for debugging purposes, we could mark only
the mapped rings MADV_DODUMP.
Ignore errors during madvise() for now.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
subprojects/libvhost-user/libvhost-user.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 738e84ab63..26c289518c 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -458,6 +458,12 @@ _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
DPRINT(" mmap_addr: 0x%016"PRIx64"\n",
(uint64_t)(uintptr_t)mmap_addr);
+#if defined(__linux__)
+ /* Don't include all guest memory in a coredump. */
+ madvise(mmap_addr, msg_region->memory_size + mmap_offset,
+ MADV_DONTDUMP);
+#endif
+
/* Shift all affected entries by 1 to open a hole at idx. */
r = &dev->regions[idx];
memmove(r + 1, r, sizeof(VuDevRegion) * (dev->nregions - idx));
--
2.43.0
^ permalink raw reply related [flat|nested] 46+ messages in thread
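For the "mark only the mapped rings MADV_DODUMP" idea mentioned in the commit
message, a hypothetical sketch of what such a helper could look like (none of this
is in the patch; madvise() needs page-aligned addresses, so the ring range is
widened to page boundaries first, and errors are ignored just like for the
MADV_DONTDUMP call):

#if defined(__linux__)
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: re-include one mapped ring in coredumps. */
static int ring_allow_coredump(void *addr, size_t len)
{
    const uintptr_t pagesize = (uintptr_t)sysconf(_SC_PAGESIZE);
    const uintptr_t start = (uintptr_t)addr & ~(pagesize - 1);
    const uintptr_t end = ((uintptr_t)addr + len + pagesize - 1) &
                          ~(pagesize - 1);

    return madvise((void *)start, end - start, MADV_DODUMP);
}
#endif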
* Re: [PATCH v1 15/15] libvhost-user: Mark mmap'ed region memory as MADV_DONTDUMP
2024-02-02 21:53 ` [PATCH v1 15/15] libvhost-user: Mark mmap'ed region memory as MADV_DONTDUMP David Hildenbrand
@ 2024-02-04 2:16 ` Raphael Norwitz
0 siblings, 0 replies; 46+ messages in thread
From: Raphael Norwitz @ 2024-02-04 2:16 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Stefano Garzarella, Germano Veit Michel, Raphael Norwitz
On Fri, Feb 2, 2024 at 4:56 PM David Hildenbrand <david@redhat.com> wrote:
>
> We already use MAP_NORESERVE to deal with sparse memory regions. Let's
> also set madvise(MADV_DONTDUMP), otherwise a crash of the process can
> result in us allocating all memory in the mmap'ed region for dumping
> purposes.
>
> This change implies that the mmap'ed rings won't be included in a
> coredump. If ever required for debugging purposes, we could mark only
> the mapped rings MADV_DODUMP.
>
> Ignore errors during madvise() for now.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
> ---
> subprojects/libvhost-user/libvhost-user.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> index 738e84ab63..26c289518c 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -458,6 +458,12 @@ _vu_add_mem_reg(VuDev *dev, VhostUserMemoryRegion *msg_region, int fd)
> DPRINT(" mmap_addr: 0x%016"PRIx64"\n",
> (uint64_t)(uintptr_t)mmap_addr);
>
> +#if defined(__linux__)
> + /* Don't include all guest memory in a coredump. */
> + madvise(mmap_addr, msg_region->memory_size + mmap_offset,
> + MADV_DONTDUMP);
> +#endif
> +
> /* Shift all affected entries by 1 to open a hole at idx. */
> r = &dev->regions[idx];
> memmove(r + 1, r, sizeof(VuDevRegion) * (dev->nregions - idx));
> --
> 2.43.0
>
>
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code
2024-02-02 21:53 [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code David Hildenbrand
` (14 preceding siblings ...)
2024-02-02 21:53 ` [PATCH v1 15/15] libvhost-user: Mark mmap'ed region memory as MADV_DONTDUMP David Hildenbrand
@ 2024-02-07 11:40 ` Stefano Garzarella
2024-02-09 22:36 ` David Hildenbrand
2024-02-13 17:33 ` Michael S. Tsirkin
16 siblings, 1 reply; 46+ messages in thread
From: Stefano Garzarella @ 2024-02-07 11:40 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Germano Veit Michel, Raphael Norwitz
On Fri, Feb 02, 2024 at 10:53:17PM +0100, David Hildenbrand wrote:
>This series adds support for more memslots (509) to libvhost-user, to
>make it fully compatible with virtio-mem that uses up to 256 memslots
>across all memory devices in "dynamic-memslot" mode (more details
>in patch #3).
>
>One simple fix upfront.
>
>With that in place, this series optimizes and extends memory region
>handling in libvhost-user:
>* Heavily deduplicate and clean up the memory region handling code
>* Speeds up GPA->VA translation with many memslots using binary search
>* Optimize mmap_offset handling to use it as fd_offset for mmap()
>* Avoid ring remapping when adding a single memory region
>* Avoid dumping all guest memory, possibly allocating memory in sparse
> memory mappings when the process crashes
>
>I'm being very careful to not break some weird corner case that modern
>QEMU might no longer trigger, but an older one could have triggered or some
>other frontend might trigger.
>
>The only thing where I am not careful is to forbid memory regions that
>overlap in GPA space: it doesn't make any sense.
>
>With this series, virtio-mem (with dynamic-memslots=on) +
>qemu-storage-daemon works flawlessly and as expected in my tests.
I don't know much about this code, but I didn't find anything wrong with
it. Thank you also for the great cleanup!
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
^ permalink raw reply [flat|nested] 46+ messages in thread
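The binary-search patch itself is not quoted in this part of the thread; as a rough
idea of the GPA->VA lookup mentioned in the cover letter, here is a generic sketch
over regions sorted by guest physical address (types and field names are
illustrative, not the library's, and regions are assumed to be non-overlapping):

#include <stddef.h>
#include <stdint.h>

struct region {
    uint64_t gpa;          /* guest physical start address */
    uint64_t size;
    uint64_t mmap_addr;
    uint64_t mmap_offset;
};

void *gpa_to_va(struct region *regions, size_t nregions, uint64_t gpa)
{
    size_t low = 0, high = nregions;               /* search [low, high) */

    while (low < high) {
        const size_t mid = low + (high - low) / 2;
        struct region *r = &regions[mid];

        if (gpa < r->gpa) {
            high = mid;
        } else if (gpa >= r->gpa + r->size) {
            low = mid + 1;
        } else {
            return (void *)(uintptr_t)(r->mmap_addr + r->mmap_offset +
                                       gpa - r->gpa);
        }
    }
    return NULL;                                   /* no region covers gpa */
}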
* Re: [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code
2024-02-07 11:40 ` [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code Stefano Garzarella
@ 2024-02-09 22:36 ` David Hildenbrand
0 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2024-02-09 22:36 UTC (permalink / raw)
To: Stefano Garzarella
Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Stefan Hajnoczi,
Germano Veit Michel, Raphael Norwitz
On 07.02.24 12:40, Stefano Garzarella wrote:
> On Fri, Feb 02, 2024 at 10:53:17PM +0100, David Hildenbrand wrote:
>> This series adds support for more memslots (509) to libvhost-user, to
>> make it fully compatible with virtio-mem that uses up to 256 memslots
>> across all memory devices in "dynamic-memslot" mode (more details
>> in patch #3).
>>
>> One simple fix upfront.
>>
>> With that in place, this series optimizes and extends memory region
>> handling in libvhost-user:
>> * Heavily deduplicate and clean up the memory region handling code
>> * Speeds up GPA->VA translation with many memslots using binary search
>> * Optimize mmap_offset handling to use it as fd_offset for mmap()
>> * Avoid ring remapping when adding a single memory region
>> * Avoid dumping all guest memory, possibly allocating memory in sparse
>> memory mappings when the process crashes
>>
>> I'm being very careful to not break some weird corner case that modern
>> QEMU might no longer trigger, but an older one could have triggered or some
>> other frontend might trigger.
>>
>> The only thing where I am not careful is to forbid memory regions that
>> overlap in GPA space: it doesn't make any sense.
>>
>> With this series, virtio-mem (with dynamic-memslots=on) +
>> qemu-storage-daemon works flawlessly and as expected in my tests.
>
> I don't know much about this code, but I didn't find anything wrong with
> it. Thank you also for the great cleanup!
>
> Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Thanks Stefano for the review, highly appreciated!
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code
2024-02-02 21:53 [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code David Hildenbrand
` (15 preceding siblings ...)
2024-02-07 11:40 ` [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code Stefano Garzarella
@ 2024-02-13 17:33 ` Michael S. Tsirkin
2024-02-13 18:27 ` David Hildenbrand
16 siblings, 1 reply; 46+ messages in thread
From: Michael S. Tsirkin @ 2024-02-13 17:33 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Jason Wang, Stefan Hajnoczi, Stefano Garzarella,
Germano Veit Michel, Raphael Norwitz
On Fri, Feb 02, 2024 at 10:53:17PM +0100, David Hildenbrand wrote:
> This series adds support for more memslots (509) to libvhost-user, to
> make it fully compatible with virtio-mem that uses up to 256 memslots
> across all memory devices in "dynamic-memslot" mode (more details
> in patch #3).
Breaks build on some systems. E.g.
https://gitlab.com/mstredhat/qemu/-/jobs/6163591599
> One simple fix upfront.
>
> With that in place, this series optimizes and extends memory
> handling in libvhost-user:
> * Heavily deduplicate and clean up the memory region handling code
> * Speeds up GPA->VA translation with many memslots using binary search
> * Optimize mmap_offset handling to use it as fd_offset for mmap()
> * Avoid ring remapping when adding a single memory region
> * Avoid dumping all guest memory, possibly allocating memory in sparse
> memory mappings when the process crashes
>
> I'm being very careful to not break some weird corner case that modern
> QEMU might no longer trigger, but an older one could have triggered or some
> other frontend might trigger.
>
> The only thing where I am not careful is to forbid memory regions that
> overlap in GPA space: it doesn't make any sense.
>
> With this series, virtio-mem (with dynamic-memslots=on) +
> qemu-storage-daemon works flawlessly and as expected in my tests.
>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Cc: Stefano Garzarella <sgarzare@redhat.com>
> Cc: Germano Veit Michel <germano@redhat.com>
> Cc: Raphael Norwitz <raphael.norwitz@nutanix.com>
>
> David Hildenbrand (15):
> libvhost-user: Fix msg_region->userspace_addr computation
> libvhost-user: Dynamically allocate memory for memory slots
> libvhost-user: Bump up VHOST_USER_MAX_RAM_SLOTS to 509
> libvhost-user: Factor out removing all mem regions
> libvhost-user: Merge vu_set_mem_table_exec_postcopy() into
> vu_set_mem_table_exec()
> libvhost-user: Factor out adding a memory region
> libvhost-user: No need to check for NULL when unmapping
> libvhost-user: Don't zero out memory for memory regions
> libvhost-user: Don't search for duplicates when removing memory
> regions
> libvhost-user: Factor out search for memory region by GPA and simplify
> libvhost-user: Speedup gpa_to_mem_region() and vu_gpa_to_va()
> libvhost-user: Use most of mmap_offset as fd_offset
> libvhost-user: Factor out vq usability check
> libvhost-user: Dynamically remap rings after (temporarily?) removing
> memory regions
> libvhost-user: Mark mmap'ed region memory as MADV_DONTDUMP
>
> subprojects/libvhost-user/libvhost-user.c | 593 ++++++++++++----------
> subprojects/libvhost-user/libvhost-user.h | 10 +-
> 2 files changed, 332 insertions(+), 271 deletions(-)
>
> --
> 2.43.0
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code
2024-02-13 17:33 ` Michael S. Tsirkin
@ 2024-02-13 18:27 ` David Hildenbrand
2024-02-13 18:55 ` Michael S. Tsirkin
0 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2024-02-13 18:27 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: qemu-devel, Jason Wang, Stefan Hajnoczi, Stefano Garzarella,
Germano Veit Michel, Raphael Norwitz
On 13.02.24 18:33, Michael S. Tsirkin wrote:
> On Fri, Feb 02, 2024 at 10:53:17PM +0100, David Hildenbrand wrote:
>> This series adds support for more memslots (509) to libvhost-user, to
>> make it fully compatible with virtio-mem that uses up to 256 memslots
>> accross all memory devices in "dynamic-memslot" mode (more details
>> in patch #3).
>
>
> Breaks build on some systems. E.g.
> https://gitlab.com/mstredhat/qemu/-/jobs/6163591599
>
>
./subprojects/libvhost-user/libvhost-user.c:369:27: error: comparison of
integer expressions of different signedness: ‘long int’ and ‘unsigned
int’ [-Werror=sign-compare]
369 | if (!ret && fs.f_type == HUGETLBFS_MAGIC) {
| ^~
So easy to fix in v2, thanks!
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 46+ messages in thread
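One possible shape for that v2 fix, purely as an illustration (statfs::f_type is a
signed integer type on the failing target while HUGETLBFS_MAGIC is an unsigned
constant, so an explicit cast keeps -Wsign-compare quiet without changing the
comparison result):

-    if (!ret && fs.f_type == HUGETLBFS_MAGIC) {
+    if (!ret && (unsigned int)fs.f_type == HUGETLBFS_MAGIC) {
         return fs.f_bsize;
     }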
* Re: [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code
2024-02-13 18:27 ` David Hildenbrand
@ 2024-02-13 18:55 ` Michael S. Tsirkin
2024-02-14 11:06 ` David Hildenbrand
0 siblings, 1 reply; 46+ messages in thread
From: Michael S. Tsirkin @ 2024-02-13 18:55 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Jason Wang, Stefan Hajnoczi, Stefano Garzarella,
Germano Veit Michel, Raphael Norwitz
On Tue, Feb 13, 2024 at 07:27:44PM +0100, David Hildenbrand wrote:
> On 13.02.24 18:33, Michael S. Tsirkin wrote:
> > On Fri, Feb 02, 2024 at 10:53:17PM +0100, David Hildenbrand wrote:
> > > This series adds support for more memslots (509) to libvhost-user, to
> > > make it fully compatible with virtio-mem that uses up to 256 memslots
>>> across all memory devices in "dynamic-memslot" mode (more details
> > > in patch #3).
> >
> >
> > Breaks build on some systems. E.g.
> > https://gitlab.com/mstredhat/qemu/-/jobs/6163591599
> >
> >
>
> ./subprojects/libvhost-user/libvhost-user.c:369:27: error: comparison of
> integer expressions of different signedness: ‘long int’ and ‘unsigned int’
> [-Werror=sign-compare]
> 369 | if (!ret && fs.f_type == HUGETLBFS_MAGIC) {
> | ^~
>
> So easy to fix in v2, thanks!
I think there is another problem around plugins though.
> --
> Cheers,
>
> David / dhildenb
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code
2024-02-13 18:55 ` Michael S. Tsirkin
@ 2024-02-14 11:06 ` David Hildenbrand
0 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2024-02-14 11:06 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: qemu-devel, Jason Wang, Stefan Hajnoczi, Stefano Garzarella,
Germano Veit Michel, Raphael Norwitz
On 13.02.24 19:55, Michael S. Tsirkin wrote:
> On Tue, Feb 13, 2024 at 07:27:44PM +0100, David Hildenbrand wrote:
>> On 13.02.24 18:33, Michael S. Tsirkin wrote:
>>> On Fri, Feb 02, 2024 at 10:53:17PM +0100, David Hildenbrand wrote:
>>>> This series adds support for more memslots (509) to libvhost-user, to
>>>> make it fully compatible with virtio-mem that uses up to 256 memslots
>>>> across all memory devices in "dynamic-memslot" mode (more details
>>>> in patch #3).
>>>
>>>
>>> Breaks build on some systems. E.g.
>>> https://gitlab.com/mstredhat/qemu/-/jobs/6163591599
>>>
>>>
>>
>> ./subprojects/libvhost-user/libvhost-user.c:369:27: error: comparison of
>> integer expressions of different signedness: ‘long int’ and ‘unsigned int’
>> [-Werror=sign-compare]
>> 369 | if (!ret && fs.f_type == HUGETLBFS_MAGIC) {
>> | ^~
>>
>> So easy to fix in v2, thanks!
>
>
> I think there is another problem around plugins though.
There is a false-positive checkpatch error:
https://gitlab.com/mstredhat/qemu/-/jobs/6162397277
d96f29518232719b0c444ab93913e8515a6cb5c6:100: ERROR: use
qemu_real_host_page_size() instead of getpagesize()
total: 1 errors, 1 warnings, 81 lines checked
qemu_real_host_page_size() is not available in libvhost-user. But I
could just change that code to not require getpagesize() at all.
Apart from that, I don't spot anything libvhost-user related (some qtest
timeouts, a "error_setv: Assertion `*errp == NULL' failed."). Did I miss
something?
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 46+ messages in thread
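One way the getpagesize() dependency could be dropped entirely, sketched here only
to illustrate the idea (the helper name is made up, and non-hugetlb fds simply fall
back to the pre-series behaviour of mapping from offset 0, which is not necessarily
what v2 ends up doing):

static size_t
get_fd_hugepagesize(int fd)
{
#if defined(__linux__)
    struct statfs fs;
    int ret;

    do {
        ret = fstatfs(fd, &fs);
    } while (ret != 0 && errno == EINTR);

    if (!ret && (unsigned int)fs.f_type == HUGETLBFS_MAGIC) {
        return fs.f_bsize;
    }
#endif
    return 0;
}

/* in _vu_add_mem_reg(): */
hugepagesize = get_fd_hugepagesize(fd);
if (hugepagesize) {
    fd_offset = ALIGN_DOWN(msg_region->mmap_offset, hugepagesize);
    mmap_offset = msg_region->mmap_offset - fd_offset;
} else {
    /* Not hugetlb: keep mapping from offset 0, as before this series. */
    fd_offset = 0;
    mmap_offset = msg_region->mmap_offset;
}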