From: Albert Esteve <aesteve@redhat.com>
Date: Tue, 3 Sep 2024 13:54:12 +0200
Subject: Re: [RFC PATCH v2 1/5] vhost-user: Add VIRTIO Shared Memory map request
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: qemu-devel@nongnu.org, jasowang@redhat.com, david@redhat.com, slp@redhat.com, Alex Bennée, "Michael S. Tsirkin"
References: <20240628145710.1516121-1-aesteve@redhat.com> <20240628145710.1516121-2-aesteve@redhat.com> <20240711074510.GC563880@dynamic-pd01.res.v6.highway.a1.net>

On Tue, Sep 3, 2024 at 11:54 AM Albert Esteve <aesteve@redhat.com> wrote:

>
> On Thu, Jul 11, 2024 at 9:45 AM Stefan Hajnoczi <stefanha@redhat.com>
> wrote:
>
>> On Fri, Jun 28, 2024 at 04:57:06PM +0200, Albert Esteve wrote:
>> > Add SHMEM_MAP/UNMAP requests to vhost-user to
>> > handle VIRTIO Shared Memory mappings.
>> >
>> > This request allows backends to dynamically map
>> > fds into a VIRTIO Shared Memory Region identified
>> > by its `shmid`. Then, the fd memory is advertised
>> > to the driver as a base address + offset, so it
>> > can be read/written (depending on the mmap flags
>> > requested) while it is valid.
>> >
>> > The backend can munmap the memory range
>> > in a given VIRTIO Shared Memory Region (again,
>> > identified by its `shmid`), to free it. Upon
>> > receiving this message, the front-end must
>> > mmap the regions with PROT_NONE to reserve
>> > the virtual memory space.
>> >
>> > The device model needs to create MemoryRegion
>> > instances for the VIRTIO Shared Memory Regions
>> > and add them to the `VirtIODevice` instance.
>> >
>> > Signed-off-by: Albert Esteve <aesteve@redhat.com>
>> > ---
>> >  docs/interop/vhost-user.rst               |  27 +++++
>> >  hw/virtio/vhost-user.c                    | 122 ++++++++++++++++++++++
>> >  hw/virtio/virtio.c                        |  12 +++
>> >  include/hw/virtio/virtio.h                |   5 +
>> >  subprojects/libvhost-user/libvhost-user.c |  65 ++++++++++++
>> >  subprojects/libvhost-user/libvhost-user.h |  53 ++++++++++
>> >  6 files changed, 284 insertions(+)
>> >
>> > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
>> > index d8419fd2f1..d52ba719d5 100644
>> > --- a/docs/interop/vhost-user.rst
>> > +++ b/docs/interop/vhost-user.rst
>> > @@ -1859,6 +1859,33 @@ is sent by the front-end.
>> >    when the operation is successful, or non-zero otherwise. Note that if the
>> >    operation fails, no fd is sent to the backend.
>> >
>> > +``VHOST_USER_BACKEND_SHMEM_MAP``
>> > +  :id: 9
>> > +  :equivalent ioctl: N/A
>> > +  :request payload: fd and ``struct VhostUserMMap``
>> > +  :reply payload: N/A
>> > +
>> > +  This message can be submitted by the backends to advertise a new mapping
>> > +  to be made in a given VIRTIO Shared Memory Region. Upon receiving the message,
>> > +  the front-end will mmap the given fd into the VIRTIO Shared Memory Region
>> > +  with the requested ``shmid``. A reply is generated indicating whether mapping
>> > +  succeeded.
>> > +
>> > +  Mapping over an already existing map is not allowed and the request shall fail.
>> > +  Therefore, the memory range in the request must correspond with a valid,
>> > +  free region of the VIRTIO Shared Memory Region.
>> > +
>> > +``VHOST_USER_BACKEND_SHMEM_UNMAP``
>> > +  :id: 10
>> > +  :equivalent ioctl: N/A
>> > +  :request payload: ``struct VhostUserMMap``
>> > +  :reply payload: N/A
>> > +
>> > +  This message can be submitted by the backends so that the front-end un-mmap
>> > +  a given range (``offset``, ``len``) in the VIRTIO Shared Memory Region with
>>
>> s/offset/shm_offset/
>>
>> > +  the requested ``shmid``.
>>
>> Please clarify that <offset, len> must correspond to the entirety of a
>> valid mapped region.
>>
>> By the way, VIRTIO 1.3 gives the following behavior for the virtiofs
>> DAX Window:
>>
>>   When a FUSE_SETUPMAPPING request perfectly overlaps a previous
>>   mapping, the previous mapping is replaced. When a mapping partially
>>   overlaps a previous mapping, the previous mapping is split into one or
>>   two smaller mappings. When a mapping is partially unmapped it is also
>>   split into one or two smaller mappings.
>>
>>   Establishing new mappings or splitting existing mappings consumes
>>   resources. If the device runs out of resources the FUSE_SETUPMAPPING
>>   request fails until resources are available again following
>>   FUSE_REMOVEMAPPING.
>>
>> I think SETUPMAPPING/REMOVEMAPPING can be implemented using
>> SHMEM_MAP/UNMAP. SHMEM_MAP/UNMAP do not allow atomically replacing
>> partial ranges, but as far as I know that's not necessary for virtiofs
>> in practice.
>>
>> It's worth mentioning that mappings consume resources and that SHMEM_MAP
>> can fail when there are no resources available. The process-wide limit
>> is vm.max_map_count on Linux, although a vhost-user frontend may reduce
>> it further to control vhost-user resource usage.
>>
>> > +  A reply is generated indicating whether unmapping succeeded.
>> > +
>> >  .. _reply_ack:
>> >
>> >  VHOST_USER_PROTOCOL_F_REPLY_ACK
>> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>> > index cdf9af4a4b..7ee8a472c6 100644
>> > --- a/hw/virtio/vhost-user.c
>> > +++ b/hw/virtio/vhost-user.c
>> > @@ -115,6 +115,8 @@ typedef enum VhostUserBackendRequest {
>> >      VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
>> >      VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
>> >      VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
>> > +    VHOST_USER_BACKEND_SHMEM_MAP = 9,
>> > +    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
>> >      VHOST_USER_BACKEND_MAX
>> >  }  VhostUserBackendRequest;
>> >
>> > @@ -192,6 +194,24 @@ typedef struct VhostUserShared {
>> >      unsigned char uuid[16];
>> >  } VhostUserShared;
>> >
>> > +/* For the flags field of VhostUserMMap */
>> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
>> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
>> > +
>> > +typedef struct {
>> > +    /* VIRTIO Shared Memory Region ID */
>> > +    uint8_t shmid;
>> > +    uint8_t padding[7];
>> > +    /* File offset */
>> > +    uint64_t fd_offset;
>> > +    /* Offset within the VIRTIO Shared Memory Region */
>> > +    uint64_t shm_offset;
>> > +    /* Size of the mapping */
>> > +    uint64_t len;
>> > +    /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
>> > +    uint64_t flags;
>> > +} VhostUserMMap;
>> > +
>> >  typedef struct {
>> >      VhostUserRequest request;
>> >
>> > @@ -224,6 +244,7 @@ typedef union {
>> >          VhostUserInflight inflight;
>> >          VhostUserShared object;
>> >          VhostUserTransferDeviceState transfer_state;
>> > +        VhostUserMMap mmap;
>> >  } VhostUserPayload;
>> >
>> >  typedef struct VhostUserMsg {
>> > @@ -1748,6 +1769,100 @@ vhost_user_backend_handle_shared_object_lookup(struct vhost_user *u,
>> >      return 0;
>> >  }
>> >
>> > +static int
>> > +vhost_user_backend_handle_shmem_map(struct vhost_dev *dev,
>> > +                                    VhostUserMMap *vu_mmap,
>> > +                                    int fd)
>> > +{
>> > +    void *addr = 0;
>> > +    MemoryRegion *mr = NULL;
>> > +
>> > +    if (fd < 0) {
>> > +        error_report("Bad fd for map");
>> > +        return -EBADF;
>> > +    }
>> > +
>> > +    if (!dev->vdev->shmem_list ||
>> > +        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
>> > +        error_report("Device only has %d VIRTIO Shared Memory Regions. "
>> > +                     "Requested ID: %d",
>> > +                     dev->vdev->n_shmem_regions, vu_mmap->shmid);
>> > +        return -EFAULT;
>> > +    }
>> > +
>> > +    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
>> > +
>> > +    if (!mr) {
>> > +        error_report("VIRTIO Shared Memory Region at "
>> > +                     "ID %d uninitialized", vu_mmap->shmid);
>> > +        return -EFAULT;
>> > +    }
>> > +
>> > +    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
>> > +        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
>> > +        error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
>> > +                     vu_mmap->shm_offset, vu_mmap->len);
>> > +        return -EFAULT;
>> > +    }
>> > +
>> > +    void *shmem_ptr = memory_region_get_ram_ptr(mr);
>> > +
>> > +    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
>>
>> Missing check for overlap between range [shm_offset, shm_offset + len)
>> and existing mappings.
>>
>
> Not sure how to do this check. Specifically, I am not sure how previous
> ranges are stored within the MemoryRegion. Is looping through mr->subregions
> a valid option?
>

Maybe something like this would do?
```
    if (memory_region_find(mr, vu_mmap->shm_offset, vu_mmap->len).mr) {
        error_report("Requested memory (%" PRIx64 "+%" PRIx64 ") overlaps "
                     "with previously mapped memory",
                     vu_mmap->shm_offset, vu_mmap->len);
        return -EFAULT;
    }
```

>
>>
>> > +        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_R) ? PROT_READ : 0) |
>> > +        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_W) ? PROT_WRITE : 0),
>> > +        MAP_SHARED | MAP_FIXED, fd, vu_mmap->fd_offset);
>> > +
>> > +    if (addr == MAP_FAILED) {
>> > +        error_report("Failed to mmap mem fd");
>> > +        return -EFAULT;
>> > +    }
>> > +
>> > +    return 0;
>> > +}
>> > +
>> > +static int
>> > +vhost_user_backend_handle_shmem_unmap(struct vhost_dev *dev,
>> > +                                      VhostUserMMap *vu_mmap)
>> > +{
>> > +    void *addr = 0;
>> > +    MemoryRegion *mr = NULL;
>> > +
>> > +    if (!dev->vdev->shmem_list ||
>> > +        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
>> > +        error_report("Device only has %d VIRTIO Shared Memory Regions. "
>> > +                     "Requested ID: %d",
>> > +                     dev->vdev->n_shmem_regions, vu_mmap->shmid);
>> > +        return -EFAULT;
>> > +    }
>> > +
>> > +    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
>> > +
>> > +    if (!mr) {
>> > +        error_report("VIRTIO Shared Memory Region at "
>> > +                     "ID %d uninitialized", vu_mmap->shmid);
>> > +        return -EFAULT;
>> > +    }
>> > +
>> > +    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
>> > +        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
>> > +        error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
>> > +                     vu_mmap->shm_offset, vu_mmap->len);
>> > +        return -EFAULT;
>> > +    }
>> > +
>> > +    void *shmem_ptr = memory_region_get_ram_ptr(mr);
>> > +
>> > +    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,
>>
>> Missing check for existing mapping with exact range [shm_offset, len)
>> match.
>>
>> > +                PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
>> > +
>> > +    if (addr == MAP_FAILED) {
>> > +        error_report("Failed to unmap memory");
>> > +        return -EFAULT;
>> > +    }
>> > +
>> > +    return 0;
>> > +}
>> > +
>> >  static void close_backend_channel(struct vhost_user *u)
>> >  {
>> >      g_source_destroy(u->backend_src);
>> > @@ -1816,6 +1931,13 @@ static gboolean backend_read(QIOChannel *ioc, GIOCondition condition,
>> >          ret = vhost_user_backend_handle_shared_object_lookup(dev->opaque, ioc,
>> >                                                               &hdr, &payload);
>> >          break;
>> > +    case VHOST_USER_BACKEND_SHMEM_MAP:
>> > +        ret = vhost_user_backend_handle_shmem_map(dev, &payload.mmap,
>> > +                                                  fd ? fd[0] : -1);
>> > +        break;
>> > +    case VHOST_USER_BACKEND_SHMEM_UNMAP:
>> > +        ret = vhost_user_backend_handle_shmem_unmap(dev, &payload.mmap);
>> > +        break;
>> >      default:
>> >          error_report("Received unexpected msg type: %d.", hdr.request);
>> >          ret = -EINVAL;
>> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>> > index 893a072c9d..9f2da5b11e 100644
>> > --- a/hw/virtio/virtio.c
>> > +++ b/hw/virtio/virtio.c
>> > @@ -2856,6 +2856,16 @@ int virtio_save(VirtIODevice *vdev, QEMUFile *f)
>> >      return vmstate_save_state(f, &vmstate_virtio, vdev, NULL);
>> >  }
>> >
>> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev)
>> > +{
>> > +    MemoryRegion *mr = g_new0(MemoryRegion, 1);
>> > +    ++vdev->n_shmem_regions;
>> > +    vdev->shmem_list = g_renew(MemoryRegion, vdev->shmem_list,
>> > +                               vdev->n_shmem_regions);
>>
>> Where is shmem_list freed?
>>
>> The name "list" is misleading since this is an array, not a list.
>>
>> > +    vdev->shmem_list[vdev->n_shmem_regions - 1] = *mr;
>> > +    return mr;
>> > +}
>>
>> This looks weird. The contents of mr are copied into shmem_list[] and
>> then the pointer to mr is returned? Did you mean for the field's type to
>> be MemoryRegion **shmem_list and then vdev->shmem_list[...] = mr would
>> stash the pointer?
>>
>> > +
>> >  /* A wrapper for use as a VMState .put function */
>> >  static int virtio_device_put(QEMUFile *f, void *opaque, size_t size,
>> >                               const VMStateField *field, JSONWriter *vmdesc)
>> > @@ -3264,6 +3274,8 @@ void virtio_init(VirtIODevice *vdev, uint16_t device_id, size_t config_size)
>> >              virtio_vmstate_change, vdev);
>> >      vdev->device_endian = virtio_default_endian();
>> >      vdev->use_guest_notifier_mask = true;
>> > +    vdev->shmem_list = NULL;
>> > +    vdev->n_shmem_regions = 0;
>> >  }
>> >
>> >  /*
>> > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
>> > index 7d5ffdc145..16d598aadc 100644
>> > --- a/include/hw/virtio/virtio.h
>> > +++ b/include/hw/virtio/virtio.h
>> > @@ -165,6 +165,9 @@ struct VirtIODevice
>> >       */
>> >      EventNotifier config_notifier;
>> >      bool device_iotlb_enabled;
>> > +    /* Shared memory region for vhost-user mappings. */
>> > +    MemoryRegion *shmem_list;
>> > +    int n_shmem_regions;
>> >  };
>> >
>> >  struct VirtioDeviceClass {
>> > @@ -280,6 +283,8 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue *vq);
>> >
>> >  int virtio_save(VirtIODevice *vdev, QEMUFile *f);
>> >
>> > +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev);
>> > +
>> >  extern const VMStateInfo virtio_vmstate_info;
>> >
>> >  #define VMSTATE_VIRTIO_DEVICE \
>> > diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
>> > index a879149fef..28556d183a 100644
>> > --- a/subprojects/libvhost-user/libvhost-user.c
>> > +++ b/subprojects/libvhost-user/libvhost-user.c
>> > @@ -1586,6 +1586,71 @@ vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN])
>> >      return vu_send_message(dev, &msg);
>> >  }
>> >
>> > +bool
>> > +vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
>> > +             uint64_t shm_offset, uint64_t len, uint64_t flags)
>> > +{
>> > +    bool result = false;
>> > +    VhostUserMsg msg_reply;
>> > +    VhostUserMsg vmsg = {
>> > +        .request = VHOST_USER_BACKEND_SHMEM_MAP,
>> > +        .size = sizeof(vmsg.payload.mmap),
>> > +        .flags = VHOST_USER_VERSION,
>> > +        .payload.mmap = {
>> > +            .shmid = shmid,
>> > +            .fd_offset = fd_offset,
>> > +            .shm_offset = shm_offset,
>> > +            .len = len,
>> > +            .flags = flags,
>> > +        },
>> > +    };
>> > +
>> > +    if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
>> > +        vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
>> > +    }
>> > +
>> > +    pthread_mutex_lock(&dev->backend_mutex);
>> > +    if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
>> > +        pthread_mutex_unlock(&dev->backend_mutex);
>> > +        return false;
>> > +    }
>> > +
>> > +    /* Also unlocks the backend_mutex */
>> > +    return vu_process_message_reply(dev, &vmsg);
>> > +}
>> > +
>> > +bool
>> > +vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
>> > +               uint64_t shm_offset, uint64_t len)
>> > +{
>> > +    bool result = false;
>> > +    VhostUserMsg msg_reply;
>> > +    VhostUserMsg vmsg = {
>> > +        .request = VHOST_USER_BACKEND_SHMEM_UNMAP,
>> > +        .size = sizeof(vmsg.payload.mmap),
>> > +        .flags = VHOST_USER_VERSION,
>> > +        .payload.mmap = {
>> > +            .shmid = shmid,
>> > +            .fd_offset = fd_offset,
>>
>> What is the meaning of this field? I expected it to be set to 0.
>>
>> > +            .shm_offset = shm_offset,
>> > +            .len = len,
>> > +        },
>> > +    };
>> > +
>> > +    if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
>> > +        vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
>> > +    }
>> > +
>> > +    pthread_mutex_lock(&dev->backend_mutex);
>> > +    if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
>> > +        pthread_mutex_unlock(&dev->backend_mutex);
>> > +        return false;
>> > +    }
>> > +
>> > +    /* Also unlocks the backend_mutex */
>> > +    return vu_process_message_reply(dev, &vmsg);
>> > +}
>> > +
>> >  static bool
>> >  vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
>> >  {
>> > diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
>> > index deb40e77b3..7f6c22cc1a 100644
>> > --- a/subprojects/libvhost-user/libvhost-user.h
>> > +++ b/subprojects/libvhost-user/libvhost-user.h
>> > @@ -127,6 +127,8 @@ typedef enum VhostUserBackendRequest {
>> >      VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
>> >      VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
>> >      VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
>> > +    VHOST_USER_BACKEND_SHMEM_MAP = 9,
>> > +    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
>> >      VHOST_USER_BACKEND_MAX
>> >  } VhostUserBackendRequest;
>> >
>> > @@ -186,6 +188,24 @@ typedef struct VhostUserShared {
>> >      unsigned char uuid[UUID_LEN];
>> >  } VhostUserShared;
>> >
>> > +/* For the flags field of VhostUserMMap */
>> > +#define VHOST_USER_FLAG_MAP_R (1u << 0)
>> > +#define VHOST_USER_FLAG_MAP_W (1u << 1)
>> > +
>> > +typedef struct {
>> > +    /* VIRTIO Shared Memory Region ID */
>> > +    uint8_t shmid;
>> > +    uint8_t padding[7];
>> > +    /* File offset */
>> > +    uint64_t fd_offset;
>> > +    /* Offset within the VIRTIO Shared Memory Region */
>> > +    uint64_t shm_offset;
>> > +    /* Size of the mapping */
>> > +    uint64_t len;
>> > +    /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
>> > +    uint64_t flags;
>> > +} VhostUserMMap;
>> > +
>> >  #if defined(_WIN32) && (defined(__x86_64__) || defined(__i386__))
>> >  # define VU_PACKED __attribute__((gcc_struct, packed))
>> >  #else
>> > @@ -214,6 +234,7 @@ typedef struct VhostUserMsg {
>> >          VhostUserVringArea area;
>> >          VhostUserInflight inflight;
>> >          VhostUserShared object;
>> > +        VhostUserMMap mmap;
>> >      } payload;
>> >
>> >      int fds[VHOST_MEMORY_BASELINE_NREGIONS];
>> > @@ -597,6 +618,38 @@ bool vu_add_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
>> >   */
>> >  bool vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
>> >
>> > +/**
>> > + * vu_shmem_map:
>> > + * @dev: a VuDev context
>> > + * @shmid: VIRTIO Shared Memory Region ID
>> > + * @fd_offset: File offset
>> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
>> > + * @len: Size of the mapping
>> > + * @flags: Flags for the mmap operation
>> > + *
>> > + * Advertises a new mapping to be made in a given VIRTIO Shared Memory Region.
>> > + *
>> > + * Returns: TRUE on success, FALSE on failure.
>> > + */
>> > +bool vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
>> > +                  uint64_t shm_offset, uint64_t len, uint64_t flags);
>> > +
>> > +/**
>> > + * vu_shmem_map:
>> > + * @dev: a VuDev context
>> > + * @shmid: VIRTIO Shared Memory Region ID
>> > + * @fd_offset: File offset
>> > + * @shm_offset: Offset within the VIRTIO Shared Memory Region
>> > + * @len: Size of the mapping
>> > + *
>> > + * The front-end un-mmaps a given range in the VIRTIO Shared Memory Region
>> > + * with the requested `shmid`.
>> > + *
>> > + * Returns: TRUE on success, FALSE on failure.
>> > + */
>> > +bool vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
>> > +                    uint64_t shm_offset, uint64_t len);
>> > +
>> >  /**
>> >   * vu_queue_set_notification:
>> >   * @dev: a VuDev context
>> > --
>> > 2.45.2
>> >
>>
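
To make the two requested checks concrete (no overlap on SHMEM_MAP, exact-range match on SHMEM_UNMAP), here is a minimal, self-contained sketch that does not depend on QEMU's MemoryRegion API. It is an illustration only, not code from the patch: `ShmemRegion`, `ShmemMapping`, `SHMEM_MAX_MAPPINGS` and the `shmem_track_*` helpers are hypothetical names, and the fixed-size table is just an assumed stand-in for whatever bookkeeping the front-end ends up using. The cap also models the point that mappings consume resources and that a map request can fail when none are left.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-region cap on tracked mappings (illustration only). */
#define SHMEM_MAX_MAPPINGS 8

/* One tracked mapping: the half-open range [offset, offset + len). */
typedef struct {
    uint64_t offset;
    uint64_t len;
} ShmemMapping;

/* Hypothetical bookkeeping for one VIRTIO Shared Memory Region. */
typedef struct {
    uint64_t size;
    size_t n_mappings;
    ShmemMapping mappings[SHMEM_MAX_MAPPINGS];
} ShmemRegion;

/* Reject empty ranges, ranges that wrap around, and ranges past the end.
 * Written without computing offset + len, so it cannot overflow. */
bool shmem_range_valid(const ShmemRegion *r, uint64_t offset, uint64_t len)
{
    return len != 0 && offset <= r->size && len <= r->size - offset;
}

/* Two half-open ranges overlap iff each one starts before the other ends.
 * Both ranges are assumed to have passed shmem_range_valid() already. */
bool shmem_overlaps(const ShmemMapping *m, uint64_t offset, uint64_t len)
{
    return offset < m->offset + m->len && m->offset < offset + len;
}

/* SHMEM_MAP side: 0 on success, -1 on bad range, overlap, or full table. */
int shmem_track_map(ShmemRegion *r, uint64_t offset, uint64_t len)
{
    if (!shmem_range_valid(r, offset, len) ||
        r->n_mappings == SHMEM_MAX_MAPPINGS) {
        return -1;
    }
    for (size_t i = 0; i < r->n_mappings; i++) {
        if (shmem_overlaps(&r->mappings[i], offset, len)) {
            return -1;
        }
    }
    r->mappings[r->n_mappings++] = (ShmemMapping){ offset, len };
    return 0;
}

/* SHMEM_UNMAP side: only an exact (offset, len) match is accepted. */
int shmem_track_unmap(ShmemRegion *r, uint64_t offset, uint64_t len)
{
    for (size_t i = 0; i < r->n_mappings; i++) {
        if (r->mappings[i].offset == offset && r->mappings[i].len == len) {
            /* Fill the hole with the last entry; order does not matter. */
            r->mappings[i] = r->mappings[--r->n_mappings];
            return 0;
        }
    }
    return -1;
}
```

In a handler like the ones in the patch, `shmem_track_map()` would run before the `mmap()` call and `shmem_track_unmap()` before the PROT_NONE remap, with a non-zero result turned into an error reply. Whether this tracking should live in the MemoryRegion (for example as subregions) or in a side table like this one is exactly the open question above.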


=
On Tue, Sep 3, 2024 at 11:54=E2=80=AF= AM Albert Esteve <aesteve@redhat.c= om> wrote:


On Thu, Jul 11, 2024 at 9:4= 5=E2=80=AFAM Stefan Hajnoczi <stefanha@redhat.com> wrote:
On Fri, Jun 28, 2024 at 04:57:06PM +0200,= Albert Esteve wrote:
> Add SHMEM_MAP/UNMAP requests to vhost-user to
> handle VIRTIO Shared Memory mappings.
>
> This request allows backends to dynamically map
> fds into a VIRTIO Shared Memory Region indentified
> by its `shmid`. Then, the fd memory is advertised
> to the driver as a base addres + offset, so it
> can be read/written (depending on the mmap flags
> requested) while its valid.
>
> The backend can munmap the memory range
> in a given VIRTIO Shared Memory Region (again,
> identified by its `shmid`), to free it. Upon
> receiving this message, the front-end must
> mmap the regions with PROT_NONE to reserve
> the virtual memory space.
>
> The device model needs to create MemoryRegion
> instances for the VIRTIO Shared Memory Regions
> and add them to the `VirtIODevice` instance.
>
> Signed-off-by: Albert Esteve <aesteve@redhat.com>
> ---
>=C2=A0 docs/interop/vhost-user.rst=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0|=C2=A0 27 +++++
>=C2=A0 hw/virtio/vhost-user.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 122 ++++++++++++++++++++++
>=C2=A0 hw/virtio/virtio.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 |=C2=A0 12 +++
>=C2=A0 include/hw/virtio/virtio.h=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 |=C2=A0 =C2=A05 +
>=C2=A0 subprojects/libvhost-user/libvhost-user.c |=C2=A0 65 +++++++++++= +
>=C2=A0 subprojects/libvhost-user/libvhost-user.h |=C2=A0 53 ++++++++++<= br> >=C2=A0 6 files changed, 284 insertions(+)
>
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst=
> index d8419fd2f1..d52ba719d5 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -1859,6 +1859,33 @@ is sent by the front-end.
>=C2=A0 =C2=A0 when the operation is successful, or non-zero otherwise. = Note that if the
>=C2=A0 =C2=A0 operation fails, no fd is sent to the backend.
>=C2=A0
> +``VHOST_USER_BACKEND_SHMEM_MAP``
> +=C2=A0 :id: 9
> +=C2=A0 :equivalent ioctl: N/A
> +=C2=A0 :request payload: fd and ``struct VhostUserMMap``
> +=C2=A0 :reply payload: N/A
> +
> +=C2=A0 This message can be submitted by the backends to advertise a n= ew mapping
> +=C2=A0 to be made in a given VIRTIO Shared Memory Region. Upon receiv= ing the message,
> +=C2=A0 The front-end will mmap the given fd into the VIRTIO Shared Me= mory Region
> +=C2=A0 with the requested ``shmid``. A reply is generated indicating = whether mapping
> +=C2=A0 succeeded.
> +
> +=C2=A0 Mapping over an already existing map is not allowed and reques= t shall fail.
> +=C2=A0 Therefore, the memory range in the request must correspond wit= h a valid,
> +=C2=A0 free region of the VIRTIO Shared Memory Region.
> +
> +``VHOST_USER_BACKEND_SHMEM_UNMAP``
> +=C2=A0 :id: 10
> +=C2=A0 :equivalent ioctl: N/A
> +=C2=A0 :request payload: ``struct VhostUserMMap``
> +=C2=A0 :reply payload: N/A
> +
> +=C2=A0 This message can be submitted by the backends so that the fron= t-end un-mmap
> +=C2=A0 a given range (``offset``, ``len``) in the VIRTIO Shared Memor= y Region with

s/offset/shm_offset/

> +=C2=A0 the requested ``shmid``.

Please clarify that <offset, len> must correspond to the entirety of = a
valid mapped region.

By the way, the VIRTIO 1.3 gives the following behavior for the virtiofs DAX Window:

=C2=A0 When a FUSE_SETUPMAPPING request perfectly overlaps a previous
=C2=A0 mapping, the previous mapping is replaced. When a mapping partially<= br> =C2=A0 overlaps a previous mapping, the previous mapping is split into one = or
=C2=A0 two smaller mappings. When a mapping is partially unmapped it is als= o
=C2=A0 split into one or two smaller mappings.

=C2=A0 Establishing new mappings or splitting existing mappings consumes =C2=A0 resources. If the device runs out of resources the FUSE_SETUPMAPPING=
=C2=A0 request fails until resources are available again following
=C2=A0 FUSE_REMOVEMAPPING.

I think SETUPMAPPING/REMOVMAPPING can be implemented using
SHMEM_MAP/UNMAP. SHMEM_MAP/UNMAP do not allow atomically replacing
partial ranges, but as far as I know that's not necessary for virtiofs<= br> in practice.

It's worth mentioning that mappings consume resources and that SHMEM_MA= P
can fail when there are no resources available. The process-wide limit
is vm.max_map_count on Linux although a vhost-user frontend may reduce
it further to control vhost-user resource usage.

> +=C2=A0 A reply is generated indicating whether unmapping succeeded. > +
>=C2=A0 .. _reply_ack:
>=C2=A0
>=C2=A0 VHOST_USER_PROTOCOL_F_REPLY_ACK
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index cdf9af4a4b..7ee8a472c6 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -115,6 +115,8 @@ typedef enum VhostUserBackendRequest {
>=C2=A0 =C2=A0 =C2=A0 VHOST_USER_BACKEND_SHARED_OBJECT_ADD =3D 6,
>=C2=A0 =C2=A0 =C2=A0 VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE =3D 7,
>=C2=A0 =C2=A0 =C2=A0 VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP =3D 8,
> +=C2=A0 =C2=A0 VHOST_USER_BACKEND_SHMEM_MAP =3D 9,
> +=C2=A0 =C2=A0 VHOST_USER_BACKEND_SHMEM_UNMAP =3D 10,
>=C2=A0 =C2=A0 =C2=A0 VHOST_USER_BACKEND_MAX
>=C2=A0 }=C2=A0 VhostUserBackendRequest;
>=C2=A0
> @@ -192,6 +194,24 @@ typedef struct VhostUserShared {
>=C2=A0 =C2=A0 =C2=A0 unsigned char uuid[16];
>=C2=A0 } VhostUserShared;
>=C2=A0
> +/* For the flags field of VhostUserMMap */
> +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> +
> +typedef struct {
> +=C2=A0 =C2=A0 /* VIRTIO Shared Memory Region ID */
> +=C2=A0 =C2=A0 uint8_t shmid;
> +=C2=A0 =C2=A0 uint8_t padding[7];
> +=C2=A0 =C2=A0 /* File offset */
> +=C2=A0 =C2=A0 uint64_t fd_offset;
> +=C2=A0 =C2=A0 /* Offset within the VIRTIO Shared Memory Region */
> +=C2=A0 =C2=A0 uint64_t shm_offset;
> +=C2=A0 =C2=A0 /* Size of the mapping */
> +=C2=A0 =C2=A0 uint64_t len;
> +=C2=A0 =C2=A0 /* Flags for the mmap operation, from VHOST_USER_FLAG_*= */
> +=C2=A0 =C2=A0 uint64_t flags;
> +} VhostUserMMap;
> +
>=C2=A0 typedef struct {
>=C2=A0 =C2=A0 =C2=A0 VhostUserRequest request;
>=C2=A0
> @@ -224,6 +244,7 @@ typedef union {
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 VhostUserInflight inflight;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 VhostUserShared object;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 VhostUserTransferDeviceState transfe= r_state;
> +        VhostUserMMap mmap;
>  } VhostUserPayload;
>
>  typedef struct VhostUserMsg {
> @@ -1748,6 +1769,100 @@ vhost_user_backend_handle_shared_object_lookup(struct vhost_user *u,
>      return 0;
>  }
>
> +static int
> +vhost_user_backend_handle_shmem_map(struct vhost_dev *dev,
> +                                    VhostUserMMap *vu_mmap,
> +                                    int fd)
> +{
> +    void *addr = 0;
> +    MemoryRegion *mr = NULL;
> +
> +    if (fd < 0) {
> +        error_report("Bad fd for map");
> +        return -EBADF;
> +    }
> +
> +    if (!dev->vdev->shmem_list ||
> +        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> +        error_report("Device only has %d VIRTIO Shared Memory Regions. "
> +                     "Requested ID: %d",
> +                     dev->vdev->n_shmem_regions, vu_mmap->shmid);
> +        return -EFAULT;
> +    }
> +
> +    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> +
> +    if (!mr) {
> +        error_report("VIRTIO Shared Memory Region at "
> +                     "ID %d unitialized", vu_mmap->shmid);
> +        return -EFAULT;
> +    }
> +
> +    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> +        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> +        error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
> +                     vu_mmap->shm_offset, vu_mmap->len);
> +        return -EFAULT;
> +    }
> +
> +    void *shmem_ptr = memory_region_get_ram_ptr(mr);
> +
> +    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,

Missing check for overlap between range [shm_offset, shm_offset + len)
and existing mappings.

Not sure how to do this check. Specifically, I am not sure how previous
ranges are stored within the MemoryRegion. Is looping through mr->subregions
a valid option?

Maybe something like this would do?
```
    if (memory_region_find(mr, vu_mmap->shm_offset, vu_mmap->len).mr) {
        error_report("Requested memory (%" PRIx64 "+%" PRIx64 ") overlaps "
                     "with previously mapped memory",
                     vu_mmap->shm_offset, vu_mmap->len);
        return -EFAULT;
    }
```
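For illustration, the overlap test itself can be sketched independently of
how QEMU tracks the mappings. The snippet below is a minimal sketch in plain
C, not QEMU code: ShmemMapping, ranges_overlap() and shmem_range_is_free()
are hypothetical names, and it assumes the device keeps its own array of
(offset, len) records for the mappings it has established, and that
offset + len has already been checked against wrap-around as in the handler
above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping record for one established mapping (not a QEMU type). */
typedef struct ShmemMapping {
    uint64_t offset; /* offset within the VIRTIO Shared Memory Region */
    uint64_t len;    /* length of the mapping in bytes */
} ShmemMapping;

/*
 * True if the half-open ranges [a_off, a_off + a_len) and
 * [b_off, b_off + b_len) share at least one byte.
 * Assumes neither offset + len wraps around.
 */
static bool ranges_overlap(uint64_t a_off, uint64_t a_len,
                           uint64_t b_off, uint64_t b_len)
{
    if (a_len == 0 || b_len == 0) {
        return false; /* an empty range overlaps nothing */
    }
    return a_off < b_off + b_len && b_off < a_off + a_len;
}

/* True if [offset, offset + len) overlaps none of the n recorded mappings. */
static bool shmem_range_is_free(const ShmemMapping *maps, size_t n,
                                uint64_t offset, uint64_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (ranges_overlap(maps[i].offset, maps[i].len, offset, len)) {
            return false;
        }
    }
    return true;
}
```

With a helper of this shape, the map handler could fail the request before
calling mmap() whenever shmem_range_is_free() returns false. That matters
here because MAP_FIXED replaces whatever is already mapped at the target
address without reporting an error.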

> +        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_R) ? PROT_READ : 0) |
> +        ((vu_mmap->flags & VHOST_USER_FLAG_MAP_W) ? PROT_WRITE : 0),
> +        MAP_SHARED | MAP_FIXED, fd, vu_mmap->fd_offset);
> +
> +    if (addr == MAP_FAILED) {
> +        error_report("Failed to mmap mem fd");
> +        return -EFAULT;
> +    }
> +
> +    return 0;
> +}
> +
> +static int
> +vhost_user_backend_handle_shmem_unmap(struct vhost_dev *dev,
> +                                      VhostUserMMap *vu_mmap)
> +{
> +    void *addr = 0;
> +    MemoryRegion *mr = NULL;
> +
> +    if (!dev->vdev->shmem_list ||
> +        dev->vdev->n_shmem_regions <= vu_mmap->shmid) {
> +        error_report("Device only has %d VIRTIO Shared Memory Regions. "
> +                     "Requested ID: %d",
> +                     dev->vdev->n_shmem_regions, vu_mmap->shmid);
> +        return -EFAULT;
> +    }
> +
> +    mr = &dev->vdev->shmem_list[vu_mmap->shmid];
> +
> +    if (!mr) {
> +        error_report("VIRTIO Shared Memory Region at "
> +                     "ID %d unitialized", vu_mmap->shmid);
> +        return -EFAULT;
> +    }
> +
> +    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
> +        (vu_mmap->shm_offset + vu_mmap->len) > mr->size) {
> +        error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
> +                     vu_mmap->shm_offset, vu_mmap->len);
> +        return -EFAULT;
> +    }
> +
> +    void *shmem_ptr = memory_region_get_ram_ptr(mr);
> +
> +    addr = mmap(shmem_ptr + vu_mmap->shm_offset, vu_mmap->len,

Missing check that an existing mapping exactly matches the requested
range [shm_offset, shm_offset + len).
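As a rough sketch of what such a check could look like, assuming the device
records each established mapping as an (offset, len) pair: the code below is
plain C rather than QEMU code, and ShmemMapping and shmem_find_exact() are
hypothetical names introduced only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping record for one established mapping (not a QEMU type). */
typedef struct ShmemMapping {
    uint64_t offset; /* offset within the VIRTIO Shared Memory Region */
    uint64_t len;    /* length of the mapping in bytes */
} ShmemMapping;

/*
 * Return the index of the recorded mapping whose offset and length both
 * match the request exactly, or -1 if there is none. A request that only
 * partially covers a mapping, or that spans several mappings, is rejected.
 */
static int shmem_find_exact(const ShmemMapping *maps, size_t n,
                            uint64_t offset, uint64_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (maps[i].offset == offset && maps[i].len == len) {
            return (int)i;
        }
    }
    return -1;
}
```

The unmap handler could then return an error when shmem_find_exact() yields
-1, and otherwise drop the matching record once the range has been remapped
with PROT_NONE.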

> +                PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
> +
> +    if (addr == MAP_FAILED) {
> +        error_report("Failed to unmap memory");
> +        return -EFAULT;
> +    }
> +
> +    return 0;
> +}
> +
>  static void close_backend_channel(struct vhost_user *u)
>  {
>      g_source_destroy(u->backend_src);
> @@ -1816,6 +1931,13 @@ static gboolean backend_read(QIOChannel *ioc, GIOCondition condition,
>          ret = vhost_user_backend_handle_shared_object_lookup(dev->opaque, ioc,
>                                                               &hdr, &payload);
>          break;
> +    case VHOST_USER_BACKEND_SHMEM_MAP:
> +        ret = vhost_user_backend_handle_shmem_map(dev, &payload.mmap,
> +                                                  fd ? fd[0] : -1);
> +        break;
> +    case VHOST_USER_BACKEND_SHMEM_UNMAP:
> +        ret = vhost_user_backend_handle_shmem_unmap(dev, &payload.mmap);
> +        break;
>      default:
>          error_report("Received unexpected msg type: %d.", hdr.request);
>          ret = -EINVAL;
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 893a072c9d..9f2da5b11e 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -2856,6 +2856,16 @@ int virtio_save(VirtIODevice *vdev, QEMUFile *f)
>      return vmstate_save_state(f, &vmstate_virtio, vdev, NULL);
>  }
>
> +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev)
> +{
> +    MemoryRegion *mr = g_new0(MemoryRegion, 1);
> +    ++vdev->n_shmem_regions;
> +    vdev->shmem_list = g_renew(MemoryRegion, vdev->shmem_list,
> +                               vdev->n_shmem_regions);

Where is shmem_list freed?

The name "list" is misleading since this is an array, not a list.

> +    vdev->shmem_list[vdev->n_shmem_regions - 1] = *mr;
> +=C2=A0 =C2=A0 return mr;
> +}

This looks weird. The contents of mr are copied into shmem_list[] and
then the pointer to mr is returned? Did you mean for the field's type to
be MemoryRegion **shmem_list and then vdev->shmem_list[...] = mr would
stash the pointer?
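To make the pointer-array variant concrete, here is a minimal sketch in
plain C. It is not QEMU code: Region and Device are hypothetical stand-ins
for MemoryRegion and VirtIODevice, calloc()/realloc() are swapped in for
g_new0()/g_renew(), and device_new_region() and device_free_regions() are
illustrative names. The point is that the array stores the same pointer the
caller receives, so later writes through that pointer are visible via the
array, and there is one obvious place to free everything.

```c
#include <stdlib.h>

/* Hypothetical stand-in for MemoryRegion. */
typedef struct Region {
    int size;
} Region;

/* Hypothetical stand-in for the relevant VirtIODevice fields. */
typedef struct Device {
    Region **regions; /* array of pointers, one per shared memory region */
    int n_regions;
} Device;

/* Allocate a region, stash its pointer in the array, and return that pointer. */
static Region *device_new_region(Device *d)
{
    Region *r = calloc(1, sizeof(*r));
    if (!r) {
        return NULL;
    }

    Region **grown = realloc(d->regions,
                             (size_t)(d->n_regions + 1) * sizeof(*grown));
    if (!grown) {
        free(r);
        return NULL;
    }

    d->regions = grown;
    d->regions[d->n_regions++] = r;
    return r;
}

/* Free every region and then the pointer array itself. */
static void device_free_regions(Device *d)
{
    for (int i = 0; i < d->n_regions; i++) {
        free(d->regions[i]);
    }
    free(d->regions);
    d->regions = NULL;
    d->n_regions = 0;
}
```

In the patch as posted, the struct is copied by value into the array and the
original allocation is returned, so the caller would initialize an object
that the array never sees.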

> +
>  /* A wrapper for use as a VMState .put function */
>  static int virtio_device_put(QEMUFile *f, void *opaque, size_t size,
>                               const VMStateField *field, JSONWriter *vmdesc)
> @@ -3264,6 +3274,8 @@ void virtio_init(VirtIODevice *vdev, uint16_t device_id, size_t config_size)
>              virtio_vmstate_change, vdev);
>      vdev->device_endian = virtio_default_endian();
>      vdev->use_guest_notifier_mask = true;
> +    vdev->shmem_list = NULL;
> +    vdev->n_shmem_regions = 0;
>  }
>
>  /*
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 7d5ffdc145..16d598aadc 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -165,6 +165,9 @@ struct VirtIODevice
>       */
>      EventNotifier config_notifier;
>      bool device_iotlb_enabled;
> +    /* Shared memory region for vhost-user mappings. */
> +    MemoryRegion *shmem_list;
> +    int n_shmem_regions;
>  };
>
>  struct VirtioDeviceClass {
> @@ -280,6 +283,8 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue *vq);
>
>  int virtio_save(VirtIODevice *vdev, QEMUFile *f);
>
> +MemoryRegion *virtio_new_shmem_region(VirtIODevice *vdev);
> +
>  extern const VMStateInfo virtio_vmstate_info;
>
>  #define VMSTATE_VIRTIO_DEVICE \
> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> index a879149fef..28556d183a 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -1586,6 +1586,71 @@ vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN])
>      return vu_send_message(dev, &msg);
>  }
>
> +bool
> +vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> +             uint64_t shm_offset, uint64_t len, uint64_t flags)
> +{
> +    bool result = false;
> +    VhostUserMsg msg_reply;
> +    VhostUserMsg vmsg = {
> +        .request = VHOST_USER_BACKEND_SHMEM_MAP,
> +        .size = sizeof(vmsg.payload.mmap),
> +        .flags = VHOST_USER_VERSION,
> +        .payload.mmap = {
> +            .shmid = shmid,
> +            .fd_offset = fd_offset,
> +            .shm_offset = shm_offset,
> +            .len = len,
> +            .flags = flags,
> +        },
> +    };
> +
> +    if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
> +        vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> +    }
> +
> +    pthread_mutex_lock(&dev->backend_mutex);
> +    if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> +        pthread_mutex_unlock(&dev->backend_mutex);
> +        return false;
> +    }
> +
> +    /* Also unlocks the backend_mutex */
> +    return vu_process_message_reply(dev, &vmsg);
> +}
> +
> +bool
> +vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> +               uint64_t shm_offset, uint64_t len)
> +{
> +    bool result = false;
> +    VhostUserMsg msg_reply;
> +    VhostUserMsg vmsg = {
> +        .request = VHOST_USER_BACKEND_SHMEM_UNMAP,
> +        .size = sizeof(vmsg.payload.mmap),
> +        .flags = VHOST_USER_VERSION,
> +        .payload.mmap = {
> +            .shmid = shmid,
> +            .fd_offset = fd_offset,

What is the meaning of this field? I expected it to be set to 0.

> +            .shm_offset = shm_offset,
> +            .len = len,
> +        },
> +    };
> +
> +    if (vu_has_protocol_feature(dev, VHOST_USER_PROTOCOL_F_REPLY_ACK)) {
> +        vmsg.flags |= VHOST_USER_NEED_REPLY_MASK;
> +    }
> +
> +    pthread_mutex_lock(&dev->backend_mutex);
> +    if (!vu_message_write(dev, dev->backend_fd, &vmsg)) {
> +        pthread_mutex_unlock(&dev->backend_mutex);
> +        return false;
> +    }
> +
> +    /* Also unlocks the backend_mutex */
> +    return vu_process_message_reply(dev, &vmsg);
> +}
> +
>  static bool
>  vu_set_vring_call_exec(VuDev *dev, VhostUserMsg *vmsg)
>  {
> diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
> index deb40e77b3..7f6c22cc1a 100644
> --- a/subprojects/libvhost-user/libvhost-user.h
> +++ b/subprojects/libvhost-user/libvhost-user.h
> @@ -127,6 +127,8 @@ typedef enum VhostUserBackendRequest {
>      VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
>      VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
>      VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
> +    VHOST_USER_BACKEND_SHMEM_MAP = 9,
> +    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
>      VHOST_USER_BACKEND_MAX
>  }  VhostUserBackendRequest;
>
> @@ -186,6 +188,24 @@ typedef struct VhostUserShared {
>      unsigned char uuid[UUID_LEN];
>  } VhostUserShared;
>
> +/* For the flags field of VhostUserMMap */
> +#define VHOST_USER_FLAG_MAP_R (1u << 0)
> +#define VHOST_USER_FLAG_MAP_W (1u << 1)
> +
> +typedef struct {
> +    /* VIRTIO Shared Memory Region ID */
> +    uint8_t shmid;
> +    uint8_t padding[7];
> +    /* File offset */
> +    uint64_t fd_offset;
> +    /* Offset within the VIRTIO Shared Memory Region */
> +    uint64_t shm_offset;
> +    /* Size of the mapping */
> +    uint64_t len;
> +    /* Flags for the mmap operation, from VHOST_USER_FLAG_* */
> +    uint64_t flags;
> +} VhostUserMMap;
> +
>  #if defined(_WIN32) && (defined(__x86_64__) || defined(__i386__))
>  # define VU_PACKED __attribute__((gcc_struct, packed))
>  #else
> @@ -214,6 +234,7 @@ typedef struct VhostUserMsg {
>          VhostUserVringArea area;
>          VhostUserInflight inflight;
>          VhostUserShared object;
> +        VhostUserMMap mmap;
>      } payload;
>
>      int fds[VHOST_MEMORY_BASELINE_NREGIONS];
> @@ -597,6 +618,38 @@ bool vu_add_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
>   */
>  bool vu_rm_shared_object(VuDev *dev, unsigned char uuid[UUID_LEN]);
>
> +/**
> + * vu_shmem_map:
> + * @dev: a VuDev context
> + * @shmid: VIRTIO Shared Memory Region ID
> + * @fd_offset: File offset
> + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> + * @len: Size of the mapping
> + * @flags: Flags for the mmap operation
> + *
> + * Advertises a new mapping to be made in a given VIRTIO Shared Memory Region.
> + *
> + * Returns: TRUE on success, FALSE on failure.
> + */
> +bool vu_shmem_map(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> +                  uint64_t shm_offset, uint64_t len, uint64_t flags);
> +
> +/**
> + * vu_shmem_map:
> + * @dev: a VuDev context
> + * @shmid: VIRTIO Shared Memory Region ID
> + * @fd_offset: File offset
> + * @shm_offset: Offset within the VIRTIO Shared Memory Region
> + * @len: Size of the mapping
> + *
> + * The front-end un-mmaps a given range in the VIRTIO Shared Memory Region
> + * with the requested `shmid`.
> + *
> + * Returns: TRUE on success, FALSE on failure.
> + */
> +bool vu_shmem_unmap(VuDev *dev, uint8_t shmid, uint64_t fd_offset,
> +                  uint64_t shm_offset, uint64_t len);
> +
>  /**
>   * vu_queue_set_notification:
>   * @dev: a VuDev context
> --
> 2.45.2
>
--000000000000f17786062135b767--