qemu-devel.nongnu.org archive mirror
* Discrepancy between mmap call on DPDK/libvduse and rust vm-memory crate
@ 2024-04-12 10:15 Eugenio Perez Martin
  2024-04-14  9:01 ` Michael S. Tsirkin
  0 siblings, 1 reply; 5+ messages in thread
From: Eugenio Perez Martin @ 2024-04-12 10:15 UTC (permalink / raw)
  To: Jason Wang, Michael Tsirkin, Maxime Coquelin, qemu-devel,
	German Maglione, Hanna Czenczek

Hi!

I'm building a bridge to expose vhost-user devices through VDUSE. The
code is still immature, but I'm able to forward packets using
dpdk-l2fwd through VDUSE to a VM. I'm now working on exposing
virtiofsd, but I've hit an error I'd like to discuss.

VDUSE devices can get all the memory regions the driver is using
through the VDUSE_IOTLB_GET_FD ioctl. It returns a file descriptor for
the associated memory region, which can be mapped with mmap, and an
entry describing the mapping:
* Start and last addresses from the driver's point of view
* Offset of that range within the file to be mmapped
* Device permissions over that region

[start=0xc3000][last=0xe7fff][offset=0xc3000][perm=1]
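
To make the entry above concrete, here is a minimal C sketch. The struct
iotlb_entry below is a hypothetical stand-in that only mirrors the
fields listed above (it is not the kernel's own definition), and
iotlb_entry_len() is an illustrative helper. Since "last" is inclusive,
the length of the range is last - start + 1:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the entry returned alongside the fd.
 * Field names follow the dump above; the layout is an assumption.
 */
struct iotlb_entry {
    uint64_t offset; /* offset of the range inside the fd */
    uint64_t start;  /* first address, driver point of view */
    uint64_t last;   /* last address, inclusive */
    uint8_t perm;    /* device permissions over the range */
};

/* Length in bytes of the range; "last" is inclusive, hence the + 1. */
static uint64_t iotlb_entry_len(const struct iotlb_entry *e)
{
    return e->last - e->start + 1;
}
```

For the entry shown above this gives 0x25000 bytes.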

Now, when I try to map it, the userspace device's mmap call fails with
any offset other than 0. So the "straightforward" mmap with
size = entry.last - entry.start + 1 and offset = entry.offset does
not work. I don't know if this is a limitation of Linux or of VDUSE.

Checking QEMU's
subprojects/libvduse/libvduse.c:vduse_iova_add_region() I see it
handles the offset by adding it up to the size, instead of using it
directly as a parameter in the mmap:

void *mmap_addr = mmap(0, size + offset, prot, MAP_SHARED, fd, 0);

I can replicate it on the bridge for sure.
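
The two strategies can be put side by side. This is only a sketch under
assumptions: struct region_map, map_direct() and map_from_zero() are
hypothetical names, not code from libvduse or the bridge. The first
passes the offset to mmap(2); the second maps from offset 0 with
size + offset and skips the offset in the returned pointer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* base/map_len are what munmap() needs; data is where the region starts. */
struct region_map {
    void *base;
    size_t map_len;
    uint8_t *data;
};

/* Straightforward variant: the file offset is given to mmap() itself. */
static int map_direct(int fd, size_t size, off_t offset, int prot,
                      struct region_map *out)
{
    /* mmap() needs at least a page-aligned offset. */
    if (offset % sysconf(_SC_PAGESIZE) != 0)
        return -1;

    void *p = mmap(NULL, size, prot, MAP_SHARED, fd, offset);
    if (p == MAP_FAILED) {
        perror("mmap with offset");
        return -1;
    }
    out->base = p;
    out->map_len = size;
    out->data = p;
    return 0;
}

/* "size + offset" variant: always map from 0, then skip the offset. */
static int map_from_zero(int fd, size_t size, off_t offset, int prot,
                         struct region_map *out)
{
    size_t len = size + (size_t)offset;

    void *p = mmap(NULL, len, prot, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap from zero");
        return -1;
    }
    out->base = p;
    out->map_len = len;
    out->data = (uint8_t *)p + offset;
    return 0;
}
```

Both variants expose the same bytes through data; the second one just
consumes more virtual address space and never passes a non-zero offset
to the kernel.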

Now I send the VhostUserMemoryRegion to the vhost-user application.
The struct has these members:
struct VhostUserMemoryRegion {
    uint64_t guest_phys_addr;
    uint64_t memory_size;
    uint64_t userspace_addr;
    uint64_t mmap_offset;
};

So I can send the offset to the vhost-user device. I can check that
dpdk-l2fwd uses the same trick of adding offset to the size of the
mapping region [1], at
lib/vhost/vhost_user.c:vhost_user_mmap_region():

mmap_size = region->size + mmap_offset;
mmap_addr = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
            MAP_SHARED | populate, region->fd, 0);

So mmap is called with offset == 0 and everybody is happy.
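
With this scheme the offset is applied in userspace when translating
addresses rather than in the mmap call. A minimal sketch of that
translation, assuming a hypothetical struct mapped_region and helper
gpa_to_host() (neither is taken from DPDK):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one region mapped with file offset 0. */
struct mapped_region {
    uint64_t guest_phys_addr; /* first guest address of the region */
    uint64_t memory_size;     /* region length in bytes */
    uint8_t *mmap_addr;       /* pointer returned by mmap(..., fd, 0) */
    uint64_t mmap_offset;     /* offset of the region inside the fd */
};

/* Translate a guest physical address; NULL if it is outside the region. */
static void *gpa_to_host(const struct mapped_region *r, uint64_t gpa)
{
    if (gpa < r->guest_phys_addr ||
        gpa - r->guest_phys_addr >= r->memory_size)
        return NULL;

    /* Skip mmap_offset because the mapping starts at file offset 0. */
    return r->mmap_addr + r->mmap_offset + (gpa - r->guest_phys_addr);
}
```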

Now I'm moving to virtiofsd, and the vm-memory crate in particular. It
performs the mmap without the size += offset trick, at
MmapRegionBuilder<B>::build() [2].

I can try to apply the offset + size trick in my bridge, but I don't
think it is the right solution. At first glance, the right solution is
to mmap with the offset, as the vm-memory crate does. But the fact that
libvduse and DPDK both apply the same trick suggests a known limitation
or workaround that I don't know about. What is the history of this? Can
the VDUSE problem (if any) be solved? Am I missing something?

Thanks!

[1] https://github.com/DPDK/dpdk/blob/e2e546ab5bf5e024986ccb5310ab43982f3bb40c/lib/vhost/vhost_user.c#L1305
[2] https://github.com/rust-vmm/vm-memory/blob/main/src/mmap_unix.rs#L128




* Re: Discrepancy between mmap call on DPDK/libvduse and rust vm-memory crate
  2024-04-12 10:15 Discrepancy between mmap call on DPDK/libvduse and rust vm-memory crate Eugenio Perez Martin
@ 2024-04-14  9:01 ` Michael S. Tsirkin
  2024-04-15  7:28   ` Yongji Xie
  2024-04-15 10:51   ` Eugenio Perez Martin
  0 siblings, 2 replies; 5+ messages in thread
From: Michael S. Tsirkin @ 2024-04-14  9:01 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Jason Wang, Maxime Coquelin, qemu-devel, German Maglione,
	Hanna Czenczek, Xie Yongji

On Fri, Apr 12, 2024 at 12:15:40PM +0200, Eugenio Perez Martin wrote:
> Hi!
> 
> I'm building a bridge to expose vhost-user devices through VDUSE. The
> code is still immature but I'm able to forward packets using
> dpdk-l2fwd through VDUSE to VM. I'm now developing exposing virtiofsd,
> but I've hit an error I'd like to discuss.
> 
> VDUSE devices can get all the memory regions the driver is using by
> VDUSE_IOTLB_GET_FD ioctl. It returns a file descriptor with a memory
> region associated that can be mapped with mmap, and an information
> entry about the map it contains:
> * Start and end addresses from the driver POV
> * Offset within the mmaped region of these start and end
> * Device permissions over that region.
> 
> [start=0xc3000][last=0xe7fff][offset=0xc3000][perm=1]
> 
> Now when I try to map it, it is impossible for the userspace device to
> call mmap with any offset different than 0.

How exactly did you allocate memory? hugetlbfs?

> So the "straightforward"
> mmap with size = entry.last-entry.start and offset = entry.offset does
> not work. I don't know if this is a limitation of Linux or VDUSE.
> 
> Checking QEMU's
> subprojects/libvduse/libvduse.c:vduse_iova_add_region() I see it
> handles the offset by adding it up to the size, instead of using it
> directly as a parameter in the mmap:
> 
> void *mmap_addr = mmap(0, size + offset, prot, MAP_SHARED, fd, 0);


CC Xie Yongji who wrote this code, too.


> I can replicate it on the bridge for sure.
> 
> Now I send the VhostUserMemoryRegion to the vhost-user application.
> The struct has these members:
> struct VhostUserMemoryRegion {
>     uint64_t guest_phys_addr;
>     uint64_t memory_size;
>     uint64_t userspace_addr;
>     uint64_t mmap_offset;
> };
> 
> So I can send the offset to the vhost-user device. I can check that
> dpdk-l2fwd uses the same trick of adding offset to the size of the
> mapping region [1], at
> lib/vhost/vhost_user.c:vhost_user_mmap_region():
> 
> mmap_size = region->size + mmap_offset;
> mmap_addr = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
>             MAP_SHARED | populate, region->fd, 0);
> 
> So mmap is called with offset == 0 and everybody is happy.
> 
> Now I'm moving to virtiofsd, and vm-memory crate in particular. And it
> performs the mmap without the size += offset trick, at
> MmapRegionBuilder<B>:build() [2].
> 
> I can try to apply the offset + size trick in my bridge but I don't
> think it is the right solution. At first glance, the right solution is
> to mmap with the offset as vm-memory crate do. But having libvduse and
> DPDK apply the same trick sounds to me like it is a known limitation /
> workaround I don't know about. What is the history of this? Can VDUSE
> problem (if any) be solved? Am I missing something?
> 
> Thanks!
> 
> [1] https://github.com/DPDK/dpdk/blob/e2e546ab5bf5e024986ccb5310ab43982f3bb40c/lib/vhost/vhost_user.c#L1305
> [2] https://github.com/rust-vmm/vm-memory/blob/main/src/mmap_unix.rs#L128




* Re: Discrepancy between mmap call on DPDK/libvduse and rust vm-memory crate
  2024-04-14  9:01 ` Michael S. Tsirkin
@ 2024-04-15  7:28   ` Yongji Xie
  2024-04-15  8:45     ` Jason Wang
  2024-04-15 10:51   ` Eugenio Perez Martin
  1 sibling, 1 reply; 5+ messages in thread
From: Yongji Xie @ 2024-04-15  7:28 UTC (permalink / raw)
  To: Michael S. Tsirkin, Eugenio Perez Martin
  Cc: Jason Wang, Maxime Coquelin, qemu-devel, German Maglione,
	Hanna Czenczek

On Sun, Apr 14, 2024 at 5:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Apr 12, 2024 at 12:15:40PM +0200, Eugenio Perez Martin wrote:
> > Hi!
> >
> > I'm building a bridge to expose vhost-user devices through VDUSE. The
> > code is still immature but I'm able to forward packets using
> > dpdk-l2fwd through VDUSE to VM. I'm now developing exposing virtiofsd,
> > but I've hit an error I'd like to discuss.
> >
> > VDUSE devices can get all the memory regions the driver is using by
> > VDUSE_IOTLB_GET_FD ioctl. It returns a file descriptor with a memory
> > region associated that can be mapped with mmap, and an information
> > entry about the map it contains:
> > * Start and end addresses from the driver POV
> > * Offset within the mmaped region of these start and end
> > * Device permissions over that region.
> >
> > [start=0xc3000][last=0xe7fff][offset=0xc3000][perm=1]
> >
> > Now when I try to map it, it is impossible for the userspace device to
> > call mmap with any offset different than 0.
>
> How exactly did you allocate memory? hugetlbfs?
>
> > So the "straightforward"
> > mmap with size = entry.last-entry.start and offset = entry.offset does
> > not work. I don't know if this is a limitation of Linux or VDUSE.
> >
> > Checking QEMU's
> > subprojects/libvduse/libvduse.c:vduse_iova_add_region() I see it
> > handles the offset by adding it up to the size, instead of using it
> > directly as a parameter in the mmap:
> >
> > void *mmap_addr = mmap(0, size + offset, prot, MAP_SHARED, fd, 0);
>
>
> CC Xie Yongji who wrote this code, too.
>

The mmap() with hugetlb would fail if the offset into the file is not
aligned to the huge page size. So libvhost-user did something like
this. But I think VDUSE doesn't have this problem. So it's fine to
directly use the offset as a parameter in the mmap(2) here.
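
A minimal sketch of that decision, assuming the caller already knows
the alignment the backing file requires (the page size for ordinary
files, the huge page size for hugetlbfs). struct mmap_plan and
plan_mmap() are hypothetical names, and any alignment requirement on
the length is ignored here:

```c
#include <assert.h>
#include <stdint.h>

/* Arguments to use for mmap() plus the fix-up to apply afterwards. */
struct mmap_plan {
    uint64_t file_offset; /* offset argument for mmap() */
    uint64_t map_len;     /* length argument for mmap() */
    uint64_t data_delta;  /* bytes to skip from the returned pointer */
};

/*
 * Use the offset directly when it satisfies the required alignment;
 * otherwise fall back to mapping from 0 with size + offset.
 */
static struct mmap_plan plan_mmap(uint64_t size, uint64_t offset,
                                  uint64_t align)
{
    struct mmap_plan p;

    if (align != 0 && offset % align == 0) {
        p.file_offset = offset;
        p.map_len = size;
        p.data_delta = 0;
    } else {
        p.file_offset = 0;
        p.map_len = size + offset;
        p.data_delta = offset;
    }
    return p;
}
```

With the entry from the original mail (size 0x25000, offset 0xc3000), a
4 KiB alignment allows the direct mmap, while a 2 MiB huge page
alignment forces the fallback.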

Thanks,
Yongji



* Re: Discrepancy between mmap call on DPDK/libvduse and rust vm-memory crate
  2024-04-15  7:28   ` Yongji Xie
@ 2024-04-15  8:45     ` Jason Wang
  0 siblings, 0 replies; 5+ messages in thread
From: Jason Wang @ 2024-04-15  8:45 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Michael S. Tsirkin, Eugenio Perez Martin, Maxime Coquelin,
	qemu-devel, German Maglione, Hanna Czenczek

On Mon, Apr 15, 2024 at 3:28 PM Yongji Xie <xieyongji@bytedance.com> wrote:
>
> On Sun, Apr 14, 2024 at 5:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Fri, Apr 12, 2024 at 12:15:40PM +0200, Eugenio Perez Martin wrote:
> > > Hi!
> > >
> > > I'm building a bridge to expose vhost-user devices through VDUSE. The
> > > code is still immature but I'm able to forward packets using
> > > dpdk-l2fwd through VDUSE to VM. I'm now developing exposing virtiofsd,
> > > but I've hit an error I'd like to discuss.
> > >
> > > VDUSE devices can get all the memory regions the driver is using by
> > > VDUSE_IOTLB_GET_FD ioctl. It returns a file descriptor with a memory
> > > region associated that can be mapped with mmap, and an information
> > > entry about the map it contains:
> > > * Start and end addresses from the driver POV
> > > * Offset within the mmaped region of these start and end
> > > * Device permissions over that region.
> > >
> > > [start=0xc3000][last=0xe7fff][offset=0xc3000][perm=1]
> > >
> > > Now when I try to map it, it is impossible for the userspace device to
> > > call mmap with any offset different than 0.
> >
> > How exactly did you allocate memory? hugetlbfs?
> >
> > > So the "straightforward"
> > > mmap with size = entry.last-entry.start and offset = entry.offset does
> > > not work. I don't know if this is a limitation of Linux or VDUSE.
> > >
> > > Checking QEMU's
> > > subprojects/libvduse/libvduse.c:vduse_iova_add_region() I see it
> > > handles the offset by adding it up to the size, instead of using it
> > > directly as a parameter in the mmap:
> > >
> > > void *mmap_addr = mmap(0, size + offset, prot, MAP_SHARED, fd, 0);
> >
> >
> > CC Xie Yongji who wrote this code, too.
> >
>
> The mmap() with hugetlb would fail if the offset into the file is not
> aligned to the huge page size. So libvhost-user did something like
> this. But I think VDUSE doesn't have this problem.

I think what you meant is that VDUSE IOTLB doesn't have this problem.

Btw, I think we need to understand the setup. E.g., is this used for
containers (bounce pages) or for VMs (hugetlb or other)?

Thanks

> So it's fine to
> directly use the offset as a parameter in the mmap(2) here.
>
> Thanks,
> Yongji
>




* Re: Discrepancy between mmap call on DPDK/libvduse and rust vm-memory crate
  2024-04-14  9:01 ` Michael S. Tsirkin
  2024-04-15  7:28   ` Yongji Xie
@ 2024-04-15 10:51   ` Eugenio Perez Martin
  1 sibling, 0 replies; 5+ messages in thread
From: Eugenio Perez Martin @ 2024-04-15 10:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Maxime Coquelin, qemu-devel, German Maglione,
	Hanna Czenczek, Xie Yongji

On Sun, Apr 14, 2024 at 11:02 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Apr 12, 2024 at 12:15:40PM +0200, Eugenio Perez Martin wrote:
> > Hi!
> >
> > I'm building a bridge to expose vhost-user devices through VDUSE. The
> > code is still immature but I'm able to forward packets using
> > dpdk-l2fwd through VDUSE to VM. I'm now developing exposing virtiofsd,
> > but I've hit an error I'd like to discuss.
> >
> > VDUSE devices can get all the memory regions the driver is using by
> > VDUSE_IOTLB_GET_FD ioctl. It returns a file descriptor with a memory
> > region associated that can be mapped with mmap, and an information
> > entry about the map it contains:
> > * Start and end addresses from the driver POV
> > * Offset within the mmaped region of these start and end
> > * Device permissions over that region.
> >
> > [start=0xc3000][last=0xe7fff][offset=0xc3000][perm=1]
> >
> > Now when I try to map it, it is impossible for the userspace device to
> > call mmap with any offset different than 0.
>
> How exactly did you allocate memory? hugetlbfs?
>

Yes, that was definitely the cause, thank you very much!

> > So the "straightforward"
> > mmap with size = entry.last-entry.start and offset = entry.offset does
> > not work. I don't know if this is a limitation of Linux or VDUSE.
> >
> > Checking QEMU's
> > subprojects/libvduse/libvduse.c:vduse_iova_add_region() I see it
> > handles the offset by adding it up to the size, instead of using it
> > directly as a parameter in the mmap:
> >
> > void *mmap_addr = mmap(0, size + offset, prot, MAP_SHARED, fd, 0);
>
>
> CC Xie Yongji who wrote this code, too.
>

Thanks!

>
> > I can replicate it on the bridge for sure.
> >
> > Now I send the VhostUserMemoryRegion to the vhost-user application.
> > The struct has these members:
> > struct VhostUserMemoryRegion {
> >     uint64_t guest_phys_addr;
> >     uint64_t memory_size;
> >     uint64_t userspace_addr;
> >     uint64_t mmap_offset;
> > };
> >
> > So I can send the offset to the vhost-user device. I can check that
> > dpdk-l2fwd uses the same trick of adding offset to the size of the
> > mapping region [1], at
> > lib/vhost/vhost_user.c:vhost_user_mmap_region():
> >
> > mmap_size = region->size + mmap_offset;
> > mmap_addr = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
> >             MAP_SHARED | populate, region->fd, 0);
> >
> > So mmap is called with offset == 0 and everybody is happy.
> >
> > Now I'm moving to virtiofsd, and vm-memory crate in particular. And it
> > performs the mmap without the size += offset trick, at
> > MmapRegionBuilder<B>:build() [2].
> >
> > I can try to apply the offset + size trick in my bridge but I don't
> > think it is the right solution. At first glance, the right solution is
> > to mmap with the offset as vm-memory crate do. But having libvduse and
> > DPDK apply the same trick sounds to me like it is a known limitation /
> > workaround I don't know about. What is the history of this? Can VDUSE
> > problem (if any) be solved? Am I missing something?
> >
> > Thanks!
> >
> > [1] https://github.com/DPDK/dpdk/blob/e2e546ab5bf5e024986ccb5310ab43982f3bb40c/lib/vhost/vhost_user.c#L1305
> > [2] https://github.com/rust-vmm/vm-memory/blob/main/src/mmap_unix.rs#L128
>



