* Re: [PATCH 6/6] vduse: Update api version to 1 [not found] ` <20220629082541.118-7-xieyongji@bytedance.com> @ 2022-06-29 8:33 ` Michael S. Tsirkin [not found] ` <CACycT3sZXwunA_UOCriSv=f2VARMnPb1mNU2GAUd9BLCU-Hg8w@mail.gmail.com> 0 siblings, 1 reply; 6+ messages in thread From: Michael S. Tsirkin @ 2022-06-29 8:33 UTC (permalink / raw) To: Xie Yongji Cc: xiaodong.liu, linux-kernel, virtualization, maxime.coquelin, stefanha On Wed, Jun 29, 2022 at 04:25:41PM +0800, Xie Yongji wrote: > Let's update api version to 1 since we introduced > some new ioctls to support registering userspace > memory for IOTLB. > > Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Adding new ioctls does not justify things like this. Besides, adding UAPI then changing it is not nice since it makes git bisect behave incorrectly. > --- > drivers/vdpa/vdpa_user/vduse_dev.c | 12 ++++++++++++ > include/uapi/linux/vduse.h | 8 +++++++- > 2 files changed, 19 insertions(+), 1 deletion(-) > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c > index 7b2ea7612da9..2795785ca6a2 100644 > --- a/drivers/vdpa/vdpa_user/vduse_dev.c > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c > @@ -1206,6 +1206,10 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd, > case VDUSE_IOTLB_GET_INFO: { > struct vduse_iotlb_info iotlb; > > + ret = -EPERM; Almost for sure a wrong error code. > + if (dev->api_version < 1) > + break; > + > iotlb.bounce_iova = 0; > iotlb.bounce_size = dev->domain->bounce_size; > Wait a second. so you are intentionally breaking any userspace that called VDUSE_SET_API_VERSION with version 0? Please don't. > @@ -1219,6 +1223,10 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd, > case VDUSE_IOTLB_REG_UMEM: { > struct vduse_iotlb_umem umem; > > + ret = -EPERM; > + if (dev->api_version < 1) > + break; > + > ret = -EFAULT; > if (copy_from_user(&umem, argp, sizeof(umem))) > break; > @@ -1230,6 +1238,10 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd, > case VDUSE_IOTLB_DEREG_UMEM: { > struct vduse_iotlb_umem umem; > > + ret = -EPERM; > + if (dev->api_version < 1) > + break; > + > ret = -EFAULT; > if (copy_from_user(&umem, argp, sizeof(umem))) > break; > diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h > index 1b17391e228f..902ea19cd9e0 100644 > --- a/include/uapi/linux/vduse.h > +++ b/include/uapi/linux/vduse.h > @@ -8,7 +8,13 @@ > > /* The ioctls for control device (/dev/vduse/control) */ > > -#define VDUSE_API_VERSION 0 > +/* > + * v0 -> v1: > + * - Introduce VDUSE_IOTLB_GET_INFO ioctl > + * - Introduce VDUSE_VDUSE_IOTLB_REG_UMEM ioctl > + * - Introduce VDUSE_IOTLB_DEREG_UMEM ioctl > + */ > +#define VDUSE_API_VERSION 1 > > /* > * Get the version of VDUSE API that kernel supported (VDUSE_API_VERSION). > -- > 2.20.1 _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 6+ messages in thread
[parent not found: <CACycT3sZXwunA_UOCriSv=f2VARMnPb1mNU2GAUd9BLCU-Hg8w@mail.gmail.com>]
* Re: [PATCH 6/6] vduse: Update api version to 1 [not found] ` <CACycT3sZXwunA_UOCriSv=f2VARMnPb1mNU2GAUd9BLCU-Hg8w@mail.gmail.com> @ 2022-06-29 9:22 ` Michael S. Tsirkin 0 siblings, 0 replies; 6+ messages in thread From: Michael S. Tsirkin @ 2022-06-29 9:22 UTC (permalink / raw) To: Yongji Xie Cc: Liu Xiaodong, linux-kernel, virtualization, Maxime Coquelin, Stefan Hajnoczi On Wed, Jun 29, 2022 at 05:02:40PM +0800, Yongji Xie wrote: > On Wed, Jun 29, 2022 at 4:33 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Wed, Jun 29, 2022 at 04:25:41PM +0800, Xie Yongji wrote: > > > Let's update api version to 1 since we introduced > > > some new ioctls to support registering userspace > > > memory for IOTLB. > > > > > > Signed-off-by: Xie Yongji <xieyongji@bytedance.com> > > > > > > Adding new ioctls does not justify things like this. > > > > What I want to do here is make userspace know whether this feature is > supported or not in the kernel. So do you think we need to add > something like CHECK_EXTENSION ioctl here? Why bother? unsupported ioctls just return an error code. > > Besides, adding UAPI then changing it is not nice > > since it makes git bisect behave incorrectly. > > > > > --- > > > drivers/vdpa/vdpa_user/vduse_dev.c | 12 ++++++++++++ > > > include/uapi/linux/vduse.h | 8 +++++++- > > > 2 files changed, 19 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c > > > index 7b2ea7612da9..2795785ca6a2 100644 > > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c > > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c > > > @@ -1206,6 +1206,10 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd, > > > case VDUSE_IOTLB_GET_INFO: { > > > struct vduse_iotlb_info iotlb; > > > > > > + ret = -EPERM; > > > > > > Almost for sure a wrong error code. > > > > > + if (dev->api_version < 1) > > > + break; > > > + > > > iotlb.bounce_iova = 0; > > > iotlb.bounce_size = dev->domain->bounce_size; > > > > > > > > > Wait a second. so you are intentionally breaking any userspace > > that called VDUSE_SET_API_VERSION with version 0? > > > > Please don't. > > > > Yes, I'd like to let userspace know we don't support this feature. > > Thanks. > Yongji -- MST _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 6+ messages in thread
[parent not found: <20220629082541.118-6-xieyongji@bytedance.com>]
* Re: [PATCH 5/6] vduse: Support registering userspace memory for IOTLB [not found] ` <20220629082541.118-6-xieyongji@bytedance.com> @ 2022-06-29 8:42 ` Michael S. Tsirkin [not found] ` <CACycT3sAcH-b40hORjSOQb67jZ0Fd-fxdzmZNwt=4iZdX6gLeA@mail.gmail.com> 0 siblings, 1 reply; 6+ messages in thread From: Michael S. Tsirkin @ 2022-06-29 8:42 UTC (permalink / raw) To: Xie Yongji Cc: xiaodong.liu, linux-kernel, virtualization, maxime.coquelin, stefanha On Wed, Jun 29, 2022 at 04:25:40PM +0800, Xie Yongji wrote: > Introduce two ioctls: VDUSE_IOTLB_REG_UMEM and > VDUSE_IOTLB_DEREG_UMEM to support registering > and de-registering userspace memory for IOTLB > in virtio-vdpa case. > > Now it only supports registering userspace memory > for IOTLB as bounce buffer. > > Signed-off-by: Xie Yongji <xieyongji@bytedance.com> > --- > drivers/vdpa/vdpa_user/vduse_dev.c | 138 +++++++++++++++++++++++++++++ > include/uapi/linux/vduse.h | 28 ++++++ > 2 files changed, 166 insertions(+) > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c > index c47a5d9765cf..7b2ea7612da9 100644 > --- a/drivers/vdpa/vdpa_user/vduse_dev.c > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c > @@ -21,6 +21,7 @@ > #include <linux/uio.h> > #include <linux/vdpa.h> > #include <linux/nospec.h> > +#include <linux/sched/mm.h> > #include <uapi/linux/vduse.h> > #include <uapi/linux/vdpa.h> > #include <uapi/linux/virtio_config.h> > @@ -64,6 +65,13 @@ struct vduse_vdpa { > struct vduse_dev *dev; > }; > > +struct vduse_iotlb_mem { > + unsigned long iova; > + unsigned long npages; > + struct page **pages; > + struct mm_struct *mm; > +}; > + > struct vduse_dev { > struct vduse_vdpa *vdev; > struct device *dev; > @@ -95,6 +103,8 @@ struct vduse_dev { > u8 status; > u32 vq_num; > u32 vq_align; > + struct vduse_iotlb_mem *iotlb_mem; > + struct mutex mem_lock; > }; > > struct vduse_dev_msg { > @@ -917,6 +927,100 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev, > return ret; > } > > +static int vduse_dev_dereg_iotlb_mem(struct vduse_dev *dev, > + u64 iova, u64 size) > +{ > + int ret; > + > + mutex_lock(&dev->mem_lock); > + ret = -ENOENT; > + if (!dev->iotlb_mem) > + goto unlock; > + > + ret = -EINVAL; > + if (dev->iotlb_mem->iova != iova || size != dev->domain->bounce_size) > + goto unlock; > + > + vduse_domain_remove_user_bounce_pages(dev->domain); > + unpin_user_pages(dev->iotlb_mem->pages, dev->iotlb_mem->npages); I notice you don't mark the pages dirty. This is going to be a problem. > + atomic64_sub(dev->iotlb_mem->npages, &dev->iotlb_mem->mm->pinned_vm); > + mmdrop(dev->iotlb_mem->mm); > + vfree(dev->iotlb_mem->pages); > + kfree(dev->iotlb_mem); > + dev->iotlb_mem = NULL; > + ret = 0; > +unlock: > + mutex_unlock(&dev->mem_lock); > + return ret; > +} > + > +static int vduse_dev_reg_iotlb_mem(struct vduse_dev *dev, > + u64 iova, u64 uaddr, u64 size) > +{ > + struct page **page_list = NULL; > + struct vduse_iotlb_mem *mem = NULL; > + long pinned = 0; > + unsigned long npages, lock_limit; > + int ret; > + > + if (size != dev->domain->bounce_size || > + iova != 0 || uaddr & ~PAGE_MASK) > + return -EINVAL; > + > + mutex_lock(&dev->mem_lock); > + ret = -EEXIST; > + if (dev->iotlb_mem) > + goto unlock; > + > + ret = -ENOMEM; > + npages = size >> PAGE_SHIFT; > + page_list = vmalloc(array_size(npages, > + sizeof(struct page *))); Is this basically trying to do a vmalloc with userspace-controlled size? That's an easy DOS vector. > + mem = kzalloc(sizeof(*mem), GFP_KERNEL); > + if (!page_list || !mem) > + goto unlock; > + > + mmap_read_lock(current->mm); > + > + lock_limit = PFN_DOWN(rlimit(RLIMIT_MEMLOCK)); > + if (npages + atomic64_read(¤t->mm->pinned_vm) > lock_limit) > + goto out; > + > + pinned = pin_user_pages(uaddr, npages, FOLL_LONGTERM | FOLL_WRITE, > + page_list, NULL); > + if (pinned != npages) { > + ret = pinned < 0 ? pinned : -ENOMEM; > + goto out; > + } This is a popular approach but it's problematic if multiple devices try to pin the same page. Can this happen here? > + > + ret = vduse_domain_add_user_bounce_pages(dev->domain, > + page_list, pinned); > + if (ret) > + goto out; > + > + atomic64_add(npages, ¤t->mm->pinned_vm); > + > + mem->pages = page_list; > + mem->npages = pinned; > + mem->iova = iova; > + mem->mm = current->mm; > + mmgrab(current->mm); > + > + dev->iotlb_mem = mem; > +out: > + if (ret && pinned > 0) > + unpin_user_pages(page_list, pinned); > + > + mmap_read_unlock(current->mm); > +unlock: > + if (ret) { > + vfree(page_list); > + kfree(mem); > + } > + mutex_unlock(&dev->mem_lock); > + return ret; > +} > + > static long vduse_dev_ioctl(struct file *file, unsigned int cmd, > unsigned long arg) > { > @@ -943,6 +1047,16 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd, > if (entry.start > entry.last) > break; > > + if (domain->bounce_map && dev->iotlb_mem) { > + ret = -EEXIST; > + if (entry.start >= 0 && > + entry.last < domain->bounce_size) > + break; > + > + if (entry.start < domain->bounce_size) > + entry.start = domain->bounce_size; > + } > + > spin_lock(&domain->iotlb_lock); > map = vhost_iotlb_itree_first(domain->iotlb, > entry.start, entry.last); > @@ -1102,6 +1216,28 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd, > ret = 0; > break; > } > + case VDUSE_IOTLB_REG_UMEM: { > + struct vduse_iotlb_umem umem; > + > + ret = -EFAULT; > + if (copy_from_user(&umem, argp, sizeof(umem))) > + break; > + > + ret = vduse_dev_reg_iotlb_mem(dev, umem.iova, > + umem.uaddr, umem.size); > + break; > + } > + case VDUSE_IOTLB_DEREG_UMEM: { > + struct vduse_iotlb_umem umem; > + > + ret = -EFAULT; > + if (copy_from_user(&umem, argp, sizeof(umem))) > + break; > + > + ret = vduse_dev_dereg_iotlb_mem(dev, umem.iova, > + umem.size); > + break; > + } > default: > ret = -ENOIOCTLCMD; > break; > @@ -1114,6 +1250,7 @@ static int vduse_dev_release(struct inode *inode, struct file *file) > { > struct vduse_dev *dev = file->private_data; > > + vduse_dev_dereg_iotlb_mem(dev, 0, dev->domain->bounce_size); > spin_lock(&dev->msg_lock); > /* Make sure the inflight messages can processed after reconncection */ > list_splice_init(&dev->recv_list, &dev->send_list); > @@ -1176,6 +1313,7 @@ static struct vduse_dev *vduse_dev_create(void) > return NULL; > > mutex_init(&dev->lock); > + mutex_init(&dev->mem_lock); > spin_lock_init(&dev->msg_lock); > INIT_LIST_HEAD(&dev->send_list); > INIT_LIST_HEAD(&dev->recv_list); > diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h > index c201b7a77c2c..1b17391e228f 100644 > --- a/include/uapi/linux/vduse.h > +++ b/include/uapi/linux/vduse.h > @@ -227,6 +227,34 @@ struct vduse_iotlb_info { > /* Get IOTLB information, e.g. bounce buffer size */ > #define VDUSE_IOTLB_GET_INFO _IOR(VDUSE_BASE, 0x18, struct vduse_iotlb_info) > > +/** > + * struct vduse_iotlb_umem - userspace memory configuration > + * @uaddr: start address of userspace memory, it must be aligned to page size > + * @iova: IOVA of userspace memory, it must be equal to bounce iova returned > + * by VDUSE_IOTLB_GET_INFO now > + * @size: size of userspace memory, it must be equal to bounce size returned > + * by VDUSE_IOTLB_GET_INFO now > + * @reserved: for future use, needs to be initialized to zero You should check that it's 0 in that case, otherwise userspace will conveniently forget. > + * > + * Structure used by VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM > + * ioctls to register/de-register userspace memory for IOTLB. > + */ > +struct vduse_iotlb_umem { > + __u64 uaddr; > + __u64 iova; > + __u64 size; > + __u64 reserved[3]; > +}; > + > +/* > + * Register userspace memory for IOTLB. Now we only support registering > + * userspace memory as bounce buffer. > + */ > +#define VDUSE_IOTLB_REG_UMEM _IOW(VDUSE_BASE, 0x19, struct vduse_iotlb_umem) > + > +/* De-register the userspace memory. Caller should set iova and size field. */ > +#define VDUSE_IOTLB_DEREG_UMEM _IOW(VDUSE_BASE, 0x1a, struct vduse_iotlb_umem) > + > /* The control messages definition for read(2)/write(2) on /dev/vduse/$NAME */ > > /** > -- > 2.20.1 _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 6+ messages in thread
[parent not found: <CACycT3sAcH-b40hORjSOQb67jZ0Fd-fxdzmZNwt=4iZdX6gLeA@mail.gmail.com>]
* Re: [PATCH 5/6] vduse: Support registering userspace memory for IOTLB [not found] ` <CACycT3sAcH-b40hORjSOQb67jZ0Fd-fxdzmZNwt=4iZdX6gLeA@mail.gmail.com> @ 2022-06-29 9:54 ` Michael S. Tsirkin [not found] ` <CACycT3vaNLYRid5SsT11LuVCaGXbBfV=q7c7SUp1+r9BcRpwkw@mail.gmail.com> 0 siblings, 1 reply; 6+ messages in thread From: Michael S. Tsirkin @ 2022-06-29 9:54 UTC (permalink / raw) To: Yongji Xie Cc: Liu Xiaodong, linux-kernel, virtualization, Maxime Coquelin, Stefan Hajnoczi On Wed, Jun 29, 2022 at 05:26:04PM +0800, Yongji Xie wrote: > On Wed, Jun 29, 2022 at 4:43 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Wed, Jun 29, 2022 at 04:25:40PM +0800, Xie Yongji wrote: > > > Introduce two ioctls: VDUSE_IOTLB_REG_UMEM and > > > VDUSE_IOTLB_DEREG_UMEM to support registering > > > and de-registering userspace memory for IOTLB > > > in virtio-vdpa case. > > > > > > Now it only supports registering userspace memory > > > for IOTLB as bounce buffer. > > > > > > Signed-off-by: Xie Yongji <xieyongji@bytedance.com> > > > --- > > > drivers/vdpa/vdpa_user/vduse_dev.c | 138 +++++++++++++++++++++++++++++ > > > include/uapi/linux/vduse.h | 28 ++++++ > > > 2 files changed, 166 insertions(+) > > > > > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c > > > index c47a5d9765cf..7b2ea7612da9 100644 > > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c > > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c > > > @@ -21,6 +21,7 @@ > > > #include <linux/uio.h> > > > #include <linux/vdpa.h> > > > #include <linux/nospec.h> > > > +#include <linux/sched/mm.h> > > > #include <uapi/linux/vduse.h> > > > #include <uapi/linux/vdpa.h> > > > #include <uapi/linux/virtio_config.h> > > > @@ -64,6 +65,13 @@ struct vduse_vdpa { > > > struct vduse_dev *dev; > > > }; > > > > > > +struct vduse_iotlb_mem { > > > + unsigned long iova; > > > + unsigned long npages; > > > + struct page **pages; > > > + struct mm_struct *mm; > > > +}; > > > + > > > struct vduse_dev { > > > struct vduse_vdpa *vdev; > > > struct device *dev; > > > @@ -95,6 +103,8 @@ struct vduse_dev { > > > u8 status; > > > u32 vq_num; > > > u32 vq_align; > > > + struct vduse_iotlb_mem *iotlb_mem; > > > + struct mutex mem_lock; > > > }; > > > > > > struct vduse_dev_msg { > > > @@ -917,6 +927,100 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev, > > > return ret; > > > } > > > > > > +static int vduse_dev_dereg_iotlb_mem(struct vduse_dev *dev, > > > + u64 iova, u64 size) > > > +{ > > > + int ret; > > > + > > > + mutex_lock(&dev->mem_lock); > > > + ret = -ENOENT; > > > + if (!dev->iotlb_mem) > > > + goto unlock; > > > + > > > + ret = -EINVAL; > > > + if (dev->iotlb_mem->iova != iova || size != dev->domain->bounce_size) > > > + goto unlock; > > > + > > > + vduse_domain_remove_user_bounce_pages(dev->domain); > > > + unpin_user_pages(dev->iotlb_mem->pages, dev->iotlb_mem->npages); > > > > I notice you don't mark the pages dirty. This is going to be a problem. > > > > Thanks for pointing out this, I will use unpin_user_pages_dirty_lock() instead. > > > > + atomic64_sub(dev->iotlb_mem->npages, &dev->iotlb_mem->mm->pinned_vm); > > > + mmdrop(dev->iotlb_mem->mm); > > > + vfree(dev->iotlb_mem->pages); > > > + kfree(dev->iotlb_mem); > > > + dev->iotlb_mem = NULL; > > > + ret = 0; > > > +unlock: > > > + mutex_unlock(&dev->mem_lock); > > > + return ret; > > > +} > > > + > > > +static int vduse_dev_reg_iotlb_mem(struct vduse_dev *dev, > > > + u64 iova, u64 uaddr, u64 size) > > > +{ > > > + struct page **page_list = NULL; > > > + struct vduse_iotlb_mem *mem = NULL; > > > + long pinned = 0; > > > + unsigned long npages, lock_limit; > > > + int ret; > > > + > > > + if (size != dev->domain->bounce_size || > > > + iova != 0 || uaddr & ~PAGE_MASK) > > > + return -EINVAL; > > > + > > > + mutex_lock(&dev->mem_lock); > > > + ret = -EEXIST; > > > + if (dev->iotlb_mem) > > > + goto unlock; > > > + > > > + ret = -ENOMEM; > > > + npages = size >> PAGE_SHIFT; > > > + page_list = vmalloc(array_size(npages, > > > + sizeof(struct page *))); > > > > Is this basically trying to do a vmalloc with userspace-controlled size? > > That's an easy DOS vector. > > > > We already checked the size before. The size must equal to (64MB >> > PAGE_SHIFT) now. That's not a small amount. Can this be accounted e.g. through cgroups at least? > > > + mem = kzalloc(sizeof(*mem), GFP_KERNEL); > > > + if (!page_list || !mem) > > > + goto unlock; > > > + > > > + mmap_read_lock(current->mm); > > > + > > > + lock_limit = PFN_DOWN(rlimit(RLIMIT_MEMLOCK)); > > > + if (npages + atomic64_read(¤t->mm->pinned_vm) > lock_limit) > > > + goto out; > > > + > > > + pinned = pin_user_pages(uaddr, npages, FOLL_LONGTERM | FOLL_WRITE, > > > + page_list, NULL); > > > + if (pinned != npages) { > > > + ret = pinned < 0 ? pinned : -ENOMEM; > > > + goto out; > > > + } > > > > > > This is a popular approach but it's problematic if multiple > > devices try to pin the same page. > > Do you mean the data would be corrupted if multiple devices use the > same page as bounce buffer? This is indeed a problem. No i mean you decrement the lock twice. Question is can two bounce buffers share a page? > > Can this happen here? > > > > I think we can't prevent this case now. I will do it in v2. > > > > + > > > + ret = vduse_domain_add_user_bounce_pages(dev->domain, > > > + page_list, pinned); > > > + if (ret) > > > + goto out; > > > + > > > + atomic64_add(npages, ¤t->mm->pinned_vm); > > > + > > > + mem->pages = page_list; > > > + mem->npages = pinned; > > > + mem->iova = iova; > > > + mem->mm = current->mm; > > > + mmgrab(current->mm); > > > + > > > + dev->iotlb_mem = mem; > > > +out: > > > + if (ret && pinned > 0) > > > + unpin_user_pages(page_list, pinned); > > > + > > > + mmap_read_unlock(current->mm); > > > +unlock: > > > + if (ret) { > > > + vfree(page_list); > > > + kfree(mem); > > > + } > > > + mutex_unlock(&dev->mem_lock); > > > + return ret; > > > +} > > > + > > > static long vduse_dev_ioctl(struct file *file, unsigned int cmd, > > > unsigned long arg) > > > { > > > @@ -943,6 +1047,16 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd, > > > if (entry.start > entry.last) > > > break; > > > > > > + if (domain->bounce_map && dev->iotlb_mem) { > > > + ret = -EEXIST; > > > + if (entry.start >= 0 && > > > + entry.last < domain->bounce_size) > > > + break; > > > + > > > + if (entry.start < domain->bounce_size) > > > + entry.start = domain->bounce_size; > > > + } > > > + > > > spin_lock(&domain->iotlb_lock); > > > map = vhost_iotlb_itree_first(domain->iotlb, > > > entry.start, entry.last); > > > @@ -1102,6 +1216,28 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd, > > > ret = 0; > > > break; > > > } > > > + case VDUSE_IOTLB_REG_UMEM: { > > > + struct vduse_iotlb_umem umem; > > > + > > > + ret = -EFAULT; > > > + if (copy_from_user(&umem, argp, sizeof(umem))) > > > + break; > > > + > > > + ret = vduse_dev_reg_iotlb_mem(dev, umem.iova, > > > + umem.uaddr, umem.size); > > > + break; > > > + } > > > + case VDUSE_IOTLB_DEREG_UMEM: { > > > + struct vduse_iotlb_umem umem; > > > + > > > + ret = -EFAULT; > > > + if (copy_from_user(&umem, argp, sizeof(umem))) > > > + break; > > > + > > > + ret = vduse_dev_dereg_iotlb_mem(dev, umem.iova, > > > + umem.size); > > > + break; > > > + } > > > default: > > > ret = -ENOIOCTLCMD; > > > break; > > > @@ -1114,6 +1250,7 @@ static int vduse_dev_release(struct inode *inode, struct file *file) > > > { > > > struct vduse_dev *dev = file->private_data; > > > > > > + vduse_dev_dereg_iotlb_mem(dev, 0, dev->domain->bounce_size); > > > spin_lock(&dev->msg_lock); > > > /* Make sure the inflight messages can processed after reconncection */ > > > list_splice_init(&dev->recv_list, &dev->send_list); > > > @@ -1176,6 +1313,7 @@ static struct vduse_dev *vduse_dev_create(void) > > > return NULL; > > > > > > mutex_init(&dev->lock); > > > + mutex_init(&dev->mem_lock); > > > spin_lock_init(&dev->msg_lock); > > > INIT_LIST_HEAD(&dev->send_list); > > > INIT_LIST_HEAD(&dev->recv_list); > > > diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h > > > index c201b7a77c2c..1b17391e228f 100644 > > > --- a/include/uapi/linux/vduse.h > > > +++ b/include/uapi/linux/vduse.h > > > @@ -227,6 +227,34 @@ struct vduse_iotlb_info { > > > /* Get IOTLB information, e.g. bounce buffer size */ > > > #define VDUSE_IOTLB_GET_INFO _IOR(VDUSE_BASE, 0x18, struct vduse_iotlb_info) > > > > > > +/** > > > + * struct vduse_iotlb_umem - userspace memory configuration > > > + * @uaddr: start address of userspace memory, it must be aligned to page size > > > + * @iova: IOVA of userspace memory, it must be equal to bounce iova returned > > > + * by VDUSE_IOTLB_GET_INFO now > > > + * @size: size of userspace memory, it must be equal to bounce size returned > > > + * by VDUSE_IOTLB_GET_INFO now > > > + * @reserved: for future use, needs to be initialized to zero > > > > You should check that it's 0 in that case, otherwise userspace > > will conveniently forget. > > > > OK, I will fix it. > > Thanks, > Yongji _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 6+ messages in thread
[parent not found: <CACycT3vaNLYRid5SsT11LuVCaGXbBfV=q7c7SUp1+r9BcRpwkw@mail.gmail.com>]
* Re: [PATCH 5/6] vduse: Support registering userspace memory for IOTLB [not found] ` <CACycT3vaNLYRid5SsT11LuVCaGXbBfV=q7c7SUp1+r9BcRpwkw@mail.gmail.com> @ 2022-06-29 11:28 ` Michael S. Tsirkin 0 siblings, 0 replies; 6+ messages in thread From: Michael S. Tsirkin @ 2022-06-29 11:28 UTC (permalink / raw) To: Yongji Xie Cc: Liu Xiaodong, linux-kernel, virtualization, Maxime Coquelin, Stefan Hajnoczi, songmuchun On Wed, Jun 29, 2022 at 06:19:31PM +0800, Yongji Xie wrote: > > No i mean you decrement the lock twice. Question is can two bounce > > buffers share a page? > > > > I think we can't. I will find a way to prevent it. > > Thanks, > Yongji I guess it doesn't matter much then. -- MST _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 0/6] VDUSE: Support registering userspace memory as bounce buffer [not found] <20220629082541.118-1-xieyongji@bytedance.com> [not found] ` <20220629082541.118-7-xieyongji@bytedance.com> [not found] ` <20220629082541.118-6-xieyongji@bytedance.com> @ 2022-07-04 9:26 ` Liu Xiaodong 2 siblings, 0 replies; 6+ messages in thread From: Liu Xiaodong @ 2022-07-04 9:26 UTC (permalink / raw) To: Xie Yongji Cc: mst, xiaodong.liu, linux-kernel, virtualization, maxime.coquelin, stefanha On Wed, Jun 29, 2022 at 04:25:35PM +0800, Xie Yongji wrote: > Hi all, > > This series introduces some new ioctls: VDUSE_IOTLB_GET_INFO, > VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM to support > registering and de-registering userspace memory for IOTLB > as bounce buffer in virtio-vdpa case. > > The VDUSE_IOTLB_GET_INFO ioctl can help user to query IOLTB > information such as bounce buffer size. Then user can use > those information on VDUSE_IOTLB_REG_UMEM and > VDUSE_IOTLB_DEREG_UMEM ioctls to register and de-register > userspace memory for IOTLB. > > During registering and de-registering, the DMA data in use > would be copied from kernel bounce pages to userspace bounce > pages and back. > > With this feature, some existing application such as SPDK > and DPDK can leverage the datapath of VDUSE directly and > efficiently as discussed before [1]. They can register some > preallocated hugepages to VDUSE to avoid an extra memcpy > from bounce-buffer to hugepages. Hi, Yongji Very glad to see this enhancement in VDUSE. Thank you. It is really helpful and essential to SPDK. With this new feature, we can get VDUSE transferred data accessed directly by userspace physical backends, like RDMA and PCIe devices. In SPDK roadmap, it's one important work to export block services to local host, especially for container scenario. This patch could help SPDK do that with its userspace backend stacks while keeping high efficiency and performance. So the whole SPDK ecosystem can get benefited. Based on this enhancement, as discussed, I drafted a VDUSE prototype module in SPDK for initial evaluation: [TEST]vduse: prototype for initial draft https://review.spdk.io/gerrit/c/spdk/spdk/+/13534 Running SPDK on single CPU core, configured with 2 P3700 NVMe, and exported block devices to local host kernel via different protocols. The randwrite IOPS through each protocol are: NBD 121K NVMf-tcp loopback 274K VDUSE 463K SPDK with RDMA backends should have a similar ratio. VDUSE has a great performance advantage for SPDK. We have kept investigating on this usage for years. Originally, some SPDK users used NBD. Then NVMf-tcp loopback is SPDK community accommended way. In future, VDUSE could be the preferred way. > The kernel and userspace codes could be found in github: > > https://github.com/bytedance/linux/tree/vduse-umem > https://github.com/bytedance/qemu/tree/vduse-umem > > To test it with qemu-storage-daemon: > > $ qemu-storage-daemon \ > --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server=on,wait=off \ > --monitor chardev=charmonitor \ > --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 > \ > --export type=vduse-blk,id=vduse-test,name=vduse-test,node-name=disk0,writable=on > > [1] https://lkml.org/lkml/2021/6/27/318 > > Please review, thanks! Waiting for its review process. Thanks Xiaodong _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2022-07-04 9:27 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20220629082541.118-1-xieyongji@bytedance.com>
[not found] ` <20220629082541.118-7-xieyongji@bytedance.com>
2022-06-29 8:33 ` [PATCH 6/6] vduse: Update api version to 1 Michael S. Tsirkin
[not found] ` <CACycT3sZXwunA_UOCriSv=f2VARMnPb1mNU2GAUd9BLCU-Hg8w@mail.gmail.com>
2022-06-29 9:22 ` Michael S. Tsirkin
[not found] ` <20220629082541.118-6-xieyongji@bytedance.com>
2022-06-29 8:42 ` [PATCH 5/6] vduse: Support registering userspace memory for IOTLB Michael S. Tsirkin
[not found] ` <CACycT3sAcH-b40hORjSOQb67jZ0Fd-fxdzmZNwt=4iZdX6gLeA@mail.gmail.com>
2022-06-29 9:54 ` Michael S. Tsirkin
[not found] ` <CACycT3vaNLYRid5SsT11LuVCaGXbBfV=q7c7SUp1+r9BcRpwkw@mail.gmail.com>
2022-06-29 11:28 ` Michael S. Tsirkin
2022-07-04 9:26 ` [PATCH 0/6] VDUSE: Support registering userspace memory as bounce buffer Liu Xiaodong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).