virtualization.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v7 04/12] virtio-blk: Add validation for block size in config space
       [not found]   ` <CACycT3s1rEvNnNkJKQsHGRsyLPADieFdVkb1Sp3GObR0Vox5Fg@mail.gmail.com>
@ 2021-05-19 14:42     ` Dan Carpenter
       [not found]       ` <CACycT3veubBFCg9omxLDJJFP7B7QH8++Q+tKmb_M_hmNS45cmw@mail.gmail.com>
  0 siblings, 1 reply; 21+ messages in thread
From: Dan Carpenter @ 2021-05-19 14:42 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Jens Axboe, Jonathan Corbet, linux-kernel, kvm,
	Michael S. Tsirkin, netdev, Randy Dunlap, iommu, Matthew Wilcox,
	virtualization, Christoph Hellwig, Christian Brauner, bcrl, viro,
	Stefan Hajnoczi, linux-fsdevel, joro, Mika Penttilä

On Wed, May 19, 2021 at 09:39:20PM +0800, Yongji Xie wrote:
> On Mon, May 17, 2021 at 5:56 PM Xie Yongji <xieyongji@bytedance.com> wrote:
> >
> > This ensures that we will not use an invalid block size
> > in config space (might come from an untrusted device).

I looked at if I should add this as an untrusted function so that Smatch
could find these sorts of bugs but this is reading data from the host so
there has to be some level of trust...

I should add some more untrusted data kvm functions to Smatch.  Right
now I only have kvm_register_read() and I've added kvm_read_guest_virt()
just now.

> >
> > Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
> > ---
> >  drivers/block/virtio_blk.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > index ebb4d3fe803f..c848aa36d49b 100644
> > --- a/drivers/block/virtio_blk.c
> > +++ b/drivers/block/virtio_blk.c
> > @@ -826,7 +826,7 @@ static int virtblk_probe(struct virtio_device *vdev)
> >         err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
> >                                    struct virtio_blk_config, blk_size,
> >                                    &blk_size);
> > -       if (!err)
> > +       if (!err && blk_size > 0 && blk_size <= max_size)
> 
> The check here is incorrect. I will use PAGE_SIZE as the maximum
> boundary in the new version.

What does this bug look like to the user?  A minimum block size of 1
seems pretty crazy.  Surely the minimum should be higher?

regards,
dan carpenter

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Re: [PATCH v7 04/12] virtio-blk: Add validation for block size in config space
       [not found]       ` <CACycT3veubBFCg9omxLDJJFP7B7QH8++Q+tKmb_M_hmNS45cmw@mail.gmail.com>
@ 2021-05-20  5:43         ` Michael S. Tsirkin
  0 siblings, 0 replies; 21+ messages in thread
From: Michael S. Tsirkin @ 2021-05-20  5:43 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Jens Axboe, linux-kernel, kvm, Jonathan Corbet,
	Mika Penttilä, netdev, Randy Dunlap, iommu, Matthew Wilcox,
	virtualization, Christoph Hellwig, Christian Brauner, bcrl, viro,
	Stefan Hajnoczi, linux-fsdevel, joro, Dan Carpenter

On Thu, May 20, 2021 at 01:25:16PM +0800, Yongji Xie wrote:
> On Wed, May 19, 2021 at 10:42 PM Dan Carpenter <dan.carpenter@oracle.com> wrote:
> >
> > On Wed, May 19, 2021 at 09:39:20PM +0800, Yongji Xie wrote:
> > > On Mon, May 17, 2021 at 5:56 PM Xie Yongji <xieyongji@bytedance.com> wrote:
> > > >
> > > > This ensures that we will not use an invalid block size
> > > > in config space (might come from an untrusted device).
> >
> > I looked at if I should add this as an untrusted function so that Smatch
> > could find these sorts of bugs but this is reading data from the host so
> > there has to be some level of trust...
> >
> 
> It would be great if Smatch could detect this case if possible. The
> data might be trusted in traditional VM cases. But now the data can be
> read from a userspace daemon when VDUSE is enabled.
> 
> > I should add some more untrusted data kvm functions to Smatch.  Right
> > now I only have kvm_register_read() and I've added kvm_read_guest_virt()
> > just now.
> >
> > > >
> > > > Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
> > > > ---
> > > >  drivers/block/virtio_blk.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > > > index ebb4d3fe803f..c848aa36d49b 100644
> > > > --- a/drivers/block/virtio_blk.c
> > > > +++ b/drivers/block/virtio_blk.c
> > > > @@ -826,7 +826,7 @@ static int virtblk_probe(struct virtio_device *vdev)
> > > >         err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
> > > >                                    struct virtio_blk_config, blk_size,
> > > >                                    &blk_size);
> > > > -       if (!err)
> > > > +       if (!err && blk_size > 0 && blk_size <= max_size)
> > >
> > > The check here is incorrect. I will use PAGE_SIZE as the maximum
> > > boundary in the new version.
> >
> > What does this bug look like to the user?
> 
> The kernel will panic if the block size is larger than PAGE_SIZE.

Kernel panic at this point is par for the course IMHO.
Let's focus on eliminating data corruption for starters.

> > A minimum block size of 1 seems pretty crazy.  Surely the minimum should be > higher?
> >
> 
> Yes, 512 is better here.
> 
> Thanks,
> Yongji

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 00/12] Introduce VDUSE - vDPA Device in Userspace
       [not found] <20210517095513.850-1-xieyongji@bytedance.com>
       [not found] ` <20210517095513.850-5-xieyongji@bytedance.com>
@ 2021-05-20  6:06 ` Michael S. Tsirkin
       [not found]   ` <CACycT3tKY2V=dmOJjeiZxkqA3cH8_KF93NNbRnNU04e5Job2cw@mail.gmail.com>
       [not found] ` <20210517095513.850-3-xieyongji@bytedance.com>
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2021-05-20  6:06 UTC (permalink / raw)
  To: Xie Yongji
  Cc: axboe, linux-kernel, kvm, corbet, netdev, rdunlap, iommu, willy,
	virtualization, hch, christian.brauner, bcrl, viro, stefanha,
	linux-fsdevel, dan.carpenter, joro, mika.penttila

On Mon, May 17, 2021 at 05:55:01PM +0800, Xie Yongji wrote:
> This series introduces a framework, which can be used to implement
> vDPA Devices in a userspace program. The work consist of two parts:
> control path forwarding and data path offloading.
> 
> In the control path, the VDUSE driver will make use of message
> mechnism to forward the config operation from vdpa bus driver
> to userspace. Userspace can use read()/write() to receive/reply
> those control messages.
> 
> In the data path, the core is mapping dma buffer into VDUSE
> daemon's address space, which can be implemented in different ways
> depending on the vdpa bus to which the vDPA device is attached.
> 
> In virtio-vdpa case, we implements a MMU-based on-chip IOMMU driver with
> bounce-buffering mechanism to achieve that. And in vhost-vdpa case, the dma
> buffer is reside in a userspace memory region which can be shared to the
> VDUSE userspace processs via transferring the shmfd.
> 
> The details and our user case is shown below:
> 
> ------------------------    -------------------------   ----------------------------------------------
> |            Container |    |              QEMU(VM) |   |                               VDUSE daemon |
> |       ---------      |    |  -------------------  |   | ------------------------- ---------------- |
> |       |dev/vdx|      |    |  |/dev/vhost-vdpa-x|  |   | | vDPA device emulation | | block driver | |
> ------------+-----------     -----------+------------   -------------+----------------------+---------
>             |                           |                            |                      |
>             |                           |                            |                      |
> ------------+---------------------------+----------------------------+----------------------+---------
> |    | block device |           |  vhost device |            | vduse driver |          | TCP/IP |    |
> |    -------+--------           --------+--------            -------+--------          -----+----    |
> |           |                           |                           |                       |        |
> | ----------+----------       ----------+-----------         -------+-------                |        |
> | | virtio-blk driver |       |  vhost-vdpa driver |         | vdpa device |                |        |
> | ----------+----------       ----------+-----------         -------+-------                |        |
> |           |      virtio bus           |                           |                       |        |
> |   --------+----+-----------           |                           |                       |        |
> |                |                      |                           |                       |        |
> |      ----------+----------            |                           |                       |        |
> |      | virtio-blk device |            |                           |                       |        |
> |      ----------+----------            |                           |                       |        |
> |                |                      |                           |                       |        |
> |     -----------+-----------           |                           |                       |        |
> |     |  virtio-vdpa driver |           |                           |                       |        |
> |     -----------+-----------           |                           |                       |        |
> |                |                      |                           |    vdpa bus           |        |
> |     -----------+----------------------+---------------------------+------------           |        |
> |                                                                                        ---+---     |
> -----------------------------------------------------------------------------------------| NIC |------
>                                                                                          ---+---
>                                                                                             |
>                                                                                    ---------+---------
>                                                                                    | Remote Storages |
>                                                                                    -------------------
> 
> We make use of it to implement a block device connecting to
> our distributed storage, which can be used both in containers and
> VMs. Thus, we can have an unified technology stack in this two cases.
> 
> To test it with null-blk:
> 
>   $ qemu-storage-daemon \
>       --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
>       --monitor chardev=charmonitor \
>       --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 \
>       --export type=vduse-blk,id=test,node-name=disk0,writable=on,name=vduse-null,num-queues=16,queue-size=128
> 
> The qemu-storage-daemon can be found at https://github.com/bytedance/qemu/tree/vduse
> 
> To make the userspace VDUSE processes such as qemu-storage-daemon able to
> run unprivileged. We did some works on virtio driver to avoid trusting
> device, including:
> 
>   - validating the device status:
> 
>     * https://lore.kernel.org/lkml/20210517093428.670-1-xieyongji@bytedance.com/
> 
>   - validating the used length: 
> 
>     * https://lore.kernel.org/lkml/20210517090836.533-1-xieyongji@bytedance.com/
> 
>   - validating the device config:
>     
>     * patch 4 ("virtio-blk: Add validation for block size in config space")
> 
>   - validating the device response:
> 
>     * patch 5 ("virtio_scsi: Add validation for residual bytes from response")
> 
> Since I'm not sure if I missing something during auditing, especially on some
> virtio device drivers that I'm not familiar with, now we only support emualting
> a few vDPA devices by default, including: virtio-net device, virtio-blk device,
> virtio-scsi device and virtio-fs device. This limitaion can help to reduce
> security risks.

I suspect there are a lot of assumptions even with these 4.
Just what are the security assumptions and guarantees here?
E.g. it seems pretty clear that exposing a malformed FS
to a random kernel config can cause untold mischief.

Things like virtnet_send_command are also an easy way for
the device to DOS the kernel. And before you try to add
an arbitrary timeout there - please don't,
the fix is moving things that must be guaranteed into kernel
and making things that are not guaranteed asynchronous.
Right now there are some things that happen with locks taken,
where if we don't wait for device we lose the ability to report failures
to userspace. E.g. all kind of netlink things are like this.
One can think of a bunch of ways to address this, this
needs to be discussed with the relevant subsystem maintainers.


If I were you I would start with one type of device, and as simple one
as possible.



> When a sysadmin trusts the userspace process enough, it can relax
> the limitation with a 'allow_unsafe_device_emulation' module parameter.

That's not a great security interface. It's a global module specific knob
that just allows any userspace to emulate anything at all.
Coming up with a reasonable interface isn't going to be easy.
For now maybe just have people patch their kernels if they want to
move fast and break things.

> Future work:
>   - Improve performance
>   - Userspace library (find a way to reuse device emulation code in qemu/rust-vmm)
> 
> V6 to V7:
> - Export alloc_iova_fast()
> - Add get_config_size() callback
> - Add some patches to avoid trusting virtio devices
> - Add limited device emulation
> - Add some documents
> - Use workqueue to inject config irq
> - Add parameter on vq irq injecting
> - Rename vduse_domain_get_mapping_page() to vduse_domain_get_coherent_page()
> - Add WARN_ON() to catch message failure
> - Add some padding/reserved fields to uAPI structure
> - Fix some bugs
> - Rebase to vhost.git
> 
> V5 to V6:
> - Export receive_fd() instead of __receive_fd()
> - Factor out the unmapping logic of pa and va separatedly
> - Remove the logic of bounce page allocation in page fault handler
> - Use PAGE_SIZE as IOVA allocation granule
> - Add EPOLLOUT support
> - Enable setting API version in userspace
> - Fix some bugs
> 
> V4 to V5:
> - Remove the patch for irq binding
> - Use a single IOTLB for all types of mapping
> - Factor out vhost_vdpa_pa_map()
> - Add some sample codes in document
> - Use receice_fd_user() to pass file descriptor
> - Fix some bugs
> 
> V3 to V4:
> - Rebase to vhost.git
> - Split some patches
> - Add some documents
> - Use ioctl to inject interrupt rather than eventfd
> - Enable config interrupt support
> - Support binding irq to the specified cpu
> - Add two module parameter to limit bounce/iova size
> - Create char device rather than anon inode per vduse
> - Reuse vhost IOTLB for iova domain
> - Rework the message mechnism in control path
> 
> V2 to V3:
> - Rework the MMU-based IOMMU driver
> - Use the iova domain as iova allocator instead of genpool
> - Support transferring vma->vm_file in vhost-vdpa
> - Add SVA support in vhost-vdpa
> - Remove the patches on bounce pages reclaim
> 
> V1 to V2:
> - Add vhost-vdpa support
> - Add some documents
> - Based on the vdpa management tool
> - Introduce a workqueue for irq injection
> - Replace interval tree with array map to store the iova_map
> 
> Xie Yongji (12):
>   iova: Export alloc_iova_fast()
>   file: Export receive_fd() to modules
>   eventfd: Increase the recursion depth of eventfd_signal()
>   virtio-blk: Add validation for block size in config space
>   virtio_scsi: Add validation for residual bytes from response
>   vhost-iotlb: Add an opaque pointer for vhost IOTLB
>   vdpa: Add an opaque pointer for vdpa_config_ops.dma_map()
>   vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap()
>   vdpa: Support transferring virtual addressing during DMA mapping
>   vduse: Implement an MMU-based IOMMU driver
>   vduse: Introduce VDUSE - vDPA Device in Userspace
>   Documentation: Add documentation for VDUSE
> 
>  Documentation/userspace-api/index.rst              |    1 +
>  Documentation/userspace-api/ioctl/ioctl-number.rst |    1 +
>  Documentation/userspace-api/vduse.rst              |  243 ++++
>  drivers/block/virtio_blk.c                         |    2 +-
>  drivers/iommu/iova.c                               |    1 +
>  drivers/scsi/virtio_scsi.c                         |    2 +-
>  drivers/vdpa/Kconfig                               |   10 +
>  drivers/vdpa/Makefile                              |    1 +
>  drivers/vdpa/ifcvf/ifcvf_main.c                    |    2 +-
>  drivers/vdpa/mlx5/net/mlx5_vnet.c                  |    2 +-
>  drivers/vdpa/vdpa.c                                |    9 +-
>  drivers/vdpa/vdpa_sim/vdpa_sim.c                   |    8 +-
>  drivers/vdpa/vdpa_user/Makefile                    |    5 +
>  drivers/vdpa/vdpa_user/iova_domain.c               |  531 +++++++
>  drivers/vdpa/vdpa_user/iova_domain.h               |   70 +
>  drivers/vdpa/vdpa_user/vduse_dev.c                 | 1453 ++++++++++++++++++++
>  drivers/vdpa/virtio_pci/vp_vdpa.c                  |    2 +-
>  drivers/vhost/iotlb.c                              |   20 +-
>  drivers/vhost/vdpa.c                               |  148 +-
>  fs/eventfd.c                                       |    2 +-
>  fs/file.c                                          |    6 +
>  include/linux/eventfd.h                            |    5 +-
>  include/linux/file.h                               |    7 +-
>  include/linux/vdpa.h                               |   21 +-
>  include/linux/vhost_iotlb.h                        |    3 +
>  include/uapi/linux/vduse.h                         |  178 +++
>  26 files changed, 2681 insertions(+), 52 deletions(-)
>  create mode 100644 Documentation/userspace-api/vduse.rst
>  create mode 100644 drivers/vdpa/vdpa_user/Makefile
>  create mode 100644 drivers/vdpa/vdpa_user/iova_domain.c
>  create mode 100644 drivers/vdpa/vdpa_user/iova_domain.h
>  create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
>  create mode 100644 include/uapi/linux/vduse.h
> 
> -- 
> 2.11.0

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 02/12] file: Export receive_fd() to modules
       [not found] ` <20210517095513.850-3-xieyongji@bytedance.com>
@ 2021-05-20  6:18   ` Al Viro
  0 siblings, 0 replies; 21+ messages in thread
From: Al Viro @ 2021-05-20  6:18 UTC (permalink / raw)
  To: Xie Yongji
  Cc: axboe, corbet, linux-kernel, kvm, mst, netdev, rdunlap, iommu,
	willy, virtualization, hch, christian.brauner, bcrl,
	mika.penttila, stefanha, linux-fsdevel, joro, dan.carpenter

On Mon, May 17, 2021 at 05:55:03PM +0800, Xie Yongji wrote:
> Export receive_fd() so that some modules can use
> it to pass file descriptor between processes without
> missing any security stuffs.

Which tree is that against?  Because in mainline this won't even build, let
alone work.

> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -1135,6 +1135,12 @@ int __receive_fd(int fd, struct file *file, int __user *ufd, unsigned int o_flag
>  	return new_fd;
>  }
>  
> +int receive_fd(struct file *file, unsigned int o_flags)
> +{
> +	return __receive_fd(-1, file, NULL, o_flags);
> +}
> +EXPORT_SYMBOL_GPL(receive_fd);

fs/file.c:1097:int __receive_fd(struct file *file, int __user *ufd, unsigned int o_flags)

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 11/12] vduse: Introduce VDUSE - vDPA Device in Userspace
       [not found] ` <20210517095513.850-12-xieyongji@bytedance.com>
@ 2021-05-20  6:28   ` Al Viro
  2021-05-27  4:12   ` Jason Wang
  2021-05-31  4:56   ` Greg KH
  2 siblings, 0 replies; 21+ messages in thread
From: Al Viro @ 2021-05-20  6:28 UTC (permalink / raw)
  To: Xie Yongji
  Cc: axboe, corbet, linux-kernel, kvm, mst, netdev, rdunlap, iommu,
	willy, virtualization, hch, christian.brauner, bcrl,
	mika.penttila, stefanha, linux-fsdevel, joro, dan.carpenter

On Mon, May 17, 2021 at 05:55:12PM +0800, Xie Yongji wrote:

> +	case VDUSE_IOTLB_GET_FD: {
> +		struct vduse_iotlb_entry entry;
> +		struct vhost_iotlb_map *map;
> +		struct vdpa_map_file *map_file;
> +		struct vduse_iova_domain *domain = dev->domain;
> +		struct file *f = NULL;
> +
> +		ret = -EFAULT;
> +		if (copy_from_user(&entry, argp, sizeof(entry)))
> +			break;

			return -EFAULT;
surely?
> +
> +		ret = -EINVAL;
> +		if (entry.start > entry.last)
> +			break;

... and similar here, etc.

> +		spin_lock(&domain->iotlb_lock);
> +		map = vhost_iotlb_itree_first(domain->iotlb,
> +					      entry.start, entry.last);
> +		if (map) {
> +			map_file = (struct vdpa_map_file *)map->opaque;
> +			f = get_file(map_file->file);
> +			entry.offset = map_file->offset;
> +			entry.start = map->start;
> +			entry.last = map->last;
> +			entry.perm = map->perm;
> +		}
> +		spin_unlock(&domain->iotlb_lock);
> +		ret = -EINVAL;
> +		if (!f)
> +			break;
> +
> +		ret = -EFAULT;
> +		if (copy_to_user(argp, &entry, sizeof(entry))) {
> +			fput(f);
> +			break;
> +		}
> +		ret = receive_fd(f, perm_to_file_flags(entry.perm));
> +		fput(f);
> +		break;

IDGI.  The main difference between receive_fd() and plain old
get_unused_fd_flags() + fd_install() is __receive_sock() call.
Which does nothing whatsoever in case of non-sockets.  Can you
get a socket here?

IOW, why bother with that crap at all, nevermind exporting it?
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 00/12] Introduce VDUSE - vDPA Device in Userspace
       [not found]   ` <CACycT3tKY2V=dmOJjeiZxkqA3cH8_KF93NNbRnNU04e5Job2cw@mail.gmail.com>
@ 2021-05-25  6:40     ` Jason Wang
  2021-05-25  6:48       ` Michael S. Tsirkin
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Wang @ 2021-05-25  6:40 UTC (permalink / raw)
  To: Yongji Xie, Michael S. Tsirkin
  Cc: Jens Axboe, linux-kernel, kvm, Jonathan Corbet, netdev, joro,
	Randy Dunlap, iommu, Matthew Wilcox, virtualization,
	Christoph Hellwig, Christian Brauner, bcrl, Al Viro,
	Stefan Hajnoczi, linux-fsdevel, Dan Carpenter, Mika Penttilä


在 2021/5/20 下午5:06, Yongji Xie 写道:
> On Thu, May 20, 2021 at 2:06 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Mon, May 17, 2021 at 05:55:01PM +0800, Xie Yongji wrote:
>>> This series introduces a framework, which can be used to implement
>>> vDPA Devices in a userspace program. The work consist of two parts:
>>> control path forwarding and data path offloading.
>>>
>>> In the control path, the VDUSE driver will make use of message
>>> mechnism to forward the config operation from vdpa bus driver
>>> to userspace. Userspace can use read()/write() to receive/reply
>>> those control messages.
>>>
>>> In the data path, the core is mapping dma buffer into VDUSE
>>> daemon's address space, which can be implemented in different ways
>>> depending on the vdpa bus to which the vDPA device is attached.
>>>
>>> In virtio-vdpa case, we implements a MMU-based on-chip IOMMU driver with
>>> bounce-buffering mechanism to achieve that. And in vhost-vdpa case, the dma
>>> buffer is reside in a userspace memory region which can be shared to the
>>> VDUSE userspace processs via transferring the shmfd.
>>>
>>> The details and our user case is shown below:
>>>
>>> ------------------------    -------------------------   ----------------------------------------------
>>> |            Container |    |              QEMU(VM) |   |                               VDUSE daemon |
>>> |       ---------      |    |  -------------------  |   | ------------------------- ---------------- |
>>> |       |dev/vdx|      |    |  |/dev/vhost-vdpa-x|  |   | | vDPA device emulation | | block driver | |
>>> ------------+-----------     -----------+------------   -------------+----------------------+---------
>>>              |                           |                            |                      |
>>>              |                           |                            |                      |
>>> ------------+---------------------------+----------------------------+----------------------+---------
>>> |    | block device |           |  vhost device |            | vduse driver |          | TCP/IP |    |
>>> |    -------+--------           --------+--------            -------+--------          -----+----    |
>>> |           |                           |                           |                       |        |
>>> | ----------+----------       ----------+-----------         -------+-------                |        |
>>> | | virtio-blk driver |       |  vhost-vdpa driver |         | vdpa device |                |        |
>>> | ----------+----------       ----------+-----------         -------+-------                |        |
>>> |           |      virtio bus           |                           |                       |        |
>>> |   --------+----+-----------           |                           |                       |        |
>>> |                |                      |                           |                       |        |
>>> |      ----------+----------            |                           |                       |        |
>>> |      | virtio-blk device |            |                           |                       |        |
>>> |      ----------+----------            |                           |                       |        |
>>> |                |                      |                           |                       |        |
>>> |     -----------+-----------           |                           |                       |        |
>>> |     |  virtio-vdpa driver |           |                           |                       |        |
>>> |     -----------+-----------           |                           |                       |        |
>>> |                |                      |                           |    vdpa bus           |        |
>>> |     -----------+----------------------+---------------------------+------------           |        |
>>> |                                                                                        ---+---     |
>>> -----------------------------------------------------------------------------------------| NIC |------
>>>                                                                                           ---+---
>>>                                                                                              |
>>>                                                                                     ---------+---------
>>>                                                                                     | Remote Storages |
>>>                                                                                     -------------------
>>>
>>> We make use of it to implement a block device connecting to
>>> our distributed storage, which can be used both in containers and
>>> VMs. Thus, we can have an unified technology stack in this two cases.
>>>
>>> To test it with null-blk:
>>>
>>>    $ qemu-storage-daemon \
>>>        --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
>>>        --monitor chardev=charmonitor \
>>>        --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 \
>>>        --export type=vduse-blk,id=test,node-name=disk0,writable=on,name=vduse-null,num-queues=16,queue-size=128
>>>
>>> The qemu-storage-daemon can be found at https://github.com/bytedance/qemu/tree/vduse
>>>
>>> To make the userspace VDUSE processes such as qemu-storage-daemon able to
>>> run unprivileged. We did some works on virtio driver to avoid trusting
>>> device, including:
>>>
>>>    - validating the device status:
>>>
>>>      * https://lore.kernel.org/lkml/20210517093428.670-1-xieyongji@bytedance.com/
>>>
>>>    - validating the used length:
>>>
>>>      * https://lore.kernel.org/lkml/20210517090836.533-1-xieyongji@bytedance.com/
>>>
>>>    - validating the device config:
>>>
>>>      * patch 4 ("virtio-blk: Add validation for block size in config space")
>>>
>>>    - validating the device response:
>>>
>>>      * patch 5 ("virtio_scsi: Add validation for residual bytes from response")
>>>
>>> Since I'm not sure if I missing something during auditing, especially on some
>>> virtio device drivers that I'm not familiar with, now we only support emualting
>>> a few vDPA devices by default, including: virtio-net device, virtio-blk device,
>>> virtio-scsi device and virtio-fs device. This limitation can help to reduce
>>> security risks.
>> I suspect there are a lot of assumptions even with these 4.
>> Just what are the security assumptions and guarantees here?


Note that VDUSE is not the only device that may suffer from this, 
here're two others:

1) Encrypted VM
2) Smart NICs


> The attack surface from a virtio device is limited with IOMMU enabled.
> It should be able to avoid security risk if we can validate all data
> such as config space and used length from device in device driver.
>
>> E.g. it seems pretty clear that exposing a malformed FS
>> to a random kernel config can cause untold mischief.
>>
>> Things like virtnet_send_command are also an easy way for
>> the device to DOS the kernel.


I think the virtnet_send_command() needs to use interrupt instead of 
polling.

Thanks


>> And before you try to add
>> an arbitrary timeout there - please don't,
>> the fix is moving things that must be guaranteed into kernel
>> and making things that are not guaranteed asynchronous.
>> Right now there are some things that happen with locks taken,
>> where if we don't wait for device we lose the ability to report failures
>> to userspace. E.g. all kind of netlink things are like this.
>> One can think of a bunch of ways to address this, this
>> needs to be discussed with the relevant subsystem maintainers.
>>
>>
>> If I were you I would start with one type of device, and as simple one
>> as possible.
>>
> Make sense to me. The virtio-blk device might be a good start. We
> already have some existing interface like NBD to do similar things.
>
>>
>>> When a sysadmin trusts the userspace process enough, it can relax
>>> the limitation with a 'allow_unsafe_device_emulation' module parameter.
>> That's not a great security interface. It's a global module specific knob
>> that just allows any userspace to emulate anything at all.
>> Coming up with a reasonable interface isn't going to be easy.
>> For now maybe just have people patch their kernels if they want to
>> move fast and break things.
>>
> OK. A reasonable interface can be added if we need it in the future.
>
> Thanks,
> Yongji

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 00/12] Introduce VDUSE - vDPA Device in Userspace
  2021-05-25  6:40     ` Jason Wang
@ 2021-05-25  6:48       ` Michael S. Tsirkin
  2021-05-25  7:11         ` Jason Wang
  0 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2021-05-25  6:48 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jens Axboe, linux-kernel, kvm, Jonathan Corbet, netdev, joro,
	Randy Dunlap, iommu, Matthew Wilcox, virtualization,
	Christoph Hellwig, Yongji Xie, Christian Brauner, bcrl, Al Viro,
	Stefan Hajnoczi, linux-fsdevel, Dan Carpenter, Mika Penttilä

On Tue, May 25, 2021 at 02:40:57PM +0800, Jason Wang wrote:
> 
> 在 2021/5/20 下午5:06, Yongji Xie 写道:
> > On Thu, May 20, 2021 at 2:06 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Mon, May 17, 2021 at 05:55:01PM +0800, Xie Yongji wrote:
> > > > This series introduces a framework, which can be used to implement
> > > > vDPA Devices in a userspace program. The work consist of two parts:
> > > > control path forwarding and data path offloading.
> > > > 
> > > > In the control path, the VDUSE driver will make use of message
> > > > mechnism to forward the config operation from vdpa bus driver
> > > > to userspace. Userspace can use read()/write() to receive/reply
> > > > those control messages.
> > > > 
> > > > In the data path, the core is mapping dma buffer into VDUSE
> > > > daemon's address space, which can be implemented in different ways
> > > > depending on the vdpa bus to which the vDPA device is attached.
> > > > 
> > > > In virtio-vdpa case, we implements a MMU-based on-chip IOMMU driver with
> > > > bounce-buffering mechanism to achieve that. And in vhost-vdpa case, the dma
> > > > buffer is reside in a userspace memory region which can be shared to the
> > > > VDUSE userspace processs via transferring the shmfd.
> > > > 
> > > > The details and our user case is shown below:
> > > > 
> > > > ------------------------    -------------------------   ----------------------------------------------
> > > > |            Container |    |              QEMU(VM) |   |                               VDUSE daemon |
> > > > |       ---------      |    |  -------------------  |   | ------------------------- ---------------- |
> > > > |       |dev/vdx|      |    |  |/dev/vhost-vdpa-x|  |   | | vDPA device emulation | | block driver | |
> > > > ------------+-----------     -----------+------------   -------------+----------------------+---------
> > > >              |                           |                            |                      |
> > > >              |                           |                            |                      |
> > > > ------------+---------------------------+----------------------------+----------------------+---------
> > > > |    | block device |           |  vhost device |            | vduse driver |          | TCP/IP |    |
> > > > |    -------+--------           --------+--------            -------+--------          -----+----    |
> > > > |           |                           |                           |                       |        |
> > > > | ----------+----------       ----------+-----------         -------+-------                |        |
> > > > | | virtio-blk driver |       |  vhost-vdpa driver |         | vdpa device |                |        |
> > > > | ----------+----------       ----------+-----------         -------+-------                |        |
> > > > |           |      virtio bus           |                           |                       |        |
> > > > |   --------+----+-----------           |                           |                       |        |
> > > > |                |                      |                           |                       |        |
> > > > |      ----------+----------            |                           |                       |        |
> > > > |      | virtio-blk device |            |                           |                       |        |
> > > > |      ----------+----------            |                           |                       |        |
> > > > |                |                      |                           |                       |        |
> > > > |     -----------+-----------           |                           |                       |        |
> > > > |     |  virtio-vdpa driver |           |                           |                       |        |
> > > > |     -----------+-----------           |                           |                       |        |
> > > > |                |                      |                           |    vdpa bus           |        |
> > > > |     -----------+----------------------+---------------------------+------------           |        |
> > > > |                                                                                        ---+---     |
> > > > -----------------------------------------------------------------------------------------| NIC |------
> > > >                                                                                           ---+---
> > > >                                                                                              |
> > > >                                                                                     ---------+---------
> > > >                                                                                     | Remote Storages |
> > > >                                                                                     -------------------
> > > > 
> > > > We make use of it to implement a block device connecting to
> > > > our distributed storage, which can be used both in containers and
> > > > VMs. Thus, we can have an unified technology stack in this two cases.
> > > > 
> > > > To test it with null-blk:
> > > > 
> > > >    $ qemu-storage-daemon \
> > > >        --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
> > > >        --monitor chardev=charmonitor \
> > > >        --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 \
> > > >        --export type=vduse-blk,id=test,node-name=disk0,writable=on,name=vduse-null,num-queues=16,queue-size=128
> > > > 
> > > > The qemu-storage-daemon can be found at https://github.com/bytedance/qemu/tree/vduse
> > > > 
> > > > To make the userspace VDUSE processes such as qemu-storage-daemon able to
> > > > run unprivileged. We did some works on virtio driver to avoid trusting
> > > > device, including:
> > > > 
> > > >    - validating the device status:
> > > > 
> > > >      * https://lore.kernel.org/lkml/20210517093428.670-1-xieyongji@bytedance.com/
> > > > 
> > > >    - validating the used length:
> > > > 
> > > >      * https://lore.kernel.org/lkml/20210517090836.533-1-xieyongji@bytedance.com/
> > > > 
> > > >    - validating the device config:
> > > > 
> > > >      * patch 4 ("virtio-blk: Add validation for block size in config space")
> > > > 
> > > >    - validating the device response:
> > > > 
> > > >      * patch 5 ("virtio_scsi: Add validation for residual bytes from response")
> > > > 
> > > > Since I'm not sure if I missing something during auditing, especially on some
> > > > virtio device drivers that I'm not familiar with, now we only support emualting
> > > > a few vDPA devices by default, including: virtio-net device, virtio-blk device,
> > > > virtio-scsi device and virtio-fs device. This limitation can help to reduce
> > > > security risks.
> > > I suspect there are a lot of assumptions even with these 4.
> > > Just what are the security assumptions and guarantees here?
> 
> 
> Note that VDUSE is not the only device that may suffer from this, here're
> two others:
> 
> 1) Encrypted VM

Encrypted VMs are generally understood not to be fully
protected from attacks by a malicious hypervisor. For example
a DoS by a hypervisor is currently trivial.

> 2) Smart NICs

More or less the same thing.


> 
> > The attack surface from a virtio device is limited with IOMMU enabled.
> > It should be able to avoid security risk if we can validate all data
> > such as config space and used length from device in device driver.
> > 
> > > E.g. it seems pretty clear that exposing a malformed FS
> > > to a random kernel config can cause untold mischief.
> > > 
> > > Things like virtnet_send_command are also an easy way for
> > > the device to DOS the kernel.
> 
> 
> I think the virtnet_send_command() needs to use interrupt instead of
> polling.
> 
> Thanks
> 
> 
> > > And before you try to add
> > > an arbitrary timeout there - please don't,
> > > the fix is moving things that must be guaranteed into kernel
> > > and making things that are not guaranteed asynchronous.
> > > Right now there are some things that happen with locks taken,
> > > where if we don't wait for device we lose the ability to report failures
> > > to userspace. E.g. all kind of netlink things are like this.
> > > One can think of a bunch of ways to address this, this
> > > needs to be discussed with the relevant subsystem maintainers.
> > > 
> > > 
> > > If I were you I would start with one type of device, and as simple one
> > > as possible.
> > > 
> > Make sense to me. The virtio-blk device might be a good start. We
> > already have some existing interface like NBD to do similar things.
> > 
> > > 
> > > > When a sysadmin trusts the userspace process enough, it can relax
> > > > the limitation with a 'allow_unsafe_device_emulation' module parameter.
> > > That's not a great security interface. It's a global module specific knob
> > > that just allows any userspace to emulate anything at all.
> > > Coming up with a reasonable interface isn't going to be easy.
> > > For now maybe just have people patch their kernels if they want to
> > > move fast and break things.
> > > 
> > OK. A reasonable interface can be added if we need it in the future.
> > 
> > Thanks,
> > Yongji

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 00/12] Introduce VDUSE - vDPA Device in Userspace
  2021-05-25  6:48       ` Michael S. Tsirkin
@ 2021-05-25  7:11         ` Jason Wang
  0 siblings, 0 replies; 21+ messages in thread
From: Jason Wang @ 2021-05-25  7:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jens Axboe, linux-kernel, kvm, Jonathan Corbet, netdev, joro,
	Randy Dunlap, iommu, Matthew Wilcox, virtualization,
	Christoph Hellwig, Yongji Xie, Christian Brauner, bcrl, Al Viro,
	Stefan Hajnoczi, linux-fsdevel, Dan Carpenter, Mika Penttilä

On Tue, May 25, 2021 at 2:48 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, May 25, 2021 at 02:40:57PM +0800, Jason Wang wrote:
> >
> > 在 2021/5/20 下午5:06, Yongji Xie 写道:
> > > On Thu, May 20, 2021 at 2:06 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > On Mon, May 17, 2021 at 05:55:01PM +0800, Xie Yongji wrote:
> > > > > This series introduces a framework, which can be used to implement
> > > > > vDPA Devices in a userspace program. The work consist of two parts:
> > > > > control path forwarding and data path offloading.
> > > > >
> > > > > In the control path, the VDUSE driver will make use of message
> > > > > mechnism to forward the config operation from vdpa bus driver
> > > > > to userspace. Userspace can use read()/write() to receive/reply
> > > > > those control messages.
> > > > >
> > > > > In the data path, the core is mapping dma buffer into VDUSE
> > > > > daemon's address space, which can be implemented in different ways
> > > > > depending on the vdpa bus to which the vDPA device is attached.
> > > > >
> > > > > In virtio-vdpa case, we implements a MMU-based on-chip IOMMU driver with
> > > > > bounce-buffering mechanism to achieve that. And in vhost-vdpa case, the dma
> > > > > buffer is reside in a userspace memory region which can be shared to the
> > > > > VDUSE userspace processs via transferring the shmfd.
> > > > >
> > > > > The details and our user case is shown below:
> > > > >
> > > > > ------------------------    -------------------------   ----------------------------------------------
> > > > > |            Container |    |              QEMU(VM) |   |                               VDUSE daemon |
> > > > > |       ---------      |    |  -------------------  |   | ------------------------- ---------------- |
> > > > > |       |dev/vdx|      |    |  |/dev/vhost-vdpa-x|  |   | | vDPA device emulation | | block driver | |
> > > > > ------------+-----------     -----------+------------   -------------+----------------------+---------
> > > > >              |                           |                            |                      |
> > > > >              |                           |                            |                      |
> > > > > ------------+---------------------------+----------------------------+----------------------+---------
> > > > > |    | block device |           |  vhost device |            | vduse driver |          | TCP/IP |    |
> > > > > |    -------+--------           --------+--------            -------+--------          -----+----    |
> > > > > |           |                           |                           |                       |        |
> > > > > | ----------+----------       ----------+-----------         -------+-------                |        |
> > > > > | | virtio-blk driver |       |  vhost-vdpa driver |         | vdpa device |                |        |
> > > > > | ----------+----------       ----------+-----------         -------+-------                |        |
> > > > > |           |      virtio bus           |                           |                       |        |
> > > > > |   --------+----+-----------           |                           |                       |        |
> > > > > |                |                      |                           |                       |        |
> > > > > |      ----------+----------            |                           |                       |        |
> > > > > |      | virtio-blk device |            |                           |                       |        |
> > > > > |      ----------+----------            |                           |                       |        |
> > > > > |                |                      |                           |                       |        |
> > > > > |     -----------+-----------           |                           |                       |        |
> > > > > |     |  virtio-vdpa driver |           |                           |                       |        |
> > > > > |     -----------+-----------           |                           |                       |        |
> > > > > |                |                      |                           |    vdpa bus           |        |
> > > > > |     -----------+----------------------+---------------------------+------------           |        |
> > > > > |                                                                                        ---+---     |
> > > > > -----------------------------------------------------------------------------------------| NIC |------
> > > > >                                                                                           ---+---
> > > > >                                                                                              |
> > > > >                                                                                     ---------+---------
> > > > >                                                                                     | Remote Storages |
> > > > >                                                                                     -------------------
> > > > >
> > > > > We make use of it to implement a block device connecting to
> > > > > our distributed storage, which can be used both in containers and
> > > > > VMs. Thus, we can have an unified technology stack in this two cases.
> > > > >
> > > > > To test it with null-blk:
> > > > >
> > > > >    $ qemu-storage-daemon \
> > > > >        --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
> > > > >        --monitor chardev=charmonitor \
> > > > >        --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 \
> > > > >        --export type=vduse-blk,id=test,node-name=disk0,writable=on,name=vduse-null,num-queues=16,queue-size=128
> > > > >
> > > > > The qemu-storage-daemon can be found at https://github.com/bytedance/qemu/tree/vduse
> > > > >
> > > > > To make the userspace VDUSE processes such as qemu-storage-daemon able to
> > > > > run unprivileged. We did some works on virtio driver to avoid trusting
> > > > > device, including:
> > > > >
> > > > >    - validating the device status:
> > > > >
> > > > >      * https://lore.kernel.org/lkml/20210517093428.670-1-xieyongji@bytedance.com/
> > > > >
> > > > >    - validating the used length:
> > > > >
> > > > >      * https://lore.kernel.org/lkml/20210517090836.533-1-xieyongji@bytedance.com/
> > > > >
> > > > >    - validating the device config:
> > > > >
> > > > >      * patch 4 ("virtio-blk: Add validation for block size in config space")
> > > > >
> > > > >    - validating the device response:
> > > > >
> > > > >      * patch 5 ("virtio_scsi: Add validation for residual bytes from response")
> > > > >
> > > > > Since I'm not sure if I missing something during auditing, especially on some
> > > > > virtio device drivers that I'm not familiar with, now we only support emualting
> > > > > a few vDPA devices by default, including: virtio-net device, virtio-blk device,
> > > > > virtio-scsi device and virtio-fs device. This limitation can help to reduce
> > > > > security risks.
> > > > I suspect there are a lot of assumptions even with these 4.
> > > > Just what are the security assumptions and guarantees here?
> >
> >
> > Note that VDUSE is not the only device that may suffer from this, here're
> > two others:
> >
> > 1) Encrypted VM
>
> Encrypted VMs are generally understood not to be fully
> protected from attacks by a malicious hypervisor. For example
> a DoS by a hypervisor is currently trivial.

Right, but I mainly meant the emulated virtio-net device in the case
of an encrypted VM. We should not leak information to the
device/hypervisor.

>
> > 2) Smart NICs
>
> More or less the same thing.

In my opinion, this is more similar to VDUSE. Without an encrypted VM,
we trust the hypervisor but not the device so DOS from a device should
be eliminated.

Thanks

>
>
> >
> > > The attack surface from a virtio device is limited with IOMMU enabled.
> > > It should be able to avoid security risk if we can validate all data
> > > such as config space and used length from device in device driver.
> > >
> > > > E.g. it seems pretty clear that exposing a malformed FS
> > > > to a random kernel config can cause untold mischief.
> > > >
> > > > Things like virtnet_send_command are also an easy way for
> > > > the device to DOS the kernel.
> >
> >
> > I think the virtnet_send_command() needs to use interrupt instead of
> > polling.
> >
> > Thanks
> >
> >
> > > > And before you try to add
> > > > an arbitrary timeout there - please don't,
> > > > the fix is moving things that must be guaranteed into kernel
> > > > and making things that are not guaranteed asynchronous.
> > > > Right now there are some things that happen with locks taken,
> > > > where if we don't wait for device we lose the ability to report failures
> > > > to userspace. E.g. all kind of netlink things are like this.
> > > > One can think of a bunch of ways to address this, this
> > > > needs to be discussed with the relevant subsystem maintainers.
> > > >
> > > >
> > > > If I were you I would start with one type of device, and as simple one
> > > > as possible.
> > > >
> > > Make sense to me. The virtio-blk device might be a good start. We
> > > already have some existing interface like NBD to do similar things.
> > >
> > > >
> > > > > When a sysadmin trusts the userspace process enough, it can relax
> > > > > the limitation with a 'allow_unsafe_device_emulation' module parameter.
> > > > That's not a great security interface. It's a global module specific knob
> > > > that just allows any userspace to emulate anything at all.
> > > > Coming up with a reasonable interface isn't going to be easy.
> > > > For now maybe just have people patch their kernels if they want to
> > > > move fast and break things.
> > > >
> > > OK. A reasonable interface can be added if we need it in the future.
> > >
> > > Thanks,
> > > Yongji
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 01/12] iova: Export alloc_iova_fast()
       [not found] ` <20210517095513.850-2-xieyongji@bytedance.com>
@ 2021-05-26  2:36   ` Jason Wang
  0 siblings, 0 replies; 21+ messages in thread
From: Jason Wang @ 2021-05-26  2:36 UTC (permalink / raw)
  To: Xie Yongji, mst, stefanha, sgarzare, parav, hch,
	christian.brauner, rdunlap, willy, viro, axboe, bcrl, corbet,
	mika.penttila, dan.carpenter, joro
  Cc: kvm, netdev, linux-kernel, virtualization, iommu, linux-fsdevel


在 2021/5/17 下午5:55, Xie Yongji 写道:
> Export alloc_iova_fast() so that some modules can use it
> to improve iova allocation efficiency.
>
> Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
> ---
>   drivers/iommu/iova.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index e6e2fa85271c..317eef64ffef 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -450,6 +450,7 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long size,
>   
>   	return new_iova->pfn_lo;
>   }
> +EXPORT_SYMBOL_GPL(alloc_iova_fast);
>   
>   /**
>    * free_iova_fast - free iova pfn range into rcache


Interesting, do we need export free_iova_fast() as well?

Thanks


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 05/12] virtio_scsi: Add validation for residual bytes from response
       [not found] ` <20210517095513.850-6-xieyongji@bytedance.com>
@ 2021-05-26  2:41   ` Jason Wang
  0 siblings, 0 replies; 21+ messages in thread
From: Jason Wang @ 2021-05-26  2:41 UTC (permalink / raw)
  To: Xie Yongji, mst, stefanha, sgarzare, parav, hch,
	christian.brauner, rdunlap, willy, viro, axboe, bcrl, corbet,
	mika.penttila, dan.carpenter, joro
  Cc: kvm, netdev, linux-kernel, virtualization, iommu, linux-fsdevel


在 2021/5/17 下午5:55, Xie Yongji 写道:
> This ensures that the residual bytes in response (might come
> from an untrusted device) will not exceed the data buffer length.
>
> Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
> ---
>   drivers/scsi/virtio_scsi.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index efcaf0674c8d..ad7d8cecec32 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -97,7 +97,7 @@ static inline struct Scsi_Host *virtio_scsi_host(struct virtio_device *vdev)
>   static void virtscsi_compute_resid(struct scsi_cmnd *sc, u32 resid)
>   {
>   	if (resid)
> -		scsi_set_resid(sc, resid);
> +		scsi_set_resid(sc, min(resid, scsi_bufflen(sc)));
>   }
>   
>   /*


Acked-by: Jason Wang <jasowang@redhat.com>


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 11/12] vduse: Introduce VDUSE - vDPA Device in Userspace
       [not found] ` <20210517095513.850-12-xieyongji@bytedance.com>
  2021-05-20  6:28   ` [PATCH v7 11/12] vduse: Introduce VDUSE - vDPA Device in Userspace Al Viro
@ 2021-05-27  4:12   ` Jason Wang
       [not found]     ` <CACycT3uAqa6azso_8MGreh+quj-JXO1piuGnrV8k2kTfc34N2g@mail.gmail.com>
  2021-05-31  4:56   ` Greg KH
  2 siblings, 1 reply; 21+ messages in thread
From: Jason Wang @ 2021-05-27  4:12 UTC (permalink / raw)
  To: Xie Yongji, mst, stefanha, sgarzare, parav, hch,
	christian.brauner, rdunlap, willy, viro, axboe, bcrl, corbet,
	mika.penttila, dan.carpenter, joro
  Cc: kvm, netdev, linux-kernel, virtualization, iommu, linux-fsdevel


在 2021/5/17 下午5:55, Xie Yongji 写道:
> +
> +static int vduse_dev_msg_sync(struct vduse_dev *dev,
> +			      struct vduse_dev_msg *msg)
> +{
> +	init_waitqueue_head(&msg->waitq);
> +	spin_lock(&dev->msg_lock);
> +	vduse_enqueue_msg(&dev->send_list, msg);
> +	wake_up(&dev->waitq);
> +	spin_unlock(&dev->msg_lock);
> +	wait_event_killable(msg->waitq, msg->completed);


What happens if the userspace(malicous) doesn't give a response forever?

It looks like a DOS. If yes, we need to consider a way to fix that.

Thanks


> +	spin_lock(&dev->msg_lock);
> +	if (!msg->completed) {
> +		list_del(&msg->list);
> +		msg->resp.result = VDUSE_REQUEST_FAILED;
> +	}
> +	spin_unlock(&dev->msg_lock);
> +
> +	return (msg->resp.result == VDUSE_REQUEST_OK) ? 0 : -1;
> +}

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 11/12] vduse: Introduce VDUSE - vDPA Device in Userspace
       [not found]     ` <CACycT3uAqa6azso_8MGreh+quj-JXO1piuGnrV8k2kTfc34N2g@mail.gmail.com>
@ 2021-05-27  5:00       ` Jason Wang
       [not found]         ` <CACycT3ve7YvKF+F+AnTQoJZMPua+jDvGMs_ox8GQe_=SGdeCMA@mail.gmail.com>
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Wang @ 2021-05-27  5:00 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Jens Axboe, Jonathan Corbet, linux-kernel, kvm,
	Michael S. Tsirkin, netdev, joro, Randy Dunlap, iommu,
	Matthew Wilcox, virtualization, Christoph Hellwig,
	Christian Brauner, bcrl, Al Viro, Stefan Hajnoczi, linux-fsdevel,
	Dan Carpenter, Mika Penttilä


在 2021/5/27 下午12:57, Yongji Xie 写道:
> On Thu, May 27, 2021 at 12:13 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2021/5/17 下午5:55, Xie Yongji 写道:
>>> +
>>> +static int vduse_dev_msg_sync(struct vduse_dev *dev,
>>> +                           struct vduse_dev_msg *msg)
>>> +{
>>> +     init_waitqueue_head(&msg->waitq);
>>> +     spin_lock(&dev->msg_lock);
>>> +     vduse_enqueue_msg(&dev->send_list, msg);
>>> +     wake_up(&dev->waitq);
>>> +     spin_unlock(&dev->msg_lock);
>>> +     wait_event_killable(msg->waitq, msg->completed);
>>
>> What happens if the userspace(malicous) doesn't give a response forever?
>>
>> It looks like a DOS. If yes, we need to consider a way to fix that.
>>
> How about using wait_event_killable_timeout() instead?


Probably, and then we need choose a suitable timeout and more important, 
need to report the failure to virtio.

Thanks


>
> Thanks,
> Yongji
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 11/12] vduse: Introduce VDUSE - vDPA Device in Userspace
       [not found]         ` <CACycT3ve7YvKF+F+AnTQoJZMPua+jDvGMs_ox8GQe_=SGdeCMA@mail.gmail.com>
@ 2021-05-27  5:40           ` Jason Wang
       [not found]             ` <CACycT3ufok97cKpk47NjUBTc0QAyfauFUyuFvhWKmuqCGJ7zZw@mail.gmail.com>
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Wang @ 2021-05-27  5:40 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Jens Axboe, Jonathan Corbet, linux-kernel, kvm,
	Michael S. Tsirkin, netdev, joro, Randy Dunlap, iommu,
	Matthew Wilcox, virtualization, Christoph Hellwig,
	Christian Brauner, bcrl, Al Viro, Stefan Hajnoczi, linux-fsdevel,
	Dan Carpenter, Mika Penttilä


在 2021/5/27 下午1:08, Yongji Xie 写道:
> On Thu, May 27, 2021 at 1:00 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2021/5/27 下午12:57, Yongji Xie 写道:
>>> On Thu, May 27, 2021 at 12:13 PM Jason Wang <jasowang@redhat.com> wrote:
>>>> 在 2021/5/17 下午5:55, Xie Yongji 写道:
>>>>> +
>>>>> +static int vduse_dev_msg_sync(struct vduse_dev *dev,
>>>>> +                           struct vduse_dev_msg *msg)
>>>>> +{
>>>>> +     init_waitqueue_head(&msg->waitq);
>>>>> +     spin_lock(&dev->msg_lock);
>>>>> +     vduse_enqueue_msg(&dev->send_list, msg);
>>>>> +     wake_up(&dev->waitq);
>>>>> +     spin_unlock(&dev->msg_lock);
>>>>> +     wait_event_killable(msg->waitq, msg->completed);
>>>> What happens if the userspace(malicous) doesn't give a response forever?
>>>>
>>>> It looks like a DOS. If yes, we need to consider a way to fix that.
>>>>
>>> How about using wait_event_killable_timeout() instead?
>>
>> Probably, and then we need choose a suitable timeout and more important,
>> need to report the failure to virtio.
>>
> Makes sense to me. But it looks like some
> vdpa_config_ops/virtio_config_ops such as set_status() didn't have a
> return value.  Now I add a WARN_ON() for the failure. Do you mean we
> need to add some change for virtio core to handle the failure?


Maybe, but I'm not sure how hard we can do that.

We had NEEDS_RESET but it looks we don't implement it.

Or a rough idea is that maybe need some relaxing to be coupled loosely 
with userspace. E.g the device (control path) is implemented in the 
kernel but the datapath is implemented in the userspace like TUN/TAP.

Thanks

>
> Thanks,
> Yongji
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 11/12] vduse: Introduce VDUSE - vDPA Device in Userspace
       [not found]             ` <CACycT3ufok97cKpk47NjUBTc0QAyfauFUyuFvhWKmuqCGJ7zZw@mail.gmail.com>
@ 2021-05-27  8:41               ` Jason Wang
  2021-05-27  8:43                 ` Jason Wang
       [not found]                 ` <CACycT3uK_Fuade-b8FVYkGCKZnne_UGGbYRFwv7WOH2oKCsXSg@mail.gmail.com>
  0 siblings, 2 replies; 21+ messages in thread
From: Jason Wang @ 2021-05-27  8:41 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Jens Axboe, Jonathan Corbet, linux-kernel, kvm,
	Michael S. Tsirkin, netdev, joro, Randy Dunlap, iommu,
	Matthew Wilcox, virtualization, Christoph Hellwig,
	Christian Brauner, bcrl, Al Viro, Stefan Hajnoczi, linux-fsdevel,
	Dan Carpenter, Mika Penttilä


在 2021/5/27 下午3:34, Yongji Xie 写道:
> On Thu, May 27, 2021 at 1:40 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2021/5/27 下午1:08, Yongji Xie 写道:
>>> On Thu, May 27, 2021 at 1:00 PM Jason Wang <jasowang@redhat.com> wrote:
>>>> 在 2021/5/27 下午12:57, Yongji Xie 写道:
>>>>> On Thu, May 27, 2021 at 12:13 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>> 在 2021/5/17 下午5:55, Xie Yongji 写道:
>>>>>>> +
>>>>>>> +static int vduse_dev_msg_sync(struct vduse_dev *dev,
>>>>>>> +                           struct vduse_dev_msg *msg)
>>>>>>> +{
>>>>>>> +     init_waitqueue_head(&msg->waitq);
>>>>>>> +     spin_lock(&dev->msg_lock);
>>>>>>> +     vduse_enqueue_msg(&dev->send_list, msg);
>>>>>>> +     wake_up(&dev->waitq);
>>>>>>> +     spin_unlock(&dev->msg_lock);
>>>>>>> +     wait_event_killable(msg->waitq, msg->completed);
>>>>>> What happens if the userspace(malicous) doesn't give a response forever?
>>>>>>
>>>>>> It looks like a DOS. If yes, we need to consider a way to fix that.
>>>>>>
>>>>> How about using wait_event_killable_timeout() instead?
>>>> Probably, and then we need choose a suitable timeout and more important,
>>>> need to report the failure to virtio.
>>>>
>>> Makes sense to me. But it looks like some
>>> vdpa_config_ops/virtio_config_ops such as set_status() didn't have a
>>> return value.  Now I add a WARN_ON() for the failure. Do you mean we
>>> need to add some change for virtio core to handle the failure?
>>
>> Maybe, but I'm not sure how hard we can do that.
>>
> We need to change all virtio device drivers in this way.


Probably.


>
>> We had NEEDS_RESET but it looks we don't implement it.
>>
> Could it handle the failure of get_feature() and get/set_config()?


Looks not:

"

The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state 
that a reset is needed. If DRIVER_OK is set, after it sets 
DEVICE_NEEDS_RESET, the device MUST send a device configuration change 
notification to the driver.

"

This looks implies that NEEDS_RESET may only work after device is 
probed. But in the current design, even the reset() is not reliable.


>
>> Or a rough idea is that maybe need some relaxing to be coupled loosely
>> with userspace. E.g the device (control path) is implemented in the
>> kernel but the datapath is implemented in the userspace like TUN/TAP.
>>
> I think it can work for most cases. One problem is that the set_config
> might change the behavior of the data path at runtime, e.g.
> virtnet_set_mac_address() in the virtio-net driver and
> cache_type_store() in the virtio-blk driver. Not sure if this path is
> able to return before the datapath is aware of this change.


Good point.

But set_config() should be rare:

E.g in the case of virtio-net with VERSION_1, config space is read only, 
and it was set via control vq.

For block, we can

1) start from without WCE or
2) we add a config change notification to userspace or
3) extend the spec to use vq instead of config space

Thanks


>
> Thanks,
> Yongji
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 11/12] vduse: Introduce VDUSE - vDPA Device in Userspace
  2021-05-27  8:41               ` Jason Wang
@ 2021-05-27  8:43                 ` Jason Wang
       [not found]                   ` <CACycT3s6SkER09KL_Ns9d03quYSKOuZwd3=HJ_s1SL7eH7y5gA@mail.gmail.com>
       [not found]                 ` <CACycT3uK_Fuade-b8FVYkGCKZnne_UGGbYRFwv7WOH2oKCsXSg@mail.gmail.com>
  1 sibling, 1 reply; 21+ messages in thread
From: Jason Wang @ 2021-05-27  8:43 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Jens Axboe, Jonathan Corbet, linux-kernel, kvm,
	Michael S. Tsirkin, netdev, joro, Randy Dunlap, iommu,
	Matthew Wilcox, virtualization, Christoph Hellwig,
	Christian Brauner, bcrl, Al Viro, Stefan Hajnoczi, linux-fsdevel,
	Dan Carpenter, Mika Penttilä


在 2021/5/27 下午4:41, Jason Wang 写道:
>
> 在 2021/5/27 下午3:34, Yongji Xie 写道:
>> On Thu, May 27, 2021 at 1:40 PM Jason Wang <jasowang@redhat.com> wrote:
>>>
>>> 在 2021/5/27 下午1:08, Yongji Xie 写道:
>>>> On Thu, May 27, 2021 at 1:00 PM Jason Wang <jasowang@redhat.com> 
>>>> wrote:
>>>>> 在 2021/5/27 下午12:57, Yongji Xie 写道:
>>>>>> On Thu, May 27, 2021 at 12:13 PM Jason Wang <jasowang@redhat.com> 
>>>>>> wrote:
>>>>>>> 在 2021/5/17 下午5:55, Xie Yongji 写道:
>>>>>>>> +
>>>>>>>> +static int vduse_dev_msg_sync(struct vduse_dev *dev,
>>>>>>>> +                           struct vduse_dev_msg *msg)
>>>>>>>> +{
>>>>>>>> +     init_waitqueue_head(&msg->waitq);
>>>>>>>> +     spin_lock(&dev->msg_lock);
>>>>>>>> +     vduse_enqueue_msg(&dev->send_list, msg);
>>>>>>>> +     wake_up(&dev->waitq);
>>>>>>>> +     spin_unlock(&dev->msg_lock);
>>>>>>>> +     wait_event_killable(msg->waitq, msg->completed);
>>>>>>> What happens if the userspace(malicous) doesn't give a response 
>>>>>>> forever?
>>>>>>>
>>>>>>> It looks like a DOS. If yes, we need to consider a way to fix that.
>>>>>>>
>>>>>> How about using wait_event_killable_timeout() instead?
>>>>> Probably, and then we need choose a suitable timeout and more 
>>>>> important,
>>>>> need to report the failure to virtio.
>>>>>
>>>> Makes sense to me. But it looks like some
>>>> vdpa_config_ops/virtio_config_ops such as set_status() didn't have a
>>>> return value.  Now I add a WARN_ON() for the failure. Do you mean we
>>>> need to add some change for virtio core to handle the failure?
>>>
>>> Maybe, but I'm not sure how hard we can do that.
>>>
>> We need to change all virtio device drivers in this way.
>
>
> Probably.
>
>
>>
>>> We had NEEDS_RESET but it looks we don't implement it.
>>>
>> Could it handle the failure of get_feature() and get/set_config()?
>
>
> Looks not:
>
> "
>
> The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state 
> that a reset is needed. If DRIVER_OK is set, after it sets 
> DEVICE_NEEDS_RESET, the device MUST send a device configuration change 
> notification to the driver.
>
> "
>
> This looks implies that NEEDS_RESET may only work after device is 
> probed. But in the current design, even the reset() is not reliable.
>
>
>>
>>> Or a rough idea is that maybe need some relaxing to be coupled loosely
>>> with userspace. E.g the device (control path) is implemented in the
>>> kernel but the datapath is implemented in the userspace like TUN/TAP.
>>>
>> I think it can work for most cases. One problem is that the set_config
>> might change the behavior of the data path at runtime, e.g.
>> virtnet_set_mac_address() in the virtio-net driver and
>> cache_type_store() in the virtio-blk driver. Not sure if this path is
>> able to return before the datapath is aware of this change.
>
>
> Good point.
>
> But set_config() should be rare:
>
> E.g in the case of virtio-net with VERSION_1, config space is read 
> only, and it was set via control vq.
>
> For block, we can
>
> 1) start from without WCE or
> 2) we add a config change notification to userspace or
> 3) extend the spec to use vq instead of config space
>
> Thanks


Another thing if we want to go this way:

We need find a way to terminate the data path from the kernel side, to 
implement to reset semantic.

Thanks


>
>
>>
>> Thanks,
>> Yongji
>>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 11/12] vduse: Introduce VDUSE - vDPA Device in Userspace
       [not found]                   ` <CACycT3s6SkER09KL_Ns9d03quYSKOuZwd3=HJ_s1SL7eH7y5gA@mail.gmail.com>
@ 2021-05-28  1:33                     ` Jason Wang
       [not found]                       ` <CACycT3vKZ3y0gga8PrSVtssZfNV0Y-A8=iYZSi9sbpHRNkVf-A@mail.gmail.com>
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Wang @ 2021-05-28  1:33 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Jens Axboe, Jonathan Corbet, linux-kernel, kvm,
	Michael S. Tsirkin, netdev, joro, Randy Dunlap, iommu,
	Matthew Wilcox, virtualization, Christoph Hellwig,
	Christian Brauner, bcrl, Al Viro, Stefan Hajnoczi, linux-fsdevel,
	Dan Carpenter, Mika Penttilä


在 2021/5/27 下午6:14, Yongji Xie 写道:
> On Thu, May 27, 2021 at 4:43 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2021/5/27 下午4:41, Jason Wang 写道:
>>> 在 2021/5/27 下午3:34, Yongji Xie 写道:
>>>> On Thu, May 27, 2021 at 1:40 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>> 在 2021/5/27 下午1:08, Yongji Xie 写道:
>>>>>> On Thu, May 27, 2021 at 1:00 PM Jason Wang <jasowang@redhat.com>
>>>>>> wrote:
>>>>>>> 在 2021/5/27 下午12:57, Yongji Xie 写道:
>>>>>>>> On Thu, May 27, 2021 at 12:13 PM Jason Wang <jasowang@redhat.com>
>>>>>>>> wrote:
>>>>>>>>> 在 2021/5/17 下午5:55, Xie Yongji 写道:
>>>>>>>>>> +
>>>>>>>>>> +static int vduse_dev_msg_sync(struct vduse_dev *dev,
>>>>>>>>>> +                           struct vduse_dev_msg *msg)
>>>>>>>>>> +{
>>>>>>>>>> +     init_waitqueue_head(&msg->waitq);
>>>>>>>>>> +     spin_lock(&dev->msg_lock);
>>>>>>>>>> +     vduse_enqueue_msg(&dev->send_list, msg);
>>>>>>>>>> +     wake_up(&dev->waitq);
>>>>>>>>>> +     spin_unlock(&dev->msg_lock);
>>>>>>>>>> +     wait_event_killable(msg->waitq, msg->completed);
>>>>>>>>> What happens if the userspace(malicous) doesn't give a response
>>>>>>>>> forever?
>>>>>>>>>
>>>>>>>>> It looks like a DOS. If yes, we need to consider a way to fix that.
>>>>>>>>>
>>>>>>>> How about using wait_event_killable_timeout() instead?
>>>>>>> Probably, and then we need choose a suitable timeout and more
>>>>>>> important,
>>>>>>> need to report the failure to virtio.
>>>>>>>
>>>>>> Makes sense to me. But it looks like some
>>>>>> vdpa_config_ops/virtio_config_ops such as set_status() didn't have a
>>>>>> return value.  Now I add a WARN_ON() for the failure. Do you mean we
>>>>>> need to add some change for virtio core to handle the failure?
>>>>> Maybe, but I'm not sure how hard we can do that.
>>>>>
>>>> We need to change all virtio device drivers in this way.
>>>
>>> Probably.
>>>
>>>
>>>>> We had NEEDS_RESET but it looks we don't implement it.
>>>>>
>>>> Could it handle the failure of get_feature() and get/set_config()?
>>>
>>> Looks not:
>>>
>>> "
>>>
>>> The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state
>>> that a reset is needed. If DRIVER_OK is set, after it sets
>>> DEVICE_NEEDS_RESET, the device MUST send a device configuration change
>>> notification to the driver.
>>>
>>> "
>>>
>>> This looks implies that NEEDS_RESET may only work after device is
>>> probed. But in the current design, even the reset() is not reliable.
>>>
>>>
>>>>> Or a rough idea is that maybe need some relaxing to be coupled loosely
>>>>> with userspace. E.g the device (control path) is implemented in the
>>>>> kernel but the datapath is implemented in the userspace like TUN/TAP.
>>>>>
>>>> I think it can work for most cases. One problem is that the set_config
>>>> might change the behavior of the data path at runtime, e.g.
>>>> virtnet_set_mac_address() in the virtio-net driver and
>>>> cache_type_store() in the virtio-blk driver. Not sure if this path is
>>>> able to return before the datapath is aware of this change.
>>>
>>> Good point.
>>>
>>> But set_config() should be rare:
>>>
>>> E.g in the case of virtio-net with VERSION_1, config space is read
>>> only, and it was set via control vq.
>>>
>>> For block, we can
>>>
>>> 1) start from without WCE or
>>> 2) we add a config change notification to userspace or
>>> 3) extend the spec to use vq instead of config space
>>>
>>> Thanks
>>
>> Another thing if we want to go this way:
>>
>> We need find a way to terminate the data path from the kernel side, to
>> implement to reset semantic.
>>
> Do you mean terminate the data path in vdpa_reset().


Yes.


>   Is it ok to just
> notify userspace to stop data path asynchronously?


For well-behaved userspace, yes but no for buggy or malicious ones.

I had an idea, how about terminate IOTLB in this case? Then we're in 
fact turn datapath off.

Thanks


>   Userspace should
> not be able to do any I/O at that time because the iotlb mapping is
> already removed.
>
> Thanks,
> Yongji
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 11/12] vduse: Introduce VDUSE - vDPA Device in Userspace
       [not found]                 ` <CACycT3uK_Fuade-b8FVYkGCKZnne_UGGbYRFwv7WOH2oKCsXSg@mail.gmail.com>
@ 2021-05-28  2:31                   ` Jason Wang
       [not found]                     ` <CACycT3tLj6a7-tbqO9SzCLStwYrOALdkfnt1jxQBv3s0VzD6AQ@mail.gmail.com>
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Wang @ 2021-05-28  2:31 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Jens Axboe, Jonathan Corbet, linux-kernel, kvm,
	Michael S. Tsirkin, netdev, joro, Randy Dunlap, iommu,
	Matthew Wilcox, virtualization, Christoph Hellwig,
	Christian Brauner, bcrl, Al Viro, Stefan Hajnoczi, linux-fsdevel,
	Dan Carpenter, Mika Penttilä


在 2021/5/27 下午9:17, Yongji Xie 写道:
> On Thu, May 27, 2021 at 4:41 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2021/5/27 下午3:34, Yongji Xie 写道:
>>> On Thu, May 27, 2021 at 1:40 PM Jason Wang <jasowang@redhat.com> wrote:
>>>> 在 2021/5/27 下午1:08, Yongji Xie 写道:
>>>>> On Thu, May 27, 2021 at 1:00 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>> 在 2021/5/27 下午12:57, Yongji Xie 写道:
>>>>>>> On Thu, May 27, 2021 at 12:13 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>> 在 2021/5/17 下午5:55, Xie Yongji 写道:
>>>>>>>>> +
>>>>>>>>> +static int vduse_dev_msg_sync(struct vduse_dev *dev,
>>>>>>>>> +                           struct vduse_dev_msg *msg)
>>>>>>>>> +{
>>>>>>>>> +     init_waitqueue_head(&msg->waitq);
>>>>>>>>> +     spin_lock(&dev->msg_lock);
>>>>>>>>> +     vduse_enqueue_msg(&dev->send_list, msg);
>>>>>>>>> +     wake_up(&dev->waitq);
>>>>>>>>> +     spin_unlock(&dev->msg_lock);
>>>>>>>>> +     wait_event_killable(msg->waitq, msg->completed);
>>>>>>>> What happens if the userspace(malicous) doesn't give a response forever?
>>>>>>>>
>>>>>>>> It looks like a DOS. If yes, we need to consider a way to fix that.
>>>>>>>>
>>>>>>> How about using wait_event_killable_timeout() instead?
>>>>>> Probably, and then we need choose a suitable timeout and more important,
>>>>>> need to report the failure to virtio.
>>>>>>
>>>>> Makes sense to me. But it looks like some
>>>>> vdpa_config_ops/virtio_config_ops such as set_status() didn't have a
>>>>> return value.  Now I add a WARN_ON() for the failure. Do you mean we
>>>>> need to add some change for virtio core to handle the failure?
>>>> Maybe, but I'm not sure how hard we can do that.
>>>>
>>> We need to change all virtio device drivers in this way.
>>
>> Probably.
>>
>>
>>>> We had NEEDS_RESET but it looks we don't implement it.
>>>>
>>> Could it handle the failure of get_feature() and get/set_config()?
>>
>> Looks not:
>>
>> "
>>
>> The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state
>> that a reset is needed. If DRIVER_OK is set, after it sets
>> DEVICE_NEEDS_RESET, the device MUST send a device configuration change
>> notification to the driver.
>>
>> "
>>
>> This looks implies that NEEDS_RESET may only work after device is
>> probed. But in the current design, even the reset() is not reliable.
>>
>>
>>>> Or a rough idea is that maybe need some relaxing to be coupled loosely
>>>> with userspace. E.g the device (control path) is implemented in the
>>>> kernel but the datapath is implemented in the userspace like TUN/TAP.
>>>>
>>> I think it can work for most cases. One problem is that the set_config
>>> might change the behavior of the data path at runtime, e.g.
>>> virtnet_set_mac_address() in the virtio-net driver and
>>> cache_type_store() in the virtio-blk driver. Not sure if this path is
>>> able to return before the datapath is aware of this change.
>>
>> Good point.
>>
>> But set_config() should be rare:
>>
>> E.g in the case of virtio-net with VERSION_1, config space is read only,
>> and it was set via control vq.
>>
>> For block, we can
>>
>> 1) start from without WCE or
>> 2) we add a config change notification to userspace or
> I prefer this way. And I think we also need to do similar things for
> set/get_vq_state().


Yes, I agree.

Thanks


>
> Thanks,
> Yongji
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 11/12] vduse: Introduce VDUSE - vDPA Device in Userspace
       [not found]                       ` <CACycT3vKZ3y0gga8PrSVtssZfNV0Y-A8=iYZSi9sbpHRNkVf-A@mail.gmail.com>
@ 2021-05-28  6:38                         ` Jason Wang
  0 siblings, 0 replies; 21+ messages in thread
From: Jason Wang @ 2021-05-28  6:38 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Jens Axboe, Jonathan Corbet, linux-kernel, kvm,
	Michael S. Tsirkin, netdev, joro, Randy Dunlap, iommu,
	Matthew Wilcox, virtualization, Christoph Hellwig,
	Christian Brauner, bcrl, Al Viro, Stefan Hajnoczi, linux-fsdevel,
	Dan Carpenter, Mika Penttilä


在 2021/5/28 上午11:54, Yongji Xie 写道:
> On Fri, May 28, 2021 at 9:33 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2021/5/27 下午6:14, Yongji Xie 写道:
>>> On Thu, May 27, 2021 at 4:43 PM Jason Wang <jasowang@redhat.com> wrote:
>>>> 在 2021/5/27 下午4:41, Jason Wang 写道:
>>>>> 在 2021/5/27 下午3:34, Yongji Xie 写道:
>>>>>> On Thu, May 27, 2021 at 1:40 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>>> 在 2021/5/27 下午1:08, Yongji Xie 写道:
>>>>>>>> On Thu, May 27, 2021 at 1:00 PM Jason Wang <jasowang@redhat.com>
>>>>>>>> wrote:
>>>>>>>>> 在 2021/5/27 下午12:57, Yongji Xie 写道:
>>>>>>>>>> On Thu, May 27, 2021 at 12:13 PM Jason Wang <jasowang@redhat.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> 在 2021/5/17 下午5:55, Xie Yongji 写道:
>>>>>>>>>>>> +
>>>>>>>>>>>> +static int vduse_dev_msg_sync(struct vduse_dev *dev,
>>>>>>>>>>>> +                           struct vduse_dev_msg *msg)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +     init_waitqueue_head(&msg->waitq);
>>>>>>>>>>>> +     spin_lock(&dev->msg_lock);
>>>>>>>>>>>> +     vduse_enqueue_msg(&dev->send_list, msg);
>>>>>>>>>>>> +     wake_up(&dev->waitq);
>>>>>>>>>>>> +     spin_unlock(&dev->msg_lock);
>>>>>>>>>>>> +     wait_event_killable(msg->waitq, msg->completed);
>>>>>>>>>>> What happens if the userspace(malicous) doesn't give a response
>>>>>>>>>>> forever?
>>>>>>>>>>>
>>>>>>>>>>> It looks like a DOS. If yes, we need to consider a way to fix that.
>>>>>>>>>>>
>>>>>>>>>> How about using wait_event_killable_timeout() instead?
>>>>>>>>> Probably, and then we need choose a suitable timeout and more
>>>>>>>>> important,
>>>>>>>>> need to report the failure to virtio.
>>>>>>>>>
>>>>>>>> Makes sense to me. But it looks like some
>>>>>>>> vdpa_config_ops/virtio_config_ops such as set_status() didn't have a
>>>>>>>> return value.  Now I add a WARN_ON() for the failure. Do you mean we
>>>>>>>> need to add some change for virtio core to handle the failure?
>>>>>>> Maybe, but I'm not sure how hard we can do that.
>>>>>>>
>>>>>> We need to change all virtio device drivers in this way.
>>>>> Probably.
>>>>>
>>>>>
>>>>>>> We had NEEDS_RESET but it looks we don't implement it.
>>>>>>>
>>>>>> Could it handle the failure of get_feature() and get/set_config()?
>>>>> Looks not:
>>>>>
>>>>> "
>>>>>
>>>>> The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state
>>>>> that a reset is needed. If DRIVER_OK is set, after it sets
>>>>> DEVICE_NEEDS_RESET, the device MUST send a device configuration change
>>>>> notification to the driver.
>>>>>
>>>>> "
>>>>>
>>>>> This looks implies that NEEDS_RESET may only work after device is
>>>>> probed. But in the current design, even the reset() is not reliable.
>>>>>
>>>>>
>>>>>>> Or a rough idea is that maybe need some relaxing to be coupled loosely
>>>>>>> with userspace. E.g the device (control path) is implemented in the
>>>>>>> kernel but the datapath is implemented in the userspace like TUN/TAP.
>>>>>>>
>>>>>> I think it can work for most cases. One problem is that the set_config
>>>>>> might change the behavior of the data path at runtime, e.g.
>>>>>> virtnet_set_mac_address() in the virtio-net driver and
>>>>>> cache_type_store() in the virtio-blk driver. Not sure if this path is
>>>>>> able to return before the datapath is aware of this change.
>>>>> Good point.
>>>>>
>>>>> But set_config() should be rare:
>>>>>
>>>>> E.g in the case of virtio-net with VERSION_1, config space is read
>>>>> only, and it was set via control vq.
>>>>>
>>>>> For block, we can
>>>>>
>>>>> 1) start from without WCE or
>>>>> 2) we add a config change notification to userspace or
>>>>> 3) extend the spec to use vq instead of config space
>>>>>
>>>>> Thanks
>>>> Another thing if we want to go this way:
>>>>
>>>> We need find a way to terminate the data path from the kernel side, to
>>>> implement to reset semantic.
>>>>
>>> Do you mean terminate the data path in vdpa_reset().
>>
>> Yes.
>>
>>
>>>    Is it ok to just
>>> notify userspace to stop data path asynchronously?
>>
>> For well-behaved userspace, yes but no for buggy or malicious ones.
>>
> But the buggy or malicious daemons can't do anything if my
> understanding is correct.


You're right. I originally thought there can still have bouncing. But 
consider we don't do that during fault.

It should be safe.


>
>> I had an idea, how about terminate IOTLB in this case? Then we're in
>> fact turn datapath off.
>>
> Sorry, I didn't get your point here. What do you mean by terminating
> IOTLB?


I meant terminate the bouncing but it looks safe after a second thought :)

Thanks


>   Remove iotlb mapping? But userspace can still access the mapped
> region.
>
> Thanks,
> Yongji
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 11/12] vduse: Introduce VDUSE - vDPA Device in Userspace
       [not found]                     ` <CACycT3tLj6a7-tbqO9SzCLStwYrOALdkfnt1jxQBv3s0VzD6AQ@mail.gmail.com>
@ 2021-05-31  4:38                       ` Jason Wang
  0 siblings, 0 replies; 21+ messages in thread
From: Jason Wang @ 2021-05-31  4:38 UTC (permalink / raw)
  To: Yongji Xie
  Cc: Jens Axboe, Jonathan Corbet, linux-kernel, kvm,
	Michael S. Tsirkin, netdev, joro, Randy Dunlap, iommu,
	Matthew Wilcox, virtualization, Christoph Hellwig,
	Christian Brauner, bcrl, Al Viro, Stefan Hajnoczi, linux-fsdevel,
	Dan Carpenter, Mika Penttilä


在 2021/5/31 下午12:27, Yongji Xie 写道:
> On Fri, May 28, 2021 at 10:31 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2021/5/27 下午9:17, Yongji Xie 写道:
>>> On Thu, May 27, 2021 at 4:41 PM Jason Wang <jasowang@redhat.com> wrote:
>>>> 在 2021/5/27 下午3:34, Yongji Xie 写道:
>>>>> On Thu, May 27, 2021 at 1:40 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>> 在 2021/5/27 下午1:08, Yongji Xie 写道:
>>>>>>> On Thu, May 27, 2021 at 1:00 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>> 在 2021/5/27 下午12:57, Yongji Xie 写道:
>>>>>>>>> On Thu, May 27, 2021 at 12:13 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>>>> 在 2021/5/17 下午5:55, Xie Yongji 写道:
>>>>>>>>>>> +
>>>>>>>>>>> +static int vduse_dev_msg_sync(struct vduse_dev *dev,
>>>>>>>>>>> +                           struct vduse_dev_msg *msg)
>>>>>>>>>>> +{
>>>>>>>>>>> +     init_waitqueue_head(&msg->waitq);
>>>>>>>>>>> +     spin_lock(&dev->msg_lock);
>>>>>>>>>>> +     vduse_enqueue_msg(&dev->send_list, msg);
>>>>>>>>>>> +     wake_up(&dev->waitq);
>>>>>>>>>>> +     spin_unlock(&dev->msg_lock);
>>>>>>>>>>> +     wait_event_killable(msg->waitq, msg->completed);
>>>>>>>>>> What happens if the userspace(malicous) doesn't give a response forever?
>>>>>>>>>>
>>>>>>>>>> It looks like a DOS. If yes, we need to consider a way to fix that.
>>>>>>>>>>
>>>>>>>>> How about using wait_event_killable_timeout() instead?
>>>>>>>> Probably, and then we need choose a suitable timeout and more important,
>>>>>>>> need to report the failure to virtio.
>>>>>>>>
>>>>>>> Makes sense to me. But it looks like some
>>>>>>> vdpa_config_ops/virtio_config_ops such as set_status() didn't have a
>>>>>>> return value.  Now I add a WARN_ON() for the failure. Do you mean we
>>>>>>> need to add some change for virtio core to handle the failure?
>>>>>> Maybe, but I'm not sure how hard we can do that.
>>>>>>
>>>>> We need to change all virtio device drivers in this way.
>>>> Probably.
>>>>
>>>>
>>>>>> We had NEEDS_RESET but it looks we don't implement it.
>>>>>>
>>>>> Could it handle the failure of get_feature() and get/set_config()?
>>>> Looks not:
>>>>
>>>> "
>>>>
>>>> The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state
>>>> that a reset is needed. If DRIVER_OK is set, after it sets
>>>> DEVICE_NEEDS_RESET, the device MUST send a device configuration change
>>>> notification to the driver.
>>>>
>>>> "
>>>>
>>>> This looks implies that NEEDS_RESET may only work after device is
>>>> probed. But in the current design, even the reset() is not reliable.
>>>>
>>>>
>>>>>> Or a rough idea is that maybe need some relaxing to be coupled loosely
>>>>>> with userspace. E.g the device (control path) is implemented in the
>>>>>> kernel but the datapath is implemented in the userspace like TUN/TAP.
>>>>>>
>>>>> I think it can work for most cases. One problem is that the set_config
>>>>> might change the behavior of the data path at runtime, e.g.
>>>>> virtnet_set_mac_address() in the virtio-net driver and
>>>>> cache_type_store() in the virtio-blk driver. Not sure if this path is
>>>>> able to return before the datapath is aware of this change.
>>>> Good point.
>>>>
>>>> But set_config() should be rare:
>>>>
>>>> E.g in the case of virtio-net with VERSION_1, config space is read only,
>>>> and it was set via control vq.
>>>>
>>>> For block, we can
>>>>
>>>> 1) start from without WCE or
>>>> 2) we add a config change notification to userspace or
>>> I prefer this way. And I think we also need to do similar things for
>>> set/get_vq_state().
>>
>> Yes, I agree.
>>
> Hi Jason,
>
> Now I'm working on this. But I found the config change notification
> must be synchronous in the virtio-blk case, which means the kernel
> still needs to wait for the response from userspace in set_config().
> Otherwise, some I/Os might still run the old way after we change the
> cache_type in sysfs.
>
> The simple ways to solve this problem are:
>
> 1. Only support read-only config space, disable WCE as you suggested
> 2. Add a return value to set_config() and handle the failure only in
> virtio-blk driver
> 3. Print some warnings after timeout since it only affects the
> dataplane which is under userspace's control
>
> Any suggestions?


Let's go without WCE first and make VDUSE work first. We can then think 
of a solution for WCE on top.

Thanks


>
> Thanks,
> Yongji
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 11/12] vduse: Introduce VDUSE - vDPA Device in Userspace
       [not found] ` <20210517095513.850-12-xieyongji@bytedance.com>
  2021-05-20  6:28   ` [PATCH v7 11/12] vduse: Introduce VDUSE - vDPA Device in Userspace Al Viro
  2021-05-27  4:12   ` Jason Wang
@ 2021-05-31  4:56   ` Greg KH
       [not found]     ` <CACycT3vRHPfOGxmy1Uv=8_dqqq8iG4YTZHUizo+y8EYKGS5g8g@mail.gmail.com>
  2 siblings, 1 reply; 21+ messages in thread
From: Greg KH @ 2021-05-31  4:56 UTC (permalink / raw)
  To: Xie Yongji
  Cc: kvm, mst, virtualization, christian.brauner, corbet, joro, willy,
	hch, dan.carpenter, viro, stefanha, axboe, netdev, rdunlap,
	linux-kernel, iommu, bcrl, linux-fsdevel, mika.penttila

On Mon, May 17, 2021 at 05:55:12PM +0800, Xie Yongji wrote:
> +struct vduse_dev {
> +	struct vduse_vdpa *vdev;
> +	struct device dev;
> +	struct cdev cdev;

You now have 2 reference counted devices controling the lifespace of a
single structure.  A mess that is guaranteed to go wrong.  Please never
do this.

> +	struct vduse_virtqueue *vqs;
> +	struct vduse_iova_domain *domain;
> +	char *name;
> +	struct mutex lock;
> +	spinlock_t msg_lock;
> +	atomic64_t msg_unique;

Why do you need an atomic and a lock?

> +	wait_queue_head_t waitq;
> +	struct list_head send_list;
> +	struct list_head recv_list;
> +	struct list_head list;
> +	struct vdpa_callback config_cb;
> +	struct work_struct inject;
> +	spinlock_t irq_lock;
> +	unsigned long api_version;
> +	bool connected;
> +	int minor;
> +	u16 vq_size_max;
> +	u32 vq_num;
> +	u32 vq_align;
> +	u32 config_size;
> +	u32 device_id;
> +	u32 vendor_id;
> +};
> +
> +struct vduse_dev_msg {
> +	struct vduse_dev_request req;
> +	struct vduse_dev_response resp;
> +	struct list_head list;
> +	wait_queue_head_t waitq;
> +	bool completed;
> +};
> +
> +struct vduse_control {
> +	unsigned long api_version;

u64?

> +};
> +
> +static unsigned long max_bounce_size = (64 * 1024 * 1024);
> +module_param(max_bounce_size, ulong, 0444);
> +MODULE_PARM_DESC(max_bounce_size, "Maximum bounce buffer size. (default: 64M)");
> +
> +static unsigned long max_iova_size = (128 * 1024 * 1024);
> +module_param(max_iova_size, ulong, 0444);
> +MODULE_PARM_DESC(max_iova_size, "Maximum iova space size (default: 128M)");
> +
> +static bool allow_unsafe_device_emulation;
> +module_param(allow_unsafe_device_emulation, bool, 0444);
> +MODULE_PARM_DESC(allow_unsafe_device_emulation, "Allow emulating unsafe device."
> +	" We must make sure the userspace device emulation process is trusted."
> +	" Otherwise, don't enable this option. (default: false)");
> +

This is not the 1990's anymore, please never use module parameters, make
these per-device attributes if you really need them.

> +static int vduse_init(void)
> +{
> +	int ret;
> +
> +	if (max_bounce_size >= max_iova_size)
> +		return -EINVAL;
> +
> +	ret = misc_register(&vduse_misc);
> +	if (ret)
> +		return ret;
> +
> +	vduse_class = class_create(THIS_MODULE, "vduse");

If you have a misc device, you do not need to create a class at the same
time.  Why are you doing both here?  Just stick with the misc device, no
need for anything else.

> +	if (IS_ERR(vduse_class)) {
> +		ret = PTR_ERR(vduse_class);
> +		goto err_class;
> +	}
> +	vduse_class->devnode = vduse_devnode;
> +
> +	ret = alloc_chrdev_region(&vduse_major, 0, VDUSE_DEV_MAX, "vduse");

Wait, you want a whole major?  What is the misc device for?

> +MODULE_VERSION(DRV_VERSION);

MODULE_VERSION() makes no sense when the code is merged into the kernel
tree, so you can just drop that.

thanks,

greg k-h
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Re: [PATCH v7 11/12] vduse: Introduce VDUSE - vDPA Device in Userspace
       [not found]     ` <CACycT3vRHPfOGxmy1Uv=8_dqqq8iG4YTZHUizo+y8EYKGS5g8g@mail.gmail.com>
@ 2021-05-31  6:32       ` Greg KH
  0 siblings, 0 replies; 21+ messages in thread
From: Greg KH @ 2021-05-31  6:32 UTC (permalink / raw)
  To: Yongji Xie
  Cc: kvm, Michael S. Tsirkin, virtualization, Christian Brauner,
	Jonathan Corbet, joro, Matthew Wilcox, Christoph Hellwig,
	Dan Carpenter, Al Viro, Stefan Hajnoczi, Jens Axboe, netdev,
	Randy Dunlap, linux-kernel, iommu, bcrl, linux-fsdevel,
	Mika Penttilä

On Mon, May 31, 2021 at 02:19:37PM +0800, Yongji Xie wrote:
> Hi Greg,
> 
> Thanks a lot for the review!
> 
> On Mon, May 31, 2021 at 12:56 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Mon, May 17, 2021 at 05:55:12PM +0800, Xie Yongji wrote:
> > > +struct vduse_dev {
> > > +     struct vduse_vdpa *vdev;
> > > +     struct device dev;
> > > +     struct cdev cdev;
> >
> > You now have 2 reference counted devices controling the lifespace of a
> > single structure.  A mess that is guaranteed to go wrong.  Please never
> > do this.
> >
> 
> These two are both used by cdev_device_add(). Looks like I didn't find
> any problem. Any suggestions?

Make one of these dynamic and do not have them both control the lifespan
of the structure.

> > > +     struct vduse_virtqueue *vqs;
> > > +     struct vduse_iova_domain *domain;
> > > +     char *name;
> > > +     struct mutex lock;
> > > +     spinlock_t msg_lock;
> > > +     atomic64_t msg_unique;
> >
> > Why do you need an atomic and a lock?
> >
> 
> You are right. We don't need an atomic here.
> 
> > > +     wait_queue_head_t waitq;
> > > +     struct list_head send_list;
> > > +     struct list_head recv_list;
> > > +     struct list_head list;
> > > +     struct vdpa_callback config_cb;
> > > +     struct work_struct inject;
> > > +     spinlock_t irq_lock;
> > > +     unsigned long api_version;
> > > +     bool connected;
> > > +     int minor;
> > > +     u16 vq_size_max;
> > > +     u32 vq_num;
> > > +     u32 vq_align;
> > > +     u32 config_size;
> > > +     u32 device_id;
> > > +     u32 vendor_id;
> > > +};
> > > +
> > > +struct vduse_dev_msg {
> > > +     struct vduse_dev_request req;
> > > +     struct vduse_dev_response resp;
> > > +     struct list_head list;
> > > +     wait_queue_head_t waitq;
> > > +     bool completed;
> > > +};
> > > +
> > > +struct vduse_control {
> > > +     unsigned long api_version;
> >
> > u64?
> >
> 
> OK.
> 
> > > +};
> > > +
> > > +static unsigned long max_bounce_size = (64 * 1024 * 1024);
> > > +module_param(max_bounce_size, ulong, 0444);
> > > +MODULE_PARM_DESC(max_bounce_size, "Maximum bounce buffer size. (default: 64M)");
> > > +
> > > +static unsigned long max_iova_size = (128 * 1024 * 1024);
> > > +module_param(max_iova_size, ulong, 0444);
> > > +MODULE_PARM_DESC(max_iova_size, "Maximum iova space size (default: 128M)");
> > > +
> > > +static bool allow_unsafe_device_emulation;
> > > +module_param(allow_unsafe_device_emulation, bool, 0444);
> > > +MODULE_PARM_DESC(allow_unsafe_device_emulation, "Allow emulating unsafe device."
> > > +     " We must make sure the userspace device emulation process is trusted."
> > > +     " Otherwise, don't enable this option. (default: false)");
> > > +
> >
> > This is not the 1990's anymore, please never use module parameters, make
> > these per-device attributes if you really need them.
> >
> 
> These parameters will be used before the device is created. Or do you
> mean add some attributes to the control device?

You need to do something, as no one can mess with a module parameter
easily.  Why do you need them at all, shouldn't it "just work" properly
with no need for userspace interaction?

> > > +static int vduse_init(void)
> > > +{
> > > +     int ret;
> > > +
> > > +     if (max_bounce_size >= max_iova_size)
> > > +             return -EINVAL;
> > > +
> > > +     ret = misc_register(&vduse_misc);
> > > +     if (ret)
> > > +             return ret;
> > > +
> > > +     vduse_class = class_create(THIS_MODULE, "vduse");
> >
> > If you have a misc device, you do not need to create a class at the same
> > time.  Why are you doing both here?  Just stick with the misc device, no
> > need for anything else.
> >
> 
> The misc device is the control device represented by
> /dev/vduse/control. Then a VDUSE device represented by
> /dev/vduse/$NAME can be created by the ioctl(VDUSE_CREATE_DEV) on this
> control device.

Ah.  Then how about using the same MAJOR for all of these, and just have
the first minor (0) be your control?  That happens for other device
types (raw, loop, etc.).  Or just document this really well please, as
it was not obvious what you were doing here.

thanks,

greg k-h
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2021-05-31  6:32 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20210517095513.850-1-xieyongji@bytedance.com>
     [not found] ` <20210517095513.850-5-xieyongji@bytedance.com>
     [not found]   ` <CACycT3s1rEvNnNkJKQsHGRsyLPADieFdVkb1Sp3GObR0Vox5Fg@mail.gmail.com>
2021-05-19 14:42     ` [PATCH v7 04/12] virtio-blk: Add validation for block size in config space Dan Carpenter
     [not found]       ` <CACycT3veubBFCg9omxLDJJFP7B7QH8++Q+tKmb_M_hmNS45cmw@mail.gmail.com>
2021-05-20  5:43         ` Michael S. Tsirkin
2021-05-20  6:06 ` [PATCH v7 00/12] Introduce VDUSE - vDPA Device in Userspace Michael S. Tsirkin
     [not found]   ` <CACycT3tKY2V=dmOJjeiZxkqA3cH8_KF93NNbRnNU04e5Job2cw@mail.gmail.com>
2021-05-25  6:40     ` Jason Wang
2021-05-25  6:48       ` Michael S. Tsirkin
2021-05-25  7:11         ` Jason Wang
     [not found] ` <20210517095513.850-3-xieyongji@bytedance.com>
2021-05-20  6:18   ` [PATCH v7 02/12] file: Export receive_fd() to modules Al Viro
     [not found] ` <20210517095513.850-2-xieyongji@bytedance.com>
2021-05-26  2:36   ` [PATCH v7 01/12] iova: Export alloc_iova_fast() Jason Wang
     [not found] ` <20210517095513.850-6-xieyongji@bytedance.com>
2021-05-26  2:41   ` [PATCH v7 05/12] virtio_scsi: Add validation for residual bytes from response Jason Wang
     [not found] ` <20210517095513.850-12-xieyongji@bytedance.com>
2021-05-20  6:28   ` [PATCH v7 11/12] vduse: Introduce VDUSE - vDPA Device in Userspace Al Viro
2021-05-27  4:12   ` Jason Wang
     [not found]     ` <CACycT3uAqa6azso_8MGreh+quj-JXO1piuGnrV8k2kTfc34N2g@mail.gmail.com>
2021-05-27  5:00       ` Jason Wang
     [not found]         ` <CACycT3ve7YvKF+F+AnTQoJZMPua+jDvGMs_ox8GQe_=SGdeCMA@mail.gmail.com>
2021-05-27  5:40           ` Jason Wang
     [not found]             ` <CACycT3ufok97cKpk47NjUBTc0QAyfauFUyuFvhWKmuqCGJ7zZw@mail.gmail.com>
2021-05-27  8:41               ` Jason Wang
2021-05-27  8:43                 ` Jason Wang
     [not found]                   ` <CACycT3s6SkER09KL_Ns9d03quYSKOuZwd3=HJ_s1SL7eH7y5gA@mail.gmail.com>
2021-05-28  1:33                     ` Jason Wang
     [not found]                       ` <CACycT3vKZ3y0gga8PrSVtssZfNV0Y-A8=iYZSi9sbpHRNkVf-A@mail.gmail.com>
2021-05-28  6:38                         ` Jason Wang
     [not found]                 ` <CACycT3uK_Fuade-b8FVYkGCKZnne_UGGbYRFwv7WOH2oKCsXSg@mail.gmail.com>
2021-05-28  2:31                   ` Jason Wang
     [not found]                     ` <CACycT3tLj6a7-tbqO9SzCLStwYrOALdkfnt1jxQBv3s0VzD6AQ@mail.gmail.com>
2021-05-31  4:38                       ` Jason Wang
2021-05-31  4:56   ` Greg KH
     [not found]     ` <CACycT3vRHPfOGxmy1Uv=8_dqqq8iG4YTZHUizo+y8EYKGS5g8g@mail.gmail.com>
2021-05-31  6:32       ` Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).