All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Xie Yongji <xieyongji@bytedance.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
	virtualization@lists.linux-foundation.org
Subject: Re: [RFC 0/4] Introduce VDUSE - vDPA Device in Userspace
Date: Mon, 19 Oct 2020 13:16:10 -0400	[thread overview]
Message-ID: <20201019130815-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20201019145623.671-1-xieyongji@bytedance.com>

On Mon, Oct 19, 2020 at 10:56:19PM +0800, Xie Yongji wrote:
> This series introduces a framework, which can be used to implement
> vDPA Devices in a userspace program. To implement it, the work
> consist of two parts: control path emulating and data path offloading.
> 
> In the control path, the VDUSE driver will make use of message
> mechnism to forward the actions (get/set features, get/st status,
> get/set config space and set virtqueue states) from virtio-vdpa
> driver to userspace. Userspace can use read()/write() to
> receive/reply to those control messages.
> 
> In the data path, the VDUSE driver implements a MMU-based
> on-chip IOMMU driver which supports both direct mapping and
> indirect mapping with bounce buffer. Then userspace can access
> those iova space via mmap(). Besides, eventfd mechnism is used to
> trigger interrupts and forward virtqueue kicks.
> 
> The details and our user case is shown below:
> 
> ------------------------     -----------------------------------------------------------
> |                  APP |     |                          QEMU                           |
> |       ---------      |     | --------------------    -------------------+<-->+------ |
> |       |dev/vdx|      |     | | device emulation |    | virtio dataplane |    | BDS | |
> ------------+-----------     -----------+-----------------------+-----------------+-----
>             |                           |                       |                 |
>             |                           | emulating             | offloading      |
> ------------+---------------------------+-----------------------+-----------------+------
> |    | block device |           |  vduse driver |        |  vdpa device |    | TCP/IP | |
> |    -------+--------           --------+--------        +------+-------     -----+---- |
> |           |                           |                |      |                 |     |
> |           |                           |                |      |                 |     |
> | ----------+----------       ----------+-----------     |      |                 |     |
> | | virtio-blk driver |       | virtio-vdpa driver |     |      |                 |     |
> | ----------+----------       ----------+-----------     |      |                 |     |
> |           |                           |                |      |                 |     |
> |           |                           ------------------      |                 |     |
> |           -----------------------------------------------------              ---+---  |
> ------------------------------------------------------------------------------ | NIC |---
>                                                                                ---+---
>                                                                                   |
>                                                                          ---------+---------
>                                                                          | Remote Storages |
>                                                                          -------------------
> We make use of it to implement a block device connecting to
> our distributed storage, which can be used in containers and
> bare metal.

What is not exactly clear is what is the APP above doing.

Taking virtio blk requests and sending them over the network
in some proprietary way?

> Compared with qemu-nbd solution, this solution has
> higher performance, and we can have an unified technology stack
> in VM and containers for remote storages.
> 
> To test it with a host disk (e.g. /dev/sdx):
> 
>   $ qemu-storage-daemon \
>       --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
>       --monitor chardev=charmonitor \
>       --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/sdx,node-name=disk0 \
>       --export vduse-blk,id=test,node-name=disk0,writable=on,vduse-id=1,num-queues=16,queue-size=128
> 
> The qemu-storage-daemon can be found at https://github.com/bytedance/qemu/tree/vduse
> 
> Future work:
>   - Improve performance (e.g. zero copy implementation in datapath)
>   - Config interrupt support
>   - Userspace library (find a way to reuse device emulation code in qemu/rust-vmm)


How does this driver compare with vhost-user-blk (which doesn't need kernel support)?



> Xie Yongji (4):
>   mm: export zap_page_range() for driver use
>   vduse: Introduce VDUSE - vDPA Device in Userspace
>   vduse: grab the module's references until there is no vduse device
>   vduse: Add memory shrinker to reclaim bounce pages
> 
>  drivers/vdpa/Kconfig                 |    8 +
>  drivers/vdpa/Makefile                |    1 +
>  drivers/vdpa/vdpa_user/Makefile      |    5 +
>  drivers/vdpa/vdpa_user/eventfd.c     |  221 ++++++
>  drivers/vdpa/vdpa_user/eventfd.h     |   48 ++
>  drivers/vdpa/vdpa_user/iova_domain.c |  488 ++++++++++++
>  drivers/vdpa/vdpa_user/iova_domain.h |  104 +++
>  drivers/vdpa/vdpa_user/vduse.h       |   66 ++
>  drivers/vdpa/vdpa_user/vduse_dev.c   | 1081 ++++++++++++++++++++++++++
>  include/uapi/linux/vduse.h           |   85 ++
>  mm/memory.c                          |    1 +
>  11 files changed, 2108 insertions(+)
>  create mode 100644 drivers/vdpa/vdpa_user/Makefile
>  create mode 100644 drivers/vdpa/vdpa_user/eventfd.c
>  create mode 100644 drivers/vdpa/vdpa_user/eventfd.h
>  create mode 100644 drivers/vdpa/vdpa_user/iova_domain.c
>  create mode 100644 drivers/vdpa/vdpa_user/iova_domain.h
>  create mode 100644 drivers/vdpa/vdpa_user/vduse.h
>  create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
>  create mode 100644 include/uapi/linux/vduse.h
> 
> -- 
> 2.25.1

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

WARNING: multiple messages have this Message-ID (diff)
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Xie Yongji <xieyongji@bytedance.com>
Cc: jasowang@redhat.com, akpm@linux-foundation.org,
	linux-mm@kvack.org, virtualization@lists.linux-foundation.org
Subject: Re: [RFC 0/4] Introduce VDUSE - vDPA Device in Userspace
Date: Mon, 19 Oct 2020 13:16:10 -0400	[thread overview]
Message-ID: <20201019130815-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20201019145623.671-1-xieyongji@bytedance.com>

On Mon, Oct 19, 2020 at 10:56:19PM +0800, Xie Yongji wrote:
> This series introduces a framework, which can be used to implement
> vDPA Devices in a userspace program. To implement it, the work
> consist of two parts: control path emulating and data path offloading.
> 
> In the control path, the VDUSE driver will make use of message
> mechnism to forward the actions (get/set features, get/st status,
> get/set config space and set virtqueue states) from virtio-vdpa
> driver to userspace. Userspace can use read()/write() to
> receive/reply to those control messages.
> 
> In the data path, the VDUSE driver implements a MMU-based
> on-chip IOMMU driver which supports both direct mapping and
> indirect mapping with bounce buffer. Then userspace can access
> those iova space via mmap(). Besides, eventfd mechnism is used to
> trigger interrupts and forward virtqueue kicks.
> 
> The details and our user case is shown below:
> 
> ------------------------     -----------------------------------------------------------
> |                  APP |     |                          QEMU                           |
> |       ---------      |     | --------------------    -------------------+<-->+------ |
> |       |dev/vdx|      |     | | device emulation |    | virtio dataplane |    | BDS | |
> ------------+-----------     -----------+-----------------------+-----------------+-----
>             |                           |                       |                 |
>             |                           | emulating             | offloading      |
> ------------+---------------------------+-----------------------+-----------------+------
> |    | block device |           |  vduse driver |        |  vdpa device |    | TCP/IP | |
> |    -------+--------           --------+--------        +------+-------     -----+---- |
> |           |                           |                |      |                 |     |
> |           |                           |                |      |                 |     |
> | ----------+----------       ----------+-----------     |      |                 |     |
> | | virtio-blk driver |       | virtio-vdpa driver |     |      |                 |     |
> | ----------+----------       ----------+-----------     |      |                 |     |
> |           |                           |                |      |                 |     |
> |           |                           ------------------      |                 |     |
> |           -----------------------------------------------------              ---+---  |
> ------------------------------------------------------------------------------ | NIC |---
>                                                                                ---+---
>                                                                                   |
>                                                                          ---------+---------
>                                                                          | Remote Storages |
>                                                                          -------------------
> We make use of it to implement a block device connecting to
> our distributed storage, which can be used in containers and
> bare metal.

What is not exactly clear is what is the APP above doing.

Taking virtio blk requests and sending them over the network
in some proprietary way?

> Compared with qemu-nbd solution, this solution has
> higher performance, and we can have an unified technology stack
> in VM and containers for remote storages.
> 
> To test it with a host disk (e.g. /dev/sdx):
> 
>   $ qemu-storage-daemon \
>       --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
>       --monitor chardev=charmonitor \
>       --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/sdx,node-name=disk0 \
>       --export vduse-blk,id=test,node-name=disk0,writable=on,vduse-id=1,num-queues=16,queue-size=128
> 
> The qemu-storage-daemon can be found at https://github.com/bytedance/qemu/tree/vduse
> 
> Future work:
>   - Improve performance (e.g. zero copy implementation in datapath)
>   - Config interrupt support
>   - Userspace library (find a way to reuse device emulation code in qemu/rust-vmm)


How does this driver compare with vhost-user-blk (which doesn't need kernel support)?



> Xie Yongji (4):
>   mm: export zap_page_range() for driver use
>   vduse: Introduce VDUSE - vDPA Device in Userspace
>   vduse: grab the module's references until there is no vduse device
>   vduse: Add memory shrinker to reclaim bounce pages
> 
>  drivers/vdpa/Kconfig                 |    8 +
>  drivers/vdpa/Makefile                |    1 +
>  drivers/vdpa/vdpa_user/Makefile      |    5 +
>  drivers/vdpa/vdpa_user/eventfd.c     |  221 ++++++
>  drivers/vdpa/vdpa_user/eventfd.h     |   48 ++
>  drivers/vdpa/vdpa_user/iova_domain.c |  488 ++++++++++++
>  drivers/vdpa/vdpa_user/iova_domain.h |  104 +++
>  drivers/vdpa/vdpa_user/vduse.h       |   66 ++
>  drivers/vdpa/vdpa_user/vduse_dev.c   | 1081 ++++++++++++++++++++++++++
>  include/uapi/linux/vduse.h           |   85 ++
>  mm/memory.c                          |    1 +
>  11 files changed, 2108 insertions(+)
>  create mode 100644 drivers/vdpa/vdpa_user/Makefile
>  create mode 100644 drivers/vdpa/vdpa_user/eventfd.c
>  create mode 100644 drivers/vdpa/vdpa_user/eventfd.h
>  create mode 100644 drivers/vdpa/vdpa_user/iova_domain.c
>  create mode 100644 drivers/vdpa/vdpa_user/iova_domain.h
>  create mode 100644 drivers/vdpa/vdpa_user/vduse.h
>  create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
>  create mode 100644 include/uapi/linux/vduse.h
> 
> -- 
> 2.25.1



  parent reply	other threads:[~2020-10-19 17:16 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-19 14:56 [RFC 0/4] Introduce VDUSE - vDPA Device in Userspace Xie Yongji
2020-10-19 14:56 ` [RFC 1/4] mm: export zap_page_range() for driver use Xie Yongji
2020-10-19 15:14   ` Matthew Wilcox
2020-10-19 15:14     ` Matthew Wilcox
2020-10-19 15:36     ` [External] " 谢永吉
2020-10-19 14:56 ` [RFC 2/4] vduse: Introduce VDUSE - vDPA Device in Userspace Xie Yongji
2020-10-19 15:08   ` Michael S. Tsirkin
2020-10-19 15:08     ` Michael S. Tsirkin
2020-10-19 15:24     ` Randy Dunlap
2020-10-19 15:24       ` Randy Dunlap
2020-10-19 15:46       ` [External] " 谢永吉
2020-10-19 15:48     ` 谢永吉
2020-10-19 20:22   ` kernel test robot
2020-10-19 14:56 ` [RFC 3/4] vduse: grab the module's references until there is no vduse device Xie Yongji
2020-10-19 15:05   ` Michael S. Tsirkin
2020-10-19 15:05     ` Michael S. Tsirkin
2020-10-19 15:44     ` [External] " 谢永吉
2020-10-19 15:47       ` Michael S. Tsirkin
2020-10-19 15:47         ` Michael S. Tsirkin
2020-10-19 15:56         ` 谢永吉
2020-10-19 16:41           ` Michael S. Tsirkin
2020-10-19 16:41             ` Michael S. Tsirkin
2020-10-20  7:42             ` Yongji Xie
2020-10-19 14:56 ` [RFC 4/4] vduse: Add memory shrinker to reclaim bounce pages Xie Yongji
2020-10-19 17:16 ` Michael S. Tsirkin [this message]
2020-10-19 17:16   ` [RFC 0/4] Introduce VDUSE - vDPA Device in Userspace Michael S. Tsirkin
2020-10-20  2:18   ` [External] " 谢永吉
2020-10-20  2:20     ` Jason Wang
2020-10-20  2:20       ` Jason Wang
2020-10-20  2:28       ` 谢永吉
2020-10-20  3:20 ` Jason Wang
2020-10-20  3:20   ` Jason Wang
2020-10-20  7:39   ` [External] " Yongji Xie
2020-10-20  8:01     ` Jason Wang
2020-10-20  8:01       ` Jason Wang
2020-10-20  8:35       ` Yongji Xie
2020-10-20  9:12         ` Jason Wang
2020-10-20  9:12           ` Jason Wang
2020-10-23  2:55           ` Yongji Xie
2020-10-23  8:44             ` Jason Wang
2020-10-23  8:44               ` Jason Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201019130815-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-mm@kvack.org \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=xieyongji@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.