From: Jason Wang <jasowang@redhat.com>
To: Xiao Wang <xiao.w.wang@intel.com>,
	mst@redhat.com, alex.williamson@redhat.com
Cc: qemu-devel@nongnu.org, tiwei.bie@intel.com,
	cunming.liang@intel.com, xiaolong.ye@intel.com,
	zhihong.wang@intel.com, dan.daly@intel.com
Subject: Re: [Qemu-devel] [RFC 0/2] vhost-vfio: introduce mdev based HW vhost backend
Date: Tue, 6 Nov 2018 12:17:48 +0800
Message-ID: <2e010e02-f7a2-d009-ac7a-fdc266f99254@redhat.com>
In-Reply-To: <20181016132327.121839-1-xiao.w.wang@intel.com>


On 2018/10/16 9:23 PM, Xiao Wang wrote:
> What's this
> ===========
> Following the patch (vhost: introduce mdev based hardware vhost backend)
> https://lwn.net/Articles/750770/, which defines a generic mdev device for
> vhost data path acceleration (aliased as vDPA mdev below), this patch set
> introduces a new net client type: vhost-vfio.


Thanks a lot for such an interesting series. Some general questions:


If we also consider using a software backend in the future (e.g. 
vhost-kernel, a relay of virtio-vhost-user, or other cases), maybe 
"vhost-mdev" is a better name, so that it is not tied to VFIO.


>
> Currently we have 2 types of vhost backends in QEMU: vhost-kernel (tap)
> and vhost-user (e.g. DPDK vhost). In order to have a kernel-space HW vhost
> acceleration framework, the vDPA mdev device works as a generic configuring
> channel.


Does "generic" configuring channel means dpdk will also go for this way? 
E.g it will have a vhost mdev pmd?


>   It exposes to user space a non-vendor-specific configuration
> interface for setting up a vhost HW accelerator,


Or even a software translation layer on top of existing hardware.


> based on this, this patch
> set introduces a third vhost backend called vhost-vfio.
>
> How does it work
> ================
> The vDPA mdev defines 2 BAR regions, BAR0 and BAR1. BAR0 is the main
> device interface; vhost messages can be written to or read from this
> region following the format below. All the regular vhost messages about vring
> addresses, negotiated features, etc., are written to this region directly.


If I understand this correctly, the mdev is not passed through to the 
guest directly. So what's the reason for inventing a PCI-like device 
here? I'm asking since:

- The vhost protocol is transport independent, so we should consider 
supporting transports other than PCI. I know we can even do it with the 
existing design, but it looks rather odd to implement e.g. a ccw device 
with a PCI-like mediated device.

- Can we try to reuse the vhost-kernel ioctls? Fewer APIs mean fewer 
bugs and more code reuse. E.g. virtio-user could benefit from the vhost 
kernel ioctl API with almost no changes, I believe.


>
> struct vhost_vfio_op {
> 	__u64 request;
> 	__u32 flags;
> 	/* Flag values: */
> #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> 	__u32 size;
> 	union {
> 		__u64 u64;
> 		struct vhost_vring_state state;
> 		struct vhost_vring_addr addr;
> 		struct vhost_memory memory;
> 	} payload;
> };
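

Just to make sure I understand the expected flow here: I assume QEMU 
fills such a structure and then writes it to BAR0 through the VFIO 
device fd, at the region offset reported by VFIO_DEVICE_GET_REGION_INFO. 
A minimal sketch of what I have in mind (the helper name, offset 
handling and error paths are my own placeholders, not part of this 
series):

#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Relay one vhost request (e.g. VHOST_SET_VRING_BASE) through BAR0.
 * bar0_offset is the region offset of BAR0 on the VFIO device fd. */
static int vhost_vfio_write_op(int device_fd, uint64_t bar0_offset,
                               uint64_t request, const void *payload,
                               uint32_t size, bool need_reply)
{
    struct vhost_vfio_op op = {
        .request = request,
        .flags = need_reply ? VHOST_VFIO_NEED_REPLY : 0,
        .size = size,
    };

    if (size > sizeof(op.payload))
        return -EINVAL;
    memcpy(&op.payload, payload, size);

    /* Region read/write on a VFIO device goes through pread()/pwrite()
     * at the region offset. */
    if (pwrite(device_fd, &op, sizeof(op), bar0_offset) != sizeof(op))
        return -errno;

    return 0;
}

Is that roughly the intended usage, or is the region expected to be 
accessed field by field?
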
>
> BAR1 is defined to be a region of doorbells, and QEMU can use this region as
> the host notifier for virtio. To optimize virtio notification, vhost-vfio tries
> to mmap the corresponding page on BAR1 for each queue and leverages EPT to let
> the guest virtio driver kick the vDPA device doorbell directly. For the virtio
> 0.95 case, in which we cannot set a host notifier memory region, QEMU will help
> to relay the notification to the vDPA device.
>
> Note: EPT mapping requires that each queue's notify address is located at the
> beginning of a separate page; the parameter "page-per-vq=on" could help.


I think qemu should prepare a fallback for this if page-per-vq is off.
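
Btw, for the fast path I assume the per-queue doorbell page is simply 
mmap()ed from BAR1 through the VFIO region offset and then registered 
as the host notifier, roughly like the sketch below (I'm guessing the 
mdev mimics the vfio-pci region indexes; the helper is only an 
illustration):

#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <linux/vfio.h>

/* Map the doorbell page of queue `qid` from BAR1, assuming the
 * "page-per-vq" layout where each queue's notify address starts at
 * the beginning of a separate page. */
static void *vhost_vfio_map_doorbell(int device_fd, unsigned int qid,
                                     long page_size)
{
    struct vfio_region_info bar1 = {
        .argsz = sizeof(bar1),
        .index = VFIO_PCI_BAR1_REGION_INDEX,
    };

    if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, &bar1) < 0)
        return MAP_FAILED;

    return mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
                device_fd, bar1.offset + (off_t)qid * page_size);
}

For the virtio 0.95 fallback, QEMU would then relay the kick by writing 
to this mapping itself instead of letting the guest touch it directly, 
if I read the cover letter correctly.
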


>
> For interrupt setup, the vDPA mdev device leverages the existing VFIO API to
> enable interrupt configuration in user space. In this way, KVM's irqfd for
> virtio can be set on the mdev device by QEMU using ioctl().
>
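

Reusing the standard VFIO path here is nice. For other readers: I 
expect the wiring looks like the usual vfio-pci irqfd setup, something 
along these lines (assuming the mdev exposes an MSI-X-like IRQ index; 
the helper is only illustrative):

#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Hand one eventfd (KVM irqfd) per queue vector to the device so the
 * parent driver can inject guest interrupts directly. */
static int vhost_vfio_set_irqfds(int device_fd, const int *eventfds, int nvecs)
{
    size_t argsz = sizeof(struct vfio_irq_set) + nvecs * sizeof(int32_t);
    struct vfio_irq_set *irq_set = calloc(1, argsz);
    int ret;

    if (!irq_set)
        return -ENOMEM;

    irq_set->argsz = argsz;
    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
    irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX; /* assumption, see above */
    irq_set->start = 0;
    irq_set->count = nvecs;
    memcpy(irq_set->data, eventfds, nvecs * sizeof(int32_t));

    ret = ioctl(device_fd, VFIO_DEVICE_SET_IRQS, irq_set);
    free(irq_set);
    return ret < 0 ? -errno : 0;
}
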
> The vhost-vfio net client sets up a vDPA mdev device specified by a
> "sysfsdev" parameter. During the net client init, the device is opened
> and parsed using the VFIO API, and the VFIO device fd and device BAR region
> offsets are kept in a VhostVFIO structure. This initialization provides
> a channel for configuring vhost information to the vDPA device driver.
>
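

For the init flow, I assume it's the usual VFIO group/device dance, 
i.e. something like the sketch below (container and IOMMU setup are 
omitted, the group path is looked up from the sysfsdev link, and all 
names are placeholders):

#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Open the VFIO group, get the device fd for the mdev UUID and query
 * the BAR0 region offset that later vhost messages are written to. */
static int vhost_vfio_open(const char *group_path, const char *mdev_uuid,
                           uint64_t *bar0_offset)
{
    int group_fd, device_fd;
    struct vfio_region_info bar0 = {
        .argsz = sizeof(bar0),
        .index = VFIO_PCI_BAR0_REGION_INDEX,
    };

    group_fd = open(group_path, O_RDWR);   /* e.g. "/dev/vfio/42" */
    if (group_fd < 0)
        return -errno;

    /* VFIO_GROUP_SET_CONTAINER / VFIO_SET_IOMMU omitted for brevity. */
    device_fd = ioctl(group_fd, VFIO_GROUP_GET_DEVICE_FD, mdev_uuid);
    if (device_fd < 0)
        return -errno;

    if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, &bar0) < 0)
        return -errno;

    *bar0_offset = bar0.offset;
    return device_fd;
}

If you end up with the hw/vfio/common.c helpers you mention below, most 
of this boilerplate should disappear anyway.
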
> To do later
> ===========
> 1. The net client initialization uses the raw VFIO API to open the vDPA mdev
> device; it would be better to provide a set of helpers in hw/vfio/common.c
> to help vhost-vfio initialize the device easily.
>
> 2. For device DMA mapping, QEMU passes memory region info to the mdev device
> and lets the kernel parent device driver program the IOMMU. This is a temporary
> implementation; in the future, when the IOMMU driver supports the mdev bus, we
> can use the VFIO API to program the IOMMU directly for the parent device.
> Refer to the patch (vfio/mdev: IOMMU aware mediated device):
> https://lkml.org/lkml/2018/10/12/225
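

For that future case I'd expect the mapping to go through the usual 
type1 path on the container fd, e.g. (again only a sketch, assuming a 
plain VFIO type1 container):

#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Map one guest memory region through the container so the parent
 * device can DMA to/from it using guest physical addresses as IOVAs. */
static int vhost_vfio_dma_map(int container_fd, void *vaddr,
                              uint64_t iova, uint64_t size)
{
    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .vaddr = (uintptr_t)vaddr,
        .iova = iova,
        .size = size,
    };

    return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map) < 0 ? -errno : 0;
}
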


As Steve mentioned at the KVM Forum, it's better to have at least one 
sample driver, e.g. virtio-net itself.

Then it would be more convenient for reviewers to evaluate the whole 
stack.

Thanks


>
> Vhost-vfio usage
> ================
> # Query the number of available mdev instances
> $ cat /sys/class/mdev_bus/0000:84:00.3/mdev_supported_types/ifcvf_vdpa-vdpa_virtio/available_instances
>
> # Create a mdev instance
> $ echo $UUID > /sys/class/mdev_bus/0000:84:00.3/mdev_supported_types/ifcvf_vdpa-vdpa_virtio/create
>
> # Launch QEMU with a virtio-net device
>      qemu-system-x86_64 -cpu host -enable-kvm \
>      <snip>
>      -mem-prealloc \
>      -netdev type=vhost-vfio,sysfsdev=/sys/bus/mdev/devices/$UUID,id=mynet\
>      -device virtio-net-pci,netdev=mynet,page-per-vq=on \
>
> -------- END --------
>
> Xiao Wang (2):
>    vhost-vfio: introduce vhost-vfio net client
>    vhost-vfio: implement vhost-vfio backend
>
>   hw/net/vhost_net.c                |  56 ++++-
>   hw/vfio/common.c                  |   3 +-
>   hw/virtio/Makefile.objs           |   2 +-
>   hw/virtio/vhost-backend.c         |   3 +
>   hw/virtio/vhost-vfio.c            | 501 ++++++++++++++++++++++++++++++++++++++
>   hw/virtio/vhost.c                 |  15 ++
>   include/hw/virtio/vhost-backend.h |   7 +-
>   include/hw/virtio/vhost-vfio.h    |  35 +++
>   include/hw/virtio/vhost.h         |   2 +
>   include/net/vhost-vfio.h          |  17 ++
>   linux-headers/linux/vhost.h       |   9 +
>   net/Makefile.objs                 |   1 +
>   net/clients.h                     |   3 +
>   net/net.c                         |   1 +
>   net/vhost-vfio.c                  | 327 +++++++++++++++++++++++++
>   qapi/net.json                     |  22 +-
>   16 files changed, 996 insertions(+), 8 deletions(-)
>   create mode 100644 hw/virtio/vhost-vfio.c
>   create mode 100644 include/hw/virtio/vhost-vfio.h
>   create mode 100644 include/net/vhost-vfio.h
>   create mode 100644 net/vhost-vfio.c
>

