From: Jason Wang <jasowang@redhat.com>
To: Xiao Wang <xiao.w.wang@intel.com>,
mst@redhat.com, alex.williamson@redhat.com
Cc: qemu-devel@nongnu.org, tiwei.bie@intel.com,
cunming.liang@intel.com, xiaolong.ye@intel.com,
zhihong.wang@intel.com, dan.daly@intel.com
Subject: Re: [Qemu-devel] [RFC 0/2] vhost-vfio: introduce mdev based HW vhost backend
Date: Tue, 6 Nov 2018 12:17:48 +0800
Message-ID: <2e010e02-f7a2-d009-ac7a-fdc266f99254@redhat.com>
In-Reply-To: <20181016132327.121839-1-xiao.w.wang@intel.com>
On 2018/10/16 9:23 PM, Xiao Wang wrote:
> What's this
> ===========
> Following the patch (vhost: introduce mdev based hardware vhost backend)
> https://lwn.net/Articles/750770/, which defines a generic mdev device for
> vhost data path acceleration (aliased as vDPA mdev below), this patch set
> introduces a new net client type: vhost-vfio.
Thanks a lot for such an interesting series. Some generic questions:
If we consider using a software backend in the future as well (e.g.
vhost-kernel, a relay of virtio-vhost-user, or other cases), maybe
vhost-mdev is a better name, which means it is not tied to VFIO anyway.
>
> Currently we have 2 types of vhost backends in QEMU: vhost-kernel (tap)
> and vhost-user (e.g. DPDK vhost). In order to have a kernel-space HW vhost
> acceleration framework, the vDPA mdev device works as a generic
> configuration channel.
Does "generic" configuring channel means dpdk will also go for this way?
E.g it will have a vhost mdev pmd?
> It exposes to user space a non-vendor-specific configuration
> interface for setting up a vhost HW accelerator,
Or even a software translation layer on top of existing hardware.
> based on this, this patch
> set introduces a third vhost backend called vhost-vfio.
>
> How does it work
> ================
> The vDPA mdev defines 2 BAR regions, BAR0 and BAR1. BAR0 is the main
> device interface; vhost messages can be written to or read from this
> region following the format below. All the regular vhost messages about
> vring addr, negotiated features, etc., are written to this region directly.
If I understand this correctly, the mdev is not passed through to the
guest directly. So what's the reason for inventing a PCI-like device
here? I'm asking since:
- The vhost protocol is transport independent, so we should consider
supporting transports other than PCI. I know we could even do that with
the existing design, but it looks rather odd to back e.g. a ccw device
with a PCI-like mediated device.
- Can we try to reuse the vhost-kernel ioctls (see the sketch below)?
Fewer APIs mean fewer bugs and more code reuse. E.g. virtio-user could
benefit from the vhost-kernel ioctl API almost without changes, I believe.
>
> struct vhost_vfio_op {
>     __u64 request;
>     __u32 flags;
>     /* Flag values: */
> #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether a reply is needed */
>     __u32 size;
>     union {
>         __u64 u64;
>         struct vhost_vring_state state;
>         struct vhost_vring_addr addr;
>         struct vhost_memory memory;
>     } payload;
> };
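So if I read the protocol right, sending e.g. a vring message boils down
to writing this struct into BAR0 through the VFIO device fd, roughly as
below (sketch only; bar0_offset is assumed to come from
VFIO_DEVICE_GET_REGION_INFO, and the helper name is made up):

    #include <stddef.h>
    #include <unistd.h>

    /* Sketch: write one vhost message into the BAR0 region of the mdev.
     * device_fd is the VFIO device fd; bar0_offset is the BAR0 region
     * offset reported by VFIO_DEVICE_GET_REGION_INFO. */
    static int vhost_vfio_send(int device_fd, off_t bar0_offset,
                               struct vhost_vfio_op *op)
    {
        size_t len = offsetof(struct vhost_vfio_op, payload) + op->size;

        if (pwrite(device_fd, op, len, bar0_offset) != (ssize_t)len)
            return -1;
        return 0;
    }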
>
> BAR1 is defined to be a region of doorbells; QEMU can use this region as
> the host notifier for virtio. To optimize virtio notify, vhost-vfio tries
> to mmap the corresponding page on BAR1 for each queue and leverages EPT to
> let the guest virtio driver kick the vDPA device doorbell directly. For
> the virtio 0.95 case, in which we cannot set a host notifier memory
> region, QEMU will help relay the notify to the vDPA device.
>
> Note: EPT mapping requires each queue's notify address to be located at
> the beginning of a separate page; the "page-per-vq=on" parameter can help.
I think QEMU should prepare a fallback for this when page-per-vq is off.
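For my own understanding, the fast path would then be roughly the sketch
below: map the per-queue doorbell page out of BAR1 and hand it to the
guest as the host notifier (assuming, per the note above, that each
doorbell sits on its own page; bar1_offset comes from
VFIO_DEVICE_GET_REGION_INFO for BAR1):

    #include <sys/mman.h>

    /* Sketch: map the doorbell page of one queue from BAR1 so the guest
     * can kick the device directly instead of exiting to QEMU. */
    static void *map_doorbell(int device_fd, off_t bar1_offset,
                              int queue_idx, size_t page_size)
    {
        off_t off = bar1_offset + (off_t)queue_idx * page_size;
        void *p = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, device_fd, off);

        return p == MAP_FAILED ? NULL : p;
    }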
>
> For interrupt setup, the vDPA mdev device leverages the existing VFIO API
> to enable interrupt configuration in user space. In this way, KVM's irqfd
> for virtio can be set on the mdev device by QEMU using ioctl().
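IIUC this is just the standard VFIO interrupt path, i.e. something like
the sketch below (the MSI-X index is an assumption on my side, since for
an mdev the index layout is defined by the vendor driver):

    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    /* Sketch: bind an eventfd (e.g. KVM's irqfd) to one interrupt vector
     * of the device through VFIO_DEVICE_SET_IRQS. */
    static int set_queue_irqfd(int device_fd, int vector, int eventfd)
    {
        char buf[sizeof(struct vfio_irq_set) + sizeof(int)];
        struct vfio_irq_set *irq_set = (struct vfio_irq_set *)buf;

        memset(buf, 0, sizeof(buf));
        irq_set->argsz = sizeof(buf);
        irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
                         VFIO_IRQ_SET_ACTION_TRIGGER;
        irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
        irq_set->start = vector;
        irq_set->count = 1;
        memcpy(&irq_set->data, &eventfd, sizeof(int));

        return ioctl(device_fd, VFIO_DEVICE_SET_IRQS, irq_set);
    }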
>
> The vhost-vfio net client will set up a vDPA mdev device specified by a
> "sysfsdev" parameter. During net client init, the device is opened and
> parsed using the VFIO API, and the VFIO device fd and device BAR region
> offsets are kept in a VhostVFIO structure. This initialization provides
> a channel to configure vhost information to the vDPA device driver.
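I assume the open path in the net client is basically the canonical VFIO
sequence below (sketch only; resolving the group from the mdev's
iommu_group symlink is left out, error handling omitted, and the IOMMU
type is an assumption):

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    /* Sketch: open the vDPA mdev through the standard VFIO interfaces.
     * group_path is e.g. "/dev/vfio/<group>"; uuid is the mdev UUID used
     * at creation time. The returned device fd is what BAR accesses and
     * interrupt setup go through. */
    static int open_mdev(const char *group_path, const char *uuid)
    {
        int container = open("/dev/vfio/vfio", O_RDWR);
        int group = open(group_path, O_RDWR);

        ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
        ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1v2_IOMMU);

        return ioctl(group, VFIO_GROUP_GET_DEVICE_FD, uuid);
    }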
>
> To do later
> ===========
> 1. The net client initialization uses the raw VFIO API to open the vDPA
> mdev device; it would be better to provide a set of helpers in
> hw/vfio/common.c to help vhost-vfio initialize the device easily.
>
> 2. For device DMA mapping, QEMU passes memory region info to the mdev
> device and lets the kernel parent device driver program the IOMMU. This
> is a temporary implementation; in the future, when the IOMMU driver
> supports the mdev bus, we can use the VFIO API to program the IOMMU
> directly for the parent device.
> Refer to the patch (vfio/mdev: IOMMU aware mediated device):
> https://lkml.org/lkml/2018/10/12/225
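And the future direction you mention would presumably end up as the usual
type1 call, roughly as below (sketch, assuming a type1 container for the
parent device):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    /* Sketch: map one guest memory region for device DMA through the
     * VFIO container, instead of relaying the layout to the parent
     * driver. */
    static int dma_map(int container, void *vaddr, uint64_t iova,
                       uint64_t size)
    {
        struct vfio_iommu_type1_dma_map map = {
            .argsz = sizeof(map),
            .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
            .vaddr = (uintptr_t)vaddr,
            .iova  = iova,
            .size  = size,
        };

        return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
    }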
Also, as Steve mentioned at the KVM Forum, it's better to have at least
one sample driver, e.g. virtio-net itself. Then it would be more
convenient for reviewers to evaluate the whole stack.
Thanks
>
> Vhost-vfio usage
> ================
> # Query the number of available mdev instances
> $ cat /sys/class/mdev_bus/0000:84:00.3/mdev_supported_types/ifcvf_vdpa-vdpa_virtio/available_instances
>
> # Create a mdev instance
> $ echo $UUID > /sys/class/mdev_bus/0000:84:00.3/mdev_supported_types/ifcvf_vdpa-vdpa_virtio/create
>
> # Launch QEMU with a virtio-net device
> qemu-system-x86_64 -cpu host -enable-kvm \
> <snip>
> -mem-prealloc \
> -netdev type=vhost-vfio,sysfsdev=/sys/bus/mdev/devices/$UUID,id=mynet \
> -device virtio-net-pci,netdev=mynet,page-per-vq=on \
>
> -------- END --------
>
> Xiao Wang (2):
> vhost-vfio: introduce vhost-vfio net client
> vhost-vfio: implement vhost-vfio backend
>
> hw/net/vhost_net.c | 56 ++++-
> hw/vfio/common.c | 3 +-
> hw/virtio/Makefile.objs | 2 +-
> hw/virtio/vhost-backend.c | 3 +
> hw/virtio/vhost-vfio.c | 501 ++++++++++++++++++++++++++++++++++++++
> hw/virtio/vhost.c | 15 ++
> include/hw/virtio/vhost-backend.h | 7 +-
> include/hw/virtio/vhost-vfio.h | 35 +++
> include/hw/virtio/vhost.h | 2 +
> include/net/vhost-vfio.h | 17 ++
> linux-headers/linux/vhost.h | 9 +
> net/Makefile.objs | 1 +
> net/clients.h | 3 +
> net/net.c | 1 +
> net/vhost-vfio.c | 327 +++++++++++++++++++++++++
> qapi/net.json | 22 +-
> 16 files changed, 996 insertions(+), 8 deletions(-)
> create mode 100644 hw/virtio/vhost-vfio.c
> create mode 100644 include/hw/virtio/vhost-vfio.h
> create mode 100644 include/net/vhost-vfio.h
> create mode 100644 net/vhost-vfio.c
>