qemu-devel.nongnu.org archive mirror
From: Jason Wang <jasowang@redhat.com>
To: "Liang, Cunming" <cunming.liang@intel.com>,
	"Bie, Tiwei" <tiwei.bie@intel.com>
Cc: "Tan, Jianfeng" <jianfeng.tan@intel.com>,
	"virtio-dev@lists.oasis-open.org"
	<virtio-dev@lists.oasis-open.org>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"mst@redhat.com" <mst@redhat.com>,
	"Daly, Dan" <dan.daly@intel.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"Wang, Xiao W" <xiao.w.wang@intel.com>,
	"stefanha@redhat.com" <stefanha@redhat.com>,
	"Wang, Zhihong" <zhihong.wang@intel.com>,
	Maxime Coquelin <maxime.coquelin@redhat.com>
Subject: Re: [Qemu-devel] [virtio-dev] Re: [virtio-dev] [RFC 0/3] Extend vhost-user to support VFIO based accelerators
Date: Mon, 8 Jan 2018 18:06:10 +0800	[thread overview]
Message-ID: <06f74212-4148-0630-66ea-1e2f933c63fb@redhat.com> (raw)
In-Reply-To: <D0158A423229094DA7ABF71CF2FA0DA34E8479EC@SHSMSX104.ccr.corp.intel.com>


[...]

>> chip EMC for early classification. It gives a fast path for
>> throughput-sensitive (SLA) VNFs to bypass further table lookups. It
>> co-exists with other VNFs whose SLA level is best effort but which need
>> more functions (e.g. stateful conntrack, security checks, even
>> higher-layer WAF support); the DPDK-based datapath still boosts
>> throughput there. It's usually not a choice between a dedicated or a
>> shared datapath; they co-exist.
>>
>> So if I understand this correctly, the "vswitch" here is a hardware
>> function (something like a smart NIC or offloaded OVS). So the question
>> remains: is vhost-user a must in this case?
> "vswitch" point to SW vswitch(e.g. OVS-DPDK). Accelerators stands for different offloading IPs on the device(e.g. smart NIC) which can be used from a userland driver.
> EMC IP used to offload OVS fastpath, so as move traffic to VM directly. Either SRIOV device assignment or vDPA helps to build datapath pass-thru context which represented by a virtual interface on management perspective. For entire "vswitch", there still co-exist none pass-thru interface(SW backend) which uses vhost-user for virtual interface.
> Both of them shall be able to replace each other.

Thanks, I kind of get the picture here.

A question about the software backend: what's the software counterpart
for SR-IOV or vDPA? E.g. is there a VF or vDPA PMD connected to OVS-DPDK
that can switch to offload if required?

>
> There's currently no other user space choice for networking except vhost-user. The vhost-user extension patch has a low impact on qemu.
> If you read this patch, it's really about reducing the doorbell and interrupt overhead.

For this patch, you need to decouple the PCI-specific stuff from
vhost-user, which is transport independent (at least for now).
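
For reference, descriptor passing in vhost-user (kick/call eventfds
today, the VFIO fds this series adds) only works on the unix socket
transport, via SCM_RIGHTS ancillary data. A minimal sketch of that
mechanism (the helper name is hypothetical, not code from this series):

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: send one vhost-user message together with one
 * file descriptor over the unix socket, using SCM_RIGHTS. */
static int send_msg_with_fd(int sock, const void *msg, size_t len, int fd)
{
    struct iovec iov = { .iov_base = (void *)msg, .iov_len = len };
    char control[CMSG_SPACE(sizeof(int))];
    struct msghdr mh = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = control, .msg_controllen = sizeof(control),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&mh);

    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;            /* carry the descriptor */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &mh, 0) < 0 ? -1 : 0;
}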

> Basic vDPA works even without any qemu change. As vhost-user is well recognized as the vhost interface for userland backends, it's reasonable to properly support the use of a userland backend with an I/O accelerator.

Right, so you could do all offloads in qemu while vhost-user is still
there. And qemu could switch between the two like a transparent bond or
team?
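
Purely illustrative pseudo-C of what I mean by a transparent bond/team
inside qemu; every name below is hypothetical:

/* Hypothetical selector: prefer the offloaded (vDPA/VFIO) path when the
 * accelerator is usable, otherwise fall back to the plain SW vhost path,
 * without the guest ever seeing a device change, much like an
 * active/backup bond switching its active slave. */
enum datapath { DATAPATH_SW, DATAPATH_OFFLOAD };

struct net_backend {
    int accel_usable;          /* accelerator present, not migrating, ... */
    enum datapath active;
};

static void select_datapath(struct net_backend *be)
{
    enum datapath want = be->accel_usable ? DATAPATH_OFFLOAD : DATAPATH_SW;

    if (want != be->active) {
        /* Quiesce the rings, sync vring state, then flip the active
         * path; both paths consume the same virtio state. */
        be->active = want;
    }
}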

>
> Before moving forward, it's necessary to get alignment on two basic things.
> - Do you agree that providing a userland backend via vhost-user is the right way to go for a vswitch workload?
>     Otherwise, we should probably go back and revisit vhost-user itself rather than discuss anything new happening on top of it.

I agree.

> - Do you agree that vhost-user is the right way for qemu to support multi-process?
>     Please refer to https://www.linux-kvm.org/images/f/fc/KVM_FORUM_multi-process.pdf

This is questionable from both the performance and the security point of
view. We already have a performance example (vIOMMU). For security, e.g.
in this patch, qemu sets up memory regions based on requests from the
vhost-user slave; doesn't this increase the attack surface?

I think you somehow missed my point. As I replied in the previous thread,
I don't object to what you propose here. I just want to understand why
you chose vhost-user. And the cover letter doesn't mention the vswitch
case at all; instead it compares vDPA with VFIO. This easily leads the
reader to think that qemu will monopolize the device, so it's rather
natural to ask why not do it inside qemu.

>
>>>>>     From a workload point of view, it's not exciting to be part of the qemu process.
>>>> Don't see why, qemu has a dataplane for virtio-blk/scsi.
>>> Qemu has vhost-user for scsi too. I'm not saying which one is bad, just
>>> pointing out that sometimes it's very workload driven. Networking is
>>> different from blk/scsi/crypto.
>>
>> What's the main difference, from your point of view, that makes
>> vhost-user a must in this case?
> Network devices, i.e. NICs or smart NICs, usually have vendor-specific drivers. DPDK drives the devices with its user space drivers to run OVS. The virtual interfaces are all vhost-user based when talking with qemu. For some virtual interfaces, it now tries to bypass the traffic. We're looking for a consistent vhost-user interface there.

So the point is that you could probably keep vhost-user for the SW path
while implementing the offloaded path completely in qemu?

>   Linking OVS-DPDK with qemu is, TBH, far away from today's usage.
>
>>>>> That comes up with the idea of a vhost-user extension. The userland
>>>>> workload decides whether to enable accelerators or not; qemu provides
>>>>> the common control plane infrastructure.
>>>>
>>>> It brings extra complexity: endless new types of messages and a huge
>>>> bunch of bugs. And what's more important, the split model tends to be
>>>> less efficient in some cases, e.g. guest IOMMU integration. I'm pretty
>>>> sure we will meet more in the future.
>>> vIOMMU-relevant messages are already supported by the vhost protocol.
>>> That's an independent effort there.
>>
>> The point is that vIOMMU integration is very inefficient in vhost-user
>> in some cases. If you have lots of dynamic mappings, you can get only
>> 5%-10% of the performance compared to vIOMMU disabled. A huge number of
>> translation requests is generated in this case. The main issue here is
>> that you cannot offload the datapath completely to vhost-user backends;
>> IOMMU translations are still done in qemu. This is one of the defects of
>> vhost-user when the datapath needs to access device state.
> Dynamic mapping is vIOMMU's challenge; besides vhost-user, kernel vhost faces the same situation. Static mapping with DPDK looks much better. It's not fair to blame vhost-user for the vIOMMU overhead.

Yes, that's why I want a vhost dataplane inside qemu. (Btw, vhost-user
should be even worse, considering a syscall is less expensive than IPC.)
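
To make the dynamic-mapping cost concrete: with a vIOMMU, every
translation the backend doesn't have cached becomes a miss/update round
trip with qemu before the descriptor can be touched at all. A rough
sketch of the backend side, reusing the IOTLB message layout from
linux/vhost.h; the cache and channel helpers are hypothetical:

#include <stdint.h>
#include <linux/vhost.h>    /* struct vhost_iotlb_msg, VHOST_IOTLB_* */

/* Hypothetical stand-ins for the backend's IOTLB cache and its
 * vhost-user message channels. */
void *iotlb_cache_lookup(uint64_t iova, uint64_t size, uint8_t perm);
void iotlb_cache_insert(const struct vhost_iotlb_msg *update);
void send_iotlb_miss(uint64_t iova, uint8_t perm);      /* VHOST_IOTLB_MISS   */
void wait_iotlb_update(struct vhost_iotlb_msg *update); /* VHOST_IOTLB_UPDATE */

/* Translate a guest IOVA before the datapath may touch the buffer.
 * Every cache miss is a full message round trip to qemu, which is why
 * lots of dynamic (re)mappings hurt so much compared to static maps. */
static void *translate_iova(uint64_t iova, uint64_t size, uint8_t perm)
{
    void *va;

    while ((va = iotlb_cache_lookup(iova, size, perm)) == NULL) {
        struct vhost_iotlb_msg update;

        send_iotlb_miss(iova, perm);
        wait_iotlb_update(&update);
        iotlb_cache_insert(&update);
    }
    return va;
}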

>
>>> I don't see this patch introducing endless new types.
>> Not this patch, but we can imagine the vhost-user protocol becoming
>> complex in the future.
>>
>>> My take is that your fundamental concern is about continually adding
>>> new features to vhost-user.
>>> Feel free to correct me if I misunderstood your point.
>> Unfortunately not; "endless" itself is not a problem, but we'd better
>> extend it only when it is really needed. The main questions are:
>>
>> 1) Do we need to split things the way you suggested here?
>> 2) If so, is vhost-user the best method?
> Sounds good. BTW, this patch (vhost-user extension) is a performance improvement for the DPDK vDPA usage (refer to the DPDK patches). Stay tuned for another RFC patch for the kernel space usage, which will propose a qemu-native vhost adaptor for in-kernel mediated device drivers.

Any pointer to this patch?

[...]

>> Why not? We already have a userspace NVMe driver.
> There's a huge amount of vendor-specific drivers for networking. NVMe is much more generalized than NICs.
> The idea of linking an external dataplane sounds interesting, but it's not used in the real world. Looking forward to the progress.
>
>>>    That ends up with another vhost-vfio in my slides.
>> I don't get why we can't implement it purely through a userspace driver
>> inside qemu.
> TBH, we thought about this before. There are a few reasons stopping us.
> - qemu doesn't have an abstraction layer for network devices (HW NICs) for userspace drivers

Well, you can still use vhost (but not vhost-user).

> - qemu launch process: linking DPDK with qemu is not a problem; the gap is in OVS integration, and the effort/impact is not small

We can keep the vhost-user datapath.

> - a qemu-native virtio SW backend lacks efficient ways to talk with an external process; the change effort/impact is not small.

By keeping the vhost-user datapath there are no such worries. Btw, we
will probably need a direct channel between qemu and OVS which can
negotiate more offloads.

> - a qemu-native userspace driver would only be used by qemu; a userspace driver in DPDK can be used by others
>


Thanks


Thread overview: 19+ messages
2017-12-22  6:41 [Qemu-devel] [RFC 0/3] Extend vhost-user to support VFIO based accelerators Tiwei Bie
2017-12-22  6:41 ` [Qemu-devel] [RFC 1/3] vhost-user: support receiving file descriptors in slave_read Tiwei Bie
2017-12-22  6:41 ` [Qemu-devel] [RFC 2/3] vhost-user: introduce shared vhost-user state Tiwei Bie
2017-12-22  6:41 ` [Qemu-devel] [RFC 3/3] vhost-user: add VFIO based accelerators support Tiwei Bie
2018-01-16 17:23   ` Alex Williamson
2018-01-17  5:00     ` Tiwei Bie
2018-01-02  2:42 ` [Qemu-devel] [RFC 0/3] Extend vhost-user to support VFIO based accelerators Alexey Kardashevskiy
2018-01-02  5:49   ` Liang, Cunming
2018-01-02  6:01     ` Alexey Kardashevskiy
2018-01-02  6:48       ` Liang, Cunming
2018-01-03 14:34 ` [Qemu-devel] [virtio-dev] " Jason Wang
2018-01-04  6:18   ` Tiwei Bie
2018-01-04  7:21     ` Jason Wang
2018-01-05  6:58       ` Liang, Cunming
2018-01-05  8:38         ` Jason Wang
2018-01-05 10:25           ` Liang, Cunming
2018-01-08  3:23             ` Jason Wang
2018-01-08  8:23               ` [Qemu-devel] [virtio-dev] " Liang, Cunming
2018-01-08 10:06                 ` Jason Wang [this message]
