From: "Michael S. Tsirkin" <mst@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: Tiwei Bie <tiwei.bie@intel.com>,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
virtualization@lists.linux-foundation.org,
netdev@vger.kernel.org, shahafs@mellanox.com, jgg@mellanox.com,
rob.miller@broadcom.com, haotian.wang@sifive.com,
eperezma@redhat.com, lulu@redhat.com, parav@mellanox.com,
rdunlap@infradead.org, hch@infradead.org, jiri@mellanox.com,
hanand@xilinx.com, mhabets@solarflare.com,
maxime.coquelin@redhat.com, lingshan.zhu@intel.com,
dan.daly@intel.com, cunming.liang@intel.com,
zhihong.wang@intel.com
Subject: Re: [PATCH] vhost: introduce vDPA based backend
Date: Wed, 5 Feb 2020 02:16:57 -0500
Message-ID: <20200205020547-mutt-send-email-mst@kernel.org>
In-Reply-To: <2dd43fb5-6f02-2dcc-5c27-9f7419ef72fc@redhat.com>
On Wed, Feb 05, 2020 at 02:49:31PM +0800, Jason Wang wrote:
>
> On 2020/2/5 2:30 PM, Michael S. Tsirkin wrote:
> > On Wed, Feb 05, 2020 at 01:50:28PM +0800, Jason Wang wrote:
> > > On 2020/2/5 1:31 PM, Michael S. Tsirkin wrote:
> > > > On Wed, Feb 05, 2020 at 11:12:21AM +0800, Jason Wang wrote:
> > > > > On 2020/2/5 10:05 AM, Tiwei Bie wrote:
> > > > > > On Tue, Feb 04, 2020 at 02:46:16PM +0800, Jason Wang wrote:
> > > > > > > On 2020/2/4 2:01 PM, Michael S. Tsirkin wrote:
> > > > > > > > On Tue, Feb 04, 2020 at 11:30:11AM +0800, Jason Wang wrote:
> > > > > > > > > 5) generate diffs of the memory table and use the IOMMU API to set up the DMA
> > > > > > > > > mapping in this method
> > > > > > > > Frankly I think that's a bunch of work. Why not a MAP/UNMAP interface?
> > > > > > > >
> > > > > > > Sure, so that basically VHOST_IOTLB_UPDATE/INVALIDATE I think?
> > > > > > Do you mean we let userspace use only VHOST_IOTLB_UPDATE/INVALIDATE
> > > > > > to do the DMA mapping in the vhost-vdpa case? When vIOMMU isn't available,
> > > > > > userspace will set msg->iova to GPA, otherwise userspace will set
> > > > > > msg->iova to GIOVA, and vhost-vdpa module will get HPA from msg->uaddr?
> > > > > >
> > > > > > Thanks,
> > > > > > Tiwei
> > > > > I think so. Michael, do you think this makes sense?
> > > > >
> > > > > Thanks
> > > > to make sure, could you post the suggested argument format for
> > > > these ioctls?
> > > >
> > > It's the existing uapi:
> > >
> > > /* no alignment requirement */
> > > struct vhost_iotlb_msg {
> > >         __u64 iova;
> > >         __u64 size;
> > >         __u64 uaddr;
> > > #define VHOST_ACCESS_RO 0x1
> > > #define VHOST_ACCESS_WO 0x2
> > > #define VHOST_ACCESS_RW 0x3
> > >         __u8 perm;
> > > #define VHOST_IOTLB_MISS 1
> > > #define VHOST_IOTLB_UPDATE 2
> > > #define VHOST_IOTLB_INVALIDATE 3
> > > #define VHOST_IOTLB_ACCESS_FAIL 4
> > >         __u8 type;
> > > };
> > >
> > > #define VHOST_IOTLB_MSG 0x1
> > > #define VHOST_IOTLB_MSG_V2 0x2
> > >
> > > struct vhost_msg {
> > >         int type;
> > >         union {
> > >                 struct vhost_iotlb_msg iotlb;
> > >                 __u8 padding[64];
> > >         };
> > > };
> > >
> > > struct vhost_msg_v2 {
> > >         __u32 type;
> > >         __u32 reserved;
> > >         union {
> > >                 struct vhost_iotlb_msg iotlb;
> > >                 __u8 padding[64];
> > >         };
> > > };
> > Oh ok. So with a real device, I suspect we do not want to wait for each
> > change to be fully processed by the device, so we might want an asynchronous variant
> > and then some kind of flush that tells the device "you better apply these now".
>
>
> Let me explain:
>
> There are two types of devices:
>
> 1) device without on-chip IOMMU: DMA is done via the IOMMU API, which only
> supports incremental map/unmap
Most IOMMUs have queues nowadays, though. Whether the APIs within the kernel
expose that matters, but we are better off emulating
hardware rather than specific guest behaviour.
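
To make the asynchronous-plus-flush idea concrete, here is a rough sketch.
It only models the behaviour in plain C: every name in it (pending_req,
queue_req, flush_pending and so on) is made up for illustration, and none of
it is existing or proposed vhost UAPI. Requests are queued without touching
the device-visible table, and only the flush applies them, in order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not existing vhost UAPI. */
enum pending_op { PENDING_MAP, PENDING_UNMAP };

struct pending_req {
    enum pending_op op;
    uint64_t iova;
    uint64_t size;
    uint64_t uaddr; /* ignored for PENDING_UNMAP */
};

#define MAX_PENDING  16
#define MAX_MAPPINGS 16

struct mapping {
    uint64_t iova, size, uaddr;
    int used;
};

struct dev_state {
    struct pending_req queue[MAX_PENDING]; /* accepted, not yet applied */
    size_t n_pending;
    struct mapping table[MAX_MAPPINGS];    /* what the device actually sees */
};

/* Queue a request without applying it. Returns 0, or -1 if the queue is full. */
static int queue_req(struct dev_state *d, enum pending_op op,
                     uint64_t iova, uint64_t size, uint64_t uaddr)
{
    assert(size != 0);
    if (d->n_pending == MAX_PENDING)
        return -1;
    d->queue[d->n_pending++] = (struct pending_req){ op, iova, size, uaddr };
    return 0;
}

/* Apply a single request to the device-visible table. */
static int apply_one(struct dev_state *d, const struct pending_req *r)
{
    for (size_t i = 0; i < MAX_MAPPINGS; i++) {
        struct mapping *m = &d->table[i];

        if (r->op == PENDING_MAP && !m->used) {
            *m = (struct mapping){ r->iova, r->size, r->uaddr, 1 };
            return 0;
        }
        if (r->op == PENDING_UNMAP && m->used &&
            m->iova == r->iova && m->size == r->size) {
            m->used = 0;
            return 0;
        }
    }
    return -1; /* table full, or nothing matching to unmap */
}

/* "You better apply these now": drain the queue in submission order.
 * Returns the number of requests applied, or -1 on the first failure. */
static int flush_pending(struct dev_state *d)
{
    int applied = 0;

    for (size_t i = 0; i < d->n_pending; i++) {
        if (apply_one(d, &d->queue[i]) != 0) {
            d->n_pending = 0;
            return -1;
        }
        applied++;
    }
    d->n_pending = 0;
    return applied;
}

/* Number of mappings currently visible to the device. */
static size_t count_mappings(const struct dev_state *d)
{
    size_t n = 0;

    for (size_t i = 0; i < MAX_MAPPINGS; i++)
        n += d->table[i].used ? 1 : 0;
    return n;
}
```

With this shape userspace can queue a map, another map and an unmap, and the
device-visible table only changes once flush_pending() runs. Whether the
flush ends up as an ioctl or as a message type is a separate question.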
> 2) device with on-chip IOMMU: DMA can be done by the device driver itself, and
> we could choose to pass the whole mapping to the driver at one time through
> the vDPA bus operation (set_map)
>
> For vhost-vdpa, there are two types of memory mapping:
>
> a) memory table, setup by userspace through VHOST_SET_MEM_TABLE, the whole
> mapping is updated in this way
> b) IOTLB API, incrementally done by userspace through vhost message
> (IOTLB_UPDATE/IOTLB_INVALIDATE)
>
> The current design is:
>
> - Reuse VHOST_SET_MEM_TABLE, and for type 1), we can choose to send diffs
> through IOMMU API or flush all the mappings then map new ones. For type 2),
> just send the whole mapping through set_map()
I know that at least for RDMA-based things, you can't change
a mapping if it's active. So drivers will need to figure out the
differences, which just looks ugly: userspace knows what
it is changing (really just adding/removing some guest memory).
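
As a sketch of what I mean: when guest memory is added or removed, userspace
can describe exactly that one change with the message format quoted above,
with no diffing anywhere. The struct below is a local copy of the quoted
vhost_iotlb_msg layout so the example builds without kernel headers;
guest_region, msg_for_add and msg_for_remove are hypothetical helpers, and
using GPA as iova assumes there is no vIOMMU.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the quoted struct vhost_iotlb_msg, for illustration only.
 * Real code would use the definitions from the kernel's vhost uapi headers. */
struct iotlb_msg {
    uint64_t iova;
    uint64_t size;
    uint64_t uaddr;
    uint8_t perm;
    uint8_t type;
};

/* Same values as the quoted VHOST_ACCESS_* and VHOST_IOTLB_* defines. */
enum { ACCESS_RO = 0x1, ACCESS_WO = 0x2, ACCESS_RW = 0x3 };
enum { IOTLB_UPDATE = 2, IOTLB_INVALIDATE = 3 };

/* A guest memory region as userspace might track it (hypothetical type). */
struct guest_region {
    uint64_t gpa;  /* guest physical address */
    uint64_t size; /* length in bytes */
    uint64_t hva;  /* userspace virtual address backing the region */
};

/* Memory was added: one UPDATE message describes the new mapping. */
static struct iotlb_msg msg_for_add(const struct guest_region *r)
{
    struct iotlb_msg m;

    assert(r->size != 0);
    memset(&m, 0, sizeof(m));
    m.iova = r->gpa; /* assumption: no vIOMMU, so iova is the GPA */
    m.size = r->size;
    m.uaddr = r->hva;
    m.perm = ACCESS_RW;
    m.type = IOTLB_UPDATE;
    return m;
}

/* Memory was removed: one INVALIDATE message covers the same range. */
static struct iotlb_msg msg_for_remove(const struct guest_region *r)
{
    struct iotlb_msg m;

    assert(r->size != 0);
    memset(&m, 0, sizeof(m));
    m.iova = r->gpa;
    m.size = r->size;
    m.type = IOTLB_INVALIDATE;
    return m;
}
```

The point is only that each change maps to one message that names the
affected range, so nobody downstream has to reconstruct it from two
snapshots of the whole table.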
> - Reuse vhost IOTLB, so for type 1), simply forward update/invalidate
> requests via the IOMMU API; for type 2), send the IOTLB to the vDPA device driver via
> set_map(), and the device driver may choose to send diffs or rebuild all mappings as
> it sees fit
>
> Technically we can use the vhost IOTLB API (map/unmap) to build
> VHOST_SET_MEM_TABLE, but to keep the device from processing each request
> individually, it looks to me like we need a new UAPI, which seems suboptimal.
>
> What are your thoughts?
>
> Thanks
I suspect we can't completely avoid a new UAPI.
>
> >