From: "Michael S. Tsirkin" <mst@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: Tiwei Bie <tiwei.bie@intel.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, shahafs@mellanox.com, jgg@mellanox.com,
	rob.miller@broadcom.com, haotian.wang@sifive.com,
	eperezma@redhat.com, lulu@redhat.com, parav@mellanox.com,
	rdunlap@infradead.org, hch@infradead.org, jiri@mellanox.com,
	hanand@xilinx.com, mhabets@solarflare.com,
	maxime.coquelin@redhat.com, lingshan.zhu@intel.com,
	dan.daly@intel.com, cunming.liang@intel.com,
	zhihong.wang@intel.com
Subject: Re: [PATCH] vhost: introduce vDPA based backend
Date: Wed, 5 Feb 2020 02:16:57 -0500
Message-ID: <20200205020547-mutt-send-email-mst@kernel.org>
In-Reply-To: <2dd43fb5-6f02-2dcc-5c27-9f7419ef72fc@redhat.com>

On Wed, Feb 05, 2020 at 02:49:31PM +0800, Jason Wang wrote:
> 
> On 2020/2/5 2:30 PM, Michael S. Tsirkin wrote:
> > On Wed, Feb 05, 2020 at 01:50:28PM +0800, Jason Wang wrote:
> > > On 2020/2/5 1:31 PM, Michael S. Tsirkin wrote:
> > > > On Wed, Feb 05, 2020 at 11:12:21AM +0800, Jason Wang wrote:
> > > > > On 2020/2/5 10:05 AM, Tiwei Bie wrote:
> > > > > > On Tue, Feb 04, 2020 at 02:46:16PM +0800, Jason Wang wrote:
> > > > > > > On 2020/2/4 2:01 PM, Michael S. Tsirkin wrote:
> > > > > > > > On Tue, Feb 04, 2020 at 11:30:11AM +0800, Jason Wang wrote:
> > > > > > > > > 5) generate diffs of the memory table and use the IOMMU API to set up the DMA
> > > > > > > > > mapping in this method
> > > > > > > > Frankly I think that's a bunch of work. Why not a MAP/UNMAP interface?
> > > > > > > > 
> > > > > > > Sure, so that's basically VHOST_IOTLB_UPDATE/INVALIDATE, I think?
> > > > > > Do you mean we let userspace only use VHOST_IOTLB_UPDATE/INVALIDATE
> > > > > > to do the DMA mapping in the vhost-vdpa case? When a vIOMMU isn't available,
> > > > > > userspace will set msg->iova to the GPA, otherwise userspace will set
> > > > > > msg->iova to the GIOVA, and the vhost-vdpa module will get the HPA from msg->uaddr?
> > > > > > 
> > > > > > Thanks,
> > > > > > Tiwei
> > > > > I think so. Michael, do you think this makes sense?
> > > > > 
> > > > > Thanks
> > > > To make sure, could you post the suggested argument format for
> > > > these ioctls?
> > > > 
> > > It's the existing UAPI:
> > > 
> > > /* no alignment requirement */
> > > struct vhost_iotlb_msg {
> > >      __u64 iova;
> > >      __u64 size;
> > >      __u64 uaddr;
> > > #define VHOST_ACCESS_RO      0x1
> > > #define VHOST_ACCESS_WO      0x2
> > > #define VHOST_ACCESS_RW      0x3
> > >      __u8 perm;
> > > #define VHOST_IOTLB_MISS           1
> > > #define VHOST_IOTLB_UPDATE         2
> > > #define VHOST_IOTLB_INVALIDATE     3
> > > #define VHOST_IOTLB_ACCESS_FAIL    4
> > >      __u8 type;
> > > };
> > > 
> > > #define VHOST_IOTLB_MSG 0x1
> > > #define VHOST_IOTLB_MSG_V2 0x2
> > > 
> > > struct vhost_msg {
> > >      int type;
> > >      union {
> > >          struct vhost_iotlb_msg iotlb;
> > >          __u8 padding[64];
> > >      };
> > > };
> > > 
> > > struct vhost_msg_v2 {
> > >      __u32 type;
> > >      __u32 reserved;
> > >      union {
> > >          struct vhost_iotlb_msg iotlb;
> > >          __u8 padding[64];
> > >      };
> > > };
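
For illustration, a minimal sketch of filling in such a message for VHOST_IOTLB_UPDATE and VHOST_IOTLB_INVALIDATE, following the layout quoted above. The structs are re-declared locally so the example compiles on its own (real code would include <linux/vhost.h>), and the helpers iotlb_update()/iotlb_invalidate() are hypothetical. Following Tiwei's description, iova carries the GPA when there is no vIOMMU and the GIOVA otherwise.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the UAPI quoted above; real code would include
 * <linux/vhost.h> instead of re-declaring these. */
#define VHOST_ACCESS_RO        0x1
#define VHOST_ACCESS_WO        0x2
#define VHOST_ACCESS_RW        0x3
#define VHOST_IOTLB_UPDATE     2
#define VHOST_IOTLB_INVALIDATE 3
#define VHOST_IOTLB_MSG_V2     0x2

struct vhost_iotlb_msg {
    uint64_t iova;
    uint64_t size;
    uint64_t uaddr;
    uint8_t perm;
    uint8_t type;
};

struct vhost_msg_v2 {
    uint32_t type;
    uint32_t reserved;
    union {
        struct vhost_iotlb_msg iotlb;
        uint8_t padding[64];
    };
};

/* Hypothetical helper: map [iova, iova + size) to the userspace address
 * uaddr. iova is the GPA without a vIOMMU, the GIOVA with one. */
static struct vhost_msg_v2 iotlb_update(uint64_t iova, uint64_t size,
                                        uint64_t uaddr, uint8_t perm)
{
    struct vhost_msg_v2 msg;

    assert(size != 0);
    memset(&msg, 0, sizeof(msg)); /* zero 'reserved' and the padding */
    msg.type = VHOST_IOTLB_MSG_V2;
    msg.iotlb.type = VHOST_IOTLB_UPDATE;
    msg.iotlb.iova = iova;
    msg.iotlb.size = size;
    msg.iotlb.uaddr = uaddr;
    msg.iotlb.perm = perm;
    return msg;
}

/* Hypothetical helper: drop the mapping for [iova, iova + size). */
static struct vhost_msg_v2 iotlb_invalidate(uint64_t iova, uint64_t size)
{
    struct vhost_msg_v2 msg;

    memset(&msg, 0, sizeof(msg));
    msg.type = VHOST_IOTLB_MSG_V2;
    msg.iotlb.type = VHOST_IOTLB_INVALIDATE;
    msg.iotlb.iova = iova;
    msg.iotlb.size = size;
    return msg;
}
```

Each resulting message would then be written to the vhost device file descriptor with write(2); that step is omitted to keep the sketch self-contained.
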
> > Oh ok.  So with a real device, I suspect we do not want to wait for each
> > change to be processed by the device completely, so we might want an asynchronous variant
> > and then some kind of flush that tells the device "you better apply these now".
> 
> 
> Let me explain:
> 
> There are two types of devices:
> 
> 1) device without an on-chip IOMMU: DMA is done via the IOMMU API, which only
> supports incremental map/unmap

Most IOMMUs have queues nowadays, though. Whether the in-kernel APIs
expose that matters, but we are better off emulating
hardware, not specific guest behaviour.
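
A minimal sketch of the asynchronous variant with an explicit flush mentioned above: map/unmap requests are queued without waiting on the device, and only flush_queue() pushes them to it, in submission order. The queue, the apply hook and the toy device are all hypothetical and not part of vhost or the in-kernel IOMMU API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical batching layer in front of a device. */
enum op_kind { OP_MAP, OP_UNMAP };

struct map_op {
    enum op_kind kind;
    uint64_t iova;
    uint64_t size;
    uint64_t uaddr; /* unused for OP_UNMAP */
};

#define QUEUE_CAP 64

struct map_queue {
    struct map_op ops[QUEUE_CAP];
    size_t len;
    void (*apply)(void *dev, const struct map_op *op); /* hypothetical hook */
    void *dev;
};

/* Queue a request without touching the device.
 * Returns 0 on success, -1 if the queue is full and must be flushed. */
static int queue_op(struct map_queue *q, struct map_op op)
{
    if (q->len == QUEUE_CAP)
        return -1;
    q->ops[q->len++] = op;
    return 0;
}

/* "You better apply these now": hand every pending request to the
 * device in order, then empty the queue. Returns how many were applied. */
static size_t flush_queue(struct map_queue *q)
{
    size_t n = q->len;

    for (size_t i = 0; i < n; i++)
        q->apply(q->dev, &q->ops[i]);
    q->len = 0;
    return n;
}

/* Toy device for illustration: only tracks how many bytes are mapped. */
struct toy_dev {
    uint64_t mapped_bytes;
    size_t applied;
};

static void toy_apply(void *dev, const struct map_op *op)
{
    struct toy_dev *d = dev;

    if (op->kind == OP_MAP) {
        d->mapped_bytes += op->size;
    } else {
        assert(d->mapped_bytes >= op->size);
        d->mapped_bytes -= op->size;
    }
    d->applied++;
}
```
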

> 2) device with an on-chip IOMMU: DMA could be done by the device driver itself, and
> we could choose to pass the whole mapping to the driver at one time through
> the vDPA bus operation (set_map)
> 
> For vhost-vdpa, there are two types of memory mapping:
> 
> a) memory table, set up by userspace through VHOST_SET_MEM_TABLE; the whole
> mapping is updated this way
> b) IOTLB API, done incrementally by userspace through vhost messages
> (IOTLB_UPDATE/IOTLB_INVALIDATE)
> 
> The current design is:
> 
> - Reuse VHOST_SET_MEM_TABLE: for type 1), we can choose to send diffs
> through the IOMMU API, or flush all the mappings and then map new ones. For type 2),
> just send the whole mapping through set_map()

I know that, at least for RDMA-based things, you can't change
a mapping while it's active. So drivers will need to figure out the
differences, which just looks ugly: userspace knows what
it was changing (really just adding/removing some guest memory).
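
What "figure out the differences" amounts to on the driver side can be sketched as follows, assuming regions are compared as exact (gpa, size, uaddr) triples. That is a simplification: a real table update can also split or resize regions, which this naive O(n*m) version would treat as a full unmap plus a map. struct region and diff_tables() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified memory-table entry (hypothetical). */
struct region {
    uint64_t gpa;
    uint64_t size;
    uint64_t uaddr;
};

static bool region_eq(const struct region *a, const struct region *b)
{
    return a->gpa == b->gpa && a->size == b->size && a->uaddr == b->uaddr;
}

static bool contains(const struct region *tbl, size_t n, const struct region *r)
{
    for (size_t i = 0; i < n; i++)
        if (region_eq(&tbl[i], r))
            return true;
    return false;
}

/* Regions only in old_tbl go to unmap[], regions only in new_tbl go to
 * map[]. Regions present in both are left alone, so their (possibly
 * active) mappings are never touched. The caller sizes unmap[] for at
 * least n_old entries and map[] for at least n_new entries. */
static void diff_tables(const struct region *old_tbl, size_t n_old,
                        const struct region *new_tbl, size_t n_new,
                        struct region *unmap, size_t *n_unmap,
                        struct region *map, size_t *n_map)
{
    assert(old_tbl != NULL || n_old == 0);
    assert(new_tbl != NULL || n_new == 0);
    *n_unmap = 0;
    *n_map = 0;
    for (size_t i = 0; i < n_old; i++)
        if (!contains(new_tbl, n_new, &old_tbl[i]))
            unmap[(*n_unmap)++] = old_tbl[i];
    for (size_t i = 0; i < n_new; i++)
        if (!contains(old_tbl, n_old, &new_tbl[i]))
            map[(*n_map)++] = new_tbl[i];
}
```

Userspace already knows which region it added or removed, so it could send exactly these two lists instead of making every driver recompute them.
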



> - Reuse vhost IOTLB: for type 1), simply forward update/invalidate
> requests via the IOMMU API; for type 2), send the IOTLB to the vDPA device driver via
> set_map(), and the device driver may choose to send diffs or rebuild all mappings at
> its will
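
The two paths described above can be sketched as a dispatch on what the backend provides. Everything here (the ops struct, the flat IOTLB array, the counting callbacks) is hypothetical and only illustrates the control flow: a type 1) device gets each update/invalidate forwarded individually, while a type 2) device is handed the whole table through a set_map-style callback after every change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPS 16

struct mapping {
    uint64_t iova;
    uint64_t size;
    uint64_t uaddr;
};

/* Hypothetical flat IOTLB kept on the vhost side. */
struct iotlb {
    struct mapping maps[MAX_MAPS];
    size_t n;
};

/* Hypothetical ops. Type 1: map/unmap via the platform IOMMU, no set_map.
 * Type 2: on-chip IOMMU, only set_map. */
struct backend_ops {
    int (*map)(void *dev, const struct mapping *m);
    int (*unmap)(void *dev, uint64_t iova, uint64_t size);
    int (*set_map)(void *dev, const struct iotlb *tlb);
};

/* IOTLB UPDATE: record the mapping, then forward it. */
static int handle_update(struct iotlb *tlb, const struct backend_ops *ops,
                         void *dev, struct mapping m)
{
    assert(ops->set_map || (ops->map && ops->unmap));
    if (tlb->n == MAX_MAPS)
        return -1;
    tlb->maps[tlb->n++] = m;
    if (ops->set_map)
        return ops->set_map(dev, tlb); /* type 2: whole table */
    return ops->map(dev, &m);          /* type 1: this mapping only */
}

/* IOTLB INVALIDATE: drop an exactly matching mapping, then forward. */
static int handle_invalidate(struct iotlb *tlb, const struct backend_ops *ops,
                             void *dev, uint64_t iova, uint64_t size)
{
    for (size_t i = 0; i < tlb->n; i++) {
        if (tlb->maps[i].iova == iova && tlb->maps[i].size == size) {
            tlb->maps[i] = tlb->maps[--tlb->n]; /* order is irrelevant */
            if (ops->set_map)
                return ops->set_map(dev, tlb);
            return ops->unmap(dev, iova, size);
        }
    }
    return -1; /* no such mapping */
}

/* Toy callbacks that only count calls. */
struct call_counter {
    int map_calls;
    int unmap_calls;
    int set_map_calls;
    size_t last_table_len;
};

static int count_map(void *dev, const struct mapping *m)
{
    (void)m;
    ((struct call_counter *)dev)->map_calls++;
    return 0;
}

static int count_unmap(void *dev, uint64_t iova, uint64_t size)
{
    (void)iova;
    (void)size;
    ((struct call_counter *)dev)->unmap_calls++;
    return 0;
}

static int count_set_map(void *dev, const struct iotlb *tlb)
{
    struct call_counter *c = dev;

    c->set_map_calls++;
    c->last_table_len = tlb->n;
    return 0;
}

static const struct backend_ops type1_ops = { .map = count_map, .unmap = count_unmap };
static const struct backend_ops type2_ops = { .set_map = count_set_map };
```
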
> 
> Technically we can use the vhost IOTLB API (map/unmap) to build
> VHOST_SET_MEM_TABLE, but to avoid having the device process each request, it
> looks to me like we need a new UAPI, which seems suboptimal.
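
To make the cost concrete: building the equivalent of a VHOST_SET_MEM_TABLE call out of the existing IOTLB messages means one VHOST_IOTLB_UPDATE per memory region, each of which the device would see as a separate request. A minimal sketch, assuming no vIOMMU (so iova is the GPA) and read/write access for all guest memory; the region struct mirrors the fields of struct vhost_memory_region, and mem_table_to_iotlb() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VHOST_ACCESS_RW    0x3
#define VHOST_IOTLB_UPDATE 2

/* Mirrors the fields of struct vhost_memory_region (VHOST_SET_MEM_TABLE). */
struct vhost_memory_region {
    uint64_t guest_phys_addr;
    uint64_t memory_size;
    uint64_t userspace_addr;
    uint64_t flags_padding;
};

/* Mirrors struct vhost_iotlb_msg as quoted earlier in the thread. */
struct vhost_iotlb_msg {
    uint64_t iova;
    uint64_t size;
    uint64_t uaddr;
    uint8_t perm;
    uint8_t type;
};

/* Hypothetical: one IOTLB update per region, iova = GPA.
 * Returns the number of messages written to out[]. */
static size_t mem_table_to_iotlb(const struct vhost_memory_region *regions,
                                 size_t nregions,
                                 struct vhost_iotlb_msg *out, size_t cap)
{
    assert(nregions <= cap);
    for (size_t i = 0; i < nregions; i++) {
        out[i] = (struct vhost_iotlb_msg){
            .iova = regions[i].guest_phys_addr,
            .size = regions[i].memory_size,
            .uaddr = regions[i].userspace_addr,
            .perm = VHOST_ACCESS_RW,
            .type = VHOST_IOTLB_UPDATE,
        };
    }
    return nregions;
}
```
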
> 
> What's your thought?
> 
> Thanks

I suspect we can't completely avoid a new UAPI.

> 
> > 


Thread overview: 50+ messages
2020-01-31  3:36 [PATCH] vhost: introduce vDPA based backend Tiwei Bie
2020-01-31  3:56 ` Randy Dunlap
2020-01-31  5:12   ` Randy Dunlap
2020-01-31  5:54     ` Tiwei Bie
2020-01-31  5:52   ` Tiwei Bie
2020-02-04  3:30 ` Jason Wang
2020-02-04  6:01   ` Michael S. Tsirkin
2020-02-04  6:46     ` Jason Wang
2020-02-05  2:05       ` Tiwei Bie
2020-02-05  3:12         ` Jason Wang
2020-02-05  5:31           ` Michael S. Tsirkin
2020-02-05  5:50             ` Jason Wang
2020-02-05  6:30               ` Michael S. Tsirkin
2020-02-05  6:49                 ` Jason Wang
2020-02-05  7:16                   ` Michael S. Tsirkin [this message]
2020-02-05  7:42                     ` Jason Wang
2020-02-05  9:22                       ` Michael S. Tsirkin
2020-02-05  2:02   ` Tiwei Bie
2020-02-05  3:11     ` Jason Wang
2020-02-05  7:15     ` Shahaf Shuler
2020-02-05  7:50       ` Jason Wang
2020-02-05  9:23         ` Michael S. Tsirkin
2020-02-06  3:07           ` Jason Wang
2020-02-05  9:30         ` Shahaf Shuler
2020-02-05 10:33           ` Michael S. Tsirkin
2020-02-06  3:09             ` Jason Wang
2020-02-06  3:04           ` Jason Wang
2020-02-05 12:56         ` Jason Gunthorpe
2020-02-05 13:14           ` Michael S. Tsirkin
2020-02-06  3:11             ` Jason Wang
2020-02-06  3:21               ` Zhu Lingshan
2020-02-18 13:53 ` Jason Gunthorpe
2020-02-19  2:52   ` Tiwei Bie
2020-02-19 13:11     ` Jason Gunthorpe
2020-02-20  2:42       ` Tiwei Bie
