Re: [Qemu-devel] [virtio-dev] Re: [PATCH v1 6/6] vhost-user: add VFIO based accelerators support

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Tiwei Bie <tiwei.bie@intel.com>, Jason Wang <jasowang@redhat.com>,
	qemu-devel@nongnu.org, virtio-dev@lists.oasis-open.org,
	Alex Williamson <alex.williamson@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	stefanha@redhat.com, cunming.liang@intel.com, dan.daly@intel.com,
	jianfeng.tan@intel.com, zhihong.wang@intel.com,
	xiao.w.wang@intel.com
Subject: Re: [Qemu-devel] [virtio-dev] Re: [PATCH v1 6/6] vhost-user: add VFIO based accelerators support
Date: Wed, 7 Feb 2018 23:59:47 +0200
Message-ID: <20180207202015-mutt-send-email-mst@kernel.org>
In-Reply-To: <CAKgT0UciD4XyAovFt0vyeJBCisjjE3i4_tFu3SMMYrmWpxos1A@mail.gmail.com>

On Wed, Feb 07, 2018 at 10:02:24AM -0800, Alexander Duyck wrote:
> On Wed, Feb 7, 2018 at 8:43 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Sun, Feb 04, 2018 at 01:49:46PM -0800, Alexander Duyck wrote:
> >> On Thu, Jan 25, 2018 at 9:57 PM, Tiwei Bie <tiwei.bie@intel.com> wrote:
> >> > On Fri, Jan 26, 2018 at 11:41:27AM +0800, Jason Wang wrote:
> >> >> On 2018-01-26 07:59, Michael S. Tsirkin wrote:
> >> >> > > The virtual IOMMU isn't supported by the accelerators for now,
> >> >> > > because vhost-user currently lacks an efficient way to share
> >> >> > > the VM's IOMMU table with the vhost backend. That's why the software
> >> >> > > implementation of virtual IOMMU support in the vhost-user backend
> >> >> > > can't support dynamic mapping well.
> >> >> > What exactly is meant by that? vIOMMU seems to work for people,
> >> >> > it's not that fast if you change mappings all the time,
> >> >> > but e.g. dpdk within guest doesn't.
> >> >>
> >> >> Yes, the software implementation supports dynamic mapping for sure. I think the
> >> >> point is that the current vhost-user backend cannot program the hardware IOMMU, so it
> >> >> cannot let a hardware accelerator work together with the software vIOMMU.
> >> >
> >> > Vhost-user backend can program hardware IOMMU. Currently
> >> > vhost-user backend (or more precisely the vDPA driver in
> >> > vhost-user backend) will use the memory table (delivered
> >> > by the VHOST_USER_SET_MEM_TABLE message) to program the
> >> > IOMMU via vfio, and that's why accelerators can use the
> >> > GPA (guest physical address) in descriptors directly.
> >> >
> >> > Theoretically, we can use the IOVA mapping info (delivered
> >> > by the VHOST_USER_IOTLB_MSG message) to program the IOMMU,
> >> > and accelerators will be able to use IOVA. But the problem
> >> > is that in vhost-user QEMU won't push all the IOVA mappings
> >> > to the backend directly. The backend needs to ask for that info
> >> > when it meets a new IOVA. Such a design and implementation
> >> > won't work well for dynamic mappings anyway and couldn't
> >> > be supported by hardware accelerators.
> >> >
> >> >> I think
> >> >> that's another call to implement the offloaded path inside qemu which has
> >> >> complete support for vIOMMU co-operated VFIO.
> >> >
> >> > Yes, that's exactly what we want. After revisiting the
> >> > last paragraph in the commit message, I found it's not
> >> > really accurate. The practicability of dynamic mappings
> >> > support is a common issue for QEMU. It also exists for
> >> > vfio (hw/vfio in QEMU). If QEMU needs to trap all the
> >> > map/unmap events, the data path performance couldn't be
> >> > high. If we want to thoroughly fix this issue especially
> >> > for vfio (hw/vfio in QEMU), we need to have the offload
> >> > path Jason mentioned in QEMU. And I think accelerators
> >> > could use it too.
> >> >
> >> > Best regards,
> >> > Tiwei Bie
> >>
> >> I wonder if we couldn't look at coming up with an altered security
> >> model for the IOMMU drivers to address some of the performance issues
> >> seen with typical hardware IOMMU?
> >>
> >> In the case of most network devices, we seem to be moving toward a
> >> model where the Rx pages are mapped for an extended period of time and
> >> see a fairly high rate of reuse. As such pages mapped as being
> >> writable or read/write by the device are left mapped for an extended
> >> period of time while Tx pages, which are read only, are often
> >> mapped/unmapped since they are coming from some other location in the
> >> kernel beyond the driver's control.
> >>
> >> If we were to somehow come up with a model where the read-only (Tx)
> >> pages had access to a pre-allocated memory mapped address, and the
> >> read/write (descriptor rings) and write-only (Rx) pages were provided with
> >> dynamic addresses we might be able to come up with a solution that
> >> would allow for fairly high network performance while at least
> >> protecting from memory corruption. The only issue it would open up is
> >> that the device would have the ability to read any/all memory on the
> >> guest. I was wondering about doing something like this with the vIOMMU
> >> with VFIO for the Intel NICs, since an interface like igb,
> >> ixgbe, ixgbevf, i40e, or i40evf would probably show pretty good
> >> performance under such a model as long as the writable pages were
> >> being tracked by the vIOMMU. It could even allow for live migration
> >> support if the vIOMMU provided the info needed for migratable/dirty
> >> page tracking and we held off on migrating any of the dynamically
> >> mapped pages until after they were either unmapped or an FLR reset the
> >> device.
> >>
> >> Thanks.
> >>
> >> - Alex
> >
> >
> >
> > It might be a good idea to change the iommu instead - how about a
> > variant of strict in intel iommu which forces an IOTLB flush after
> > invalidating a writeable mapping but not a RO mapping?  Not sure what the
> > name would be - relaxed-ro?
> >
> > This is probably easier than poking at the drivers and net core.
> >
> > Keeping the RX pages mapped in the IOMMU was envisioned for XDP.
> > That might be a good place to start.
> 
> My plan is to update the Intel IOMMU driver first since it seems like
> something that shouldn't require too much expertise in the operation
> of the IOMMU to accomplish. My idea was more along the lines of
> something like an "iommu=read-only-pt" or maybe "iommu=pt-ro" where the
> Tx data would be identity mapped, and the descriptor rings and Rx data
> could be in the dynamic mapping setup. The idea is loosely based on
> the existing "iommu=pt" option that is normally used on the host if
> you want to avoid the cost for dynamic mapping. Basically we just need
> to keep an eye on the number of mappings that the device can write to.
> Ideally, if we leave the Tx as identity mapped, we never have
> to actually write to update any mapping, which would mean not having to
> jump into the hypervisor to deal with the update.

Just noting that updating page tables does not require jumping
to the hypervisor by itself. Only invalidation requires that.
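
For illustration only, a toy Python model of that distinction. It
assumes a vIOMMU whose page tables live in guest memory and where the
invalidation request is the only operation the hypervisor intercepts.
The class and method names are hypothetical and do not correspond to
any QEMU or kernel API.

```python
class ToyVIOMMU:
    """Toy model: page-table writes are plain guest memory writes;
    only invalidation requests are intercepted by the hypervisor."""

    def __init__(self):
        self.page_table = {}       # iova -> (gpa, writable), in guest memory
        self.hypervisor_exits = 0  # how often the hypervisor got involved

    def map(self, iova, gpa, writable):
        # Plain memory write in this model: no exit.
        self.page_table[iova] = (gpa, writable)

    def unmap(self, iova):
        # Clearing the entry is also just a memory write.
        self.page_table.pop(iova, None)

    def invalidate(self, iova):
        # The invalidation request is what the hypervisor traps.
        self.hypervisor_exits += 1


viommu = ToyVIOMMU()
viommu.map(0x1000, 0x1000, writable=False)  # identity-mapped Tx page, set once
viommu.map(0x2000, 0x8000, writable=True)   # dynamically mapped Rx page
viommu.unmap(0x2000)
viommu.invalidate(0x2000)                   # the only exit in this sequence
```

In this model the identity-mapped entry is never changed, so it never
needs an invalidation and never costs an exit.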

> The fact that most
> of the drivers already leave the Rx buffers and descriptor rings
> statically mapped should essentially take care of the rest for us.
> What this would become is a version of "iommu=pt" where the user cares
> about preventing the device from possibly corrupting memory, but would
> still like better performance at the cost of the device being able to
> read any/all memory on the system.
> 
> As for whether it is strict or not, I don't know how much we would need
> to worry about that for the migration case. Essentially a deferred
> IOTLB flush would result in us having extra pages marked as dirty and
> non-migratable, but we would need to see how much overhead there is in
> the migration to deal with those extra pages versus the cost of having
> to do an IOTLB flush on every unmap call.
> 
> Anyway this is an idea that just occurred to me the other day so I
> still need to do some more research into how easy/difficult
> implementing a solution like this would be.
> 
> Thanks.
> 
> - Alex

Right. And I think if you do a straight pt, then this is not
so much a security feature as a robustness feature. I guess both
have a place under the sun.
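
To make the "relaxed-ro" idea quoted above concrete, here is a rough
Python sketch of the policy: unmapping a writable mapping forces an
IOTLB flush right away, while unmapping a read-only mapping only
queues the flush. Everything in it is an assumption for illustration:
the class name, the single global flush, and the batch_size threshold
that stands in for whatever deferral mechanism a real driver would use.

```python
class RelaxedRoIommu:
    """Sketch of a 'relaxed-ro' policy: strict flush for writable
    mappings, deferred (batched) flush for read-only mappings."""

    def __init__(self, batch_size=4):
        self.mappings = {}   # iova -> writable flag
        self.pending = []    # unmapped read-only IOVAs awaiting a flush
        self.flushes = 0     # number of IOTLB flushes issued
        self.batch_size = batch_size  # hypothetical deferral threshold

    def map(self, iova, writable):
        self.mappings[iova] = writable

    def _flush(self):
        # Modeled as one global flush that also drains the deferred queue.
        self.flushes += 1
        self.pending.clear()

    def unmap(self, iova):
        writable = self.mappings.pop(iova)
        if writable:
            # Strict: the device must lose write access before we return.
            self._flush()
        else:
            # Relaxed: a stale read-only entry cannot corrupt memory.
            self.pending.append(iova)
            if len(self.pending) >= self.batch_size:
                self._flush()


iommu = RelaxedRoIommu(batch_size=4)
for i in range(8):                      # eight read-only Tx pages
    iommu.map(0x1000 * (i + 1), writable=False)
for i in range(8):
    iommu.unmap(0x1000 * (i + 1))
iommu.map(0x100000, writable=True)      # one writable Rx page
iommu.unmap(0x100000)
```

With these numbers the sketch issues three flushes where a fully
strict policy would issue nine, at the cost of the device being able
to read a stale Tx page until the deferred flush runs.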

-- 
MST
