From: Peter Xu <peterx@redhat.com>
To: "Jürgen Groß" <jgross@suse.com>
Cc: "Edgar E. Iglesias" <edgar.iglesias@gmail.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	qemu-devel@nongnu.org, sstabellini@kernel.org,
	vikram.garhwal@amd.com
Subject: Re: [QEMU][PATCH v3 5/7] memory: add MemoryRegion map and unmap callbacks
Date: Tue, 16 Apr 2024 11:55:28 -0400
Message-ID: <Zh6fcLzbm4cpknbT@x1n>
In-Reply-To: <3abfdbdd-ee70-4b61-a652-c7b2490732d6@suse.com>

On Tue, Apr 16, 2024 at 03:28:41PM +0200, Jürgen Groß wrote:
> On 16.04.24 13:32, Edgar E. Iglesias wrote:
> > On Wed, Apr 10, 2024 at 8:56 PM Peter Xu <peterx@redhat.com> wrote:
> > > 
> > > On Wed, Apr 10, 2024 at 06:44:38PM +0200, Edgar E. Iglesias wrote:
> > > > On Tue, Feb 27, 2024 at 11:37 PM Vikram Garhwal <vikram.garhwal@amd.com>
> > > > wrote:
> > > > 
> > > > > From: Juergen Gross <jgross@suse.com>
> > > > > 
> > > > > In order to support mapping and unmapping guest memory dynamically to
> > > > > and from qemu during address_space_[un]map() operations add the map()
> > > > > and unmap() callbacks to MemoryRegionOps.
> > > > > 
> > > > > Those will be used e.g. for Xen grant mappings when performing guest
> > > > > I/Os.
> > > > > 
> > > > > Signed-off-by: Juergen Gross <jgross@suse.com>
> > > > > Signed-off-by: Vikram Garhwal <vikram.garhwal@amd.com>
> > > > > 
> > > > 
> > > > 
> > > > > Paolo, Peter, David, Philippe, do you guys have any concerns with this patch?
> > > 
> > 
> > Thanks for your comments Peter,
> > 
> > 
> > > This introduces a 3rd memory type afaict, neither direct nor !direct.
> > > 
> > > What happens if someone does address_space_write() to it?  I didn't see it
> > > covered here..
> > 
> > You're right, that won't work, the memory needs to be mapped before it
> > can be used.
> > At minimum there should be some graceful failure, right now this will crash.
> > 
> > > 
> > > OTOH, the cover letter didn't mention too much either on the big picture:
> > > 
> > > https://lore.kernel.org/all/20240227223501.28475-1-vikram.garhwal@amd.com/
> > > 
> > > > I want to get a quick grasp of whether it is worthwhile to introduce
> > > > this complexity to the QEMU memory core.
> > > > 
> > > > Could I request a better cover letter when you repost?  It'll be great to
> > > > mention things like:
> > 
> > I'll do that, but also answer inline in the meantime since we should
> > perhaps change the approach.
> > 
> > > 
> > >    - what is grant mapping, why it needs to be used, when it can be used (is
> > >      it only relevant to vIOMMU=on)?  Some more information on the high
> > > >      level design using this type of MR would be great.
> > 
> > https://github.com/xen-project/xen/blob/master/docs/misc/grant-tables.txt
> > 
> > > Xen VMs that use QEMU's VirtIO have a QEMU instance running in a separate VM.
> > > 
> > > There are basically two mechanisms for QEMU's VirtIO backends to access
> > > the guest's RAM.
> > 1. Foreign mappings. This gives the VM running QEMU access to the
> > entire RAM of the guest VM.
> 
> > Additionally it requires qemu to run in dom0, while in general Xen allows
> > running backends in less privileged "driver domains", which are usually not
> > allowed to perform foreign mappings.
> 
> > 2. Grant mappings. This allows the guest to dynamically grant and
> > remove access to pages as needed.
> > > So the VM running QEMU cannot map guest RAM unless it's been
> > > instructed to do so by the guest.
> > 
> > #2 is desirable because if QEMU gets compromised it has a smaller
> > attack surface onto the guest.
> 
> > And it allows running the virtio backend in a less privileged VM.
> 
> > 
> > > 
> > >    - why a 3rd memory type is required?  Do we have other alternatives?
> > 
> > Yes, there are alternatives.
> > 
> > 1. It was suggested by Stefano to try to handle this in existing qemu/hw/xen/*.
> > This would be less intrusive but perhaps also less explicit.
> > Concerns about touching the Memory API have been raised before, so
> > perhaps we should try this.
> > I'm a little unsure how we would deal with unmapping when the guest
> > removes our grants and we're using models that don't map but use
> > address_space_read/write().
> 
> Those would either need to use grant-copy hypercalls, or they'd need to map,
> read/write, unmap.
> 
> > 
> > 2. Another approach could be to change the Xen grant-iommu in the
> > Linux kernel to work with a grant vIOMMU in QEMU.
> > Linux could explicitly ask QEMU's grant vIOMMU to map/unmap granted regions.
> > This would have the benefit that we wouldn't need to allocate
> > address-bit 63 for grants.
> > A drawback is that it may be slower since we're going to need to
> > bounce between guest/qemu a bit more.
> 
> It would be a _lot_ slower, unless virtio-iommu and grants are both modified
> to match. I have looked into that, but the needed effort is rather large. At
> the last Xen summit I have suggested to introduce a new grant format which
> would work more like a normal page table structure. Using the same format
> for virtio-iommu would allow to avoid the additional bounces between qemu and
> the guest (and in fact that was one of the motivations to suggest the new
> grant format).

I have a better picture now, thanks both.

It really looks like a vIOMMU already to me, perhaps with a special refID
mapping playing a role similar to that of IOVAs in the rest of the IOMMU
world.

I can't yet tell what the best way is for Xen.  As of now QEMU's memory API
does provide such translations via IOMMUMemoryRegionClass.translate(), but
only through that.  So far it works for all vIOMMU emulations in QEMU, and
I'd hope we don't need to hack in another memory type for this if possible,
especially if it's for performance's sake; more on that below.
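
For illustration only, and not part of this series: a minimal sketch of
what a translate-style lookup could look like if grant refs were treated
like IOVAs.  All names here (ToyGrant, ToyTLBEntry, toy_translate) are
hypothetical stand-ins loosely modelled on the idea of a translate hook;
they are not QEMU's or Xen's real types, and using the page number of the
address as the grant ref is an assumption made just for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types for illustration only; NOT QEMU's definitions. */
typedef uint64_t toy_hwaddr;

typedef enum {
    TOY_PERM_NONE = 0,
    TOY_PERM_RO   = 1,
    TOY_PERM_WO   = 2,
    TOY_PERM_RW   = 3,
} ToyPerm;

typedef struct {
    toy_hwaddr iova;            /* page-aligned input address */
    toy_hwaddr translated_addr; /* page-aligned output address */
    toy_hwaddr addr_mask;       /* low bits covered by this entry */
    ToyPerm perm;               /* TOY_PERM_NONE means "no mapping" */
} ToyTLBEntry;

/* Hypothetical record of one granted page. */
typedef struct {
    uint32_t ref;               /* grant reference (assumed = page number) */
    toy_hwaddr target_page;     /* page-aligned address it resolves to */
    ToyPerm perm;               /* access the guest has granted */
} ToyGrant;

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_MASK  ((((toy_hwaddr)1) << TOY_PAGE_SHIFT) - 1)

/*
 * Look up the page containing 'addr' in the grant list.  If no grant
 * covers it with at least the 'wanted' access, the returned entry has
 * perm == TOY_PERM_NONE so the caller can fail the access gracefully.
 */
ToyTLBEntry toy_translate(const ToyGrant *grants, size_t n_grants,
                          toy_hwaddr addr, ToyPerm wanted)
{
    ToyTLBEntry entry = {
        .iova = addr & ~TOY_PAGE_MASK,
        .translated_addr = 0,
        .addr_mask = TOY_PAGE_MASK,
        .perm = TOY_PERM_NONE,
    };
    toy_hwaddr ref = addr >> TOY_PAGE_SHIFT;

    for (size_t i = 0; i < n_grants; i++) {
        /* Grants are tracked per page, so targets must be page-aligned. */
        assert((grants[i].target_page & TOY_PAGE_MASK) == 0);
        if (grants[i].ref == ref && (grants[i].perm & wanted) == wanted) {
            entry.translated_addr = grants[i].target_page;
            entry.perm = grants[i].perm;
            break;
        }
    }
    return entry;
}
```

The point of the sketch is only that an address with no matching grant
comes back as "no permission" instead of needing a separate memory type;
whether Xen grants can really be expressed this way is an open question.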

QEMU also suffers from similar issues with other kinds of DMA protection,
at least as far as I'm aware with VT-d, SMMU, etc., where dynamic DMA
mappings slow the IOs down to a degree that may not be usable in real
production.  We kept it like that, and so far AFAIK we don't yet have a
solution for that, simply because of the nature of how DMA buffers are
mapped and managed within a guest OS, whether Linux or not.

For Linux as a guest we basically suggest enabling iommu=pt so that kernel
drivers are trusted, and kernel driven devices can have full access to
guest RAM by using the vIOMMU's passthrough mode. Perhaps that's similar
to foreign mappings for Xen, but maybe still different, as Xen's topology
is definitely special as a hypervisor here.

Userspace drivers within the guest OS, on the other hand, always go through
vfio-pci now, which enforces effective DMA mappings rather than the
passthrough mode. There the suggestion is to map as little as possible,
e.g. DPDK only maps at the start of the user driver, so it's mostly not
affected by the slowness of frequently changing DMA mappings.

I'm not sure whether the above ideas would even be applicable for Xen, but
I just want to share the status quo regarding how we manage protected DMA
without Xen, just in case there's anything useful to help route the path
forward.

Thanks,

-- 
Peter Xu



