From: David Hildenbrand <david@redhat.com>
To: "Chenyi Qiang" <chenyi.qiang@intel.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Peter Xu" <peterx@redhat.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Michael Roth" <michael.roth@amd.com>
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org,
Williams Dan J <dan.j.williams@intel.com>,
Edgecombe Rick P <rick.p.edgecombe@intel.com>,
Wang Wei W <wei.w.wang@intel.com>,
Peng Chao P <chao.p.peng@intel.com>,
Gao Chao <chao.gao@intel.com>, Wu Hao <hao.wu@intel.com>,
Xu Yilun <yilun.xu@intel.com>
Subject: Re: [RFC PATCH 0/6] Enable shared device assignment
Date: Thu, 25 Jul 2024 16:04:12 +0200 [thread overview]
Message-ID: <ace9bb98-1415-460f-b8f5-e50607fbce20@redhat.com> (raw)
In-Reply-To: <20240725072118.358923-1-chenyi.qiang@intel.com>
> Open
> ====
> Implementing a RamDiscardManager to notify VFIO of page conversions
> causes changes in semantics: private memory is treated as discarded (or
> hot-removed) memory. This isn't aligned with the expectation of current
> RamDiscardManager users (e.g. VFIO or live migration), who really
> expect that discarded memory is hot-removed and thus can be skipped when
> they are processing guest memory. Treating private memory as
> discarded won't work in the future if VFIO or live migration needs to
> handle private memory, e.g. VFIO may need to map private memory to
> support Trusted IO, and live migration for confidential VMs needs to
> migrate private memory.
"VFIO may need to map private memory to support Trusted IO"
I've been told that the way we handle shared memory won't be the way
this is going to work with guest_memfd. KVM will coordinate directly
with VFIO or $whatever and update the IOMMU tables itself right in the
kernel; the pages are pinned/owned by guest_memfd, so that will just
work. So I don't consider that a concern right now. guest_memfd private
memory is not mapped into user page tables and, as it currently seems,
it never will be.
Similarly: live migration. We cannot simply migrate that memory the
traditional way. We even have to track the dirty state differently.
So IMHO, treating private memory as discarded == "don't touch it the
usual way" might actually be a feature, not a bug ;)
>
> There are two possible ways to mitigate the semantics changes.
> 1. Develop a new mechanism to notify about page conversions between
> private and shared. For example, utilize the notifier_list in QEMU: VFIO
> registers its own handler and gets notified upon page conversions. This
> is a clean approach which only touches the notifier workflow. A
> challenge is that, for device hotplug, existing shared memory should be
> mapped in the IOMMU, which will need additional changes.
>
> 2. Extend the existing RamDiscardManager interface to manage not only
> the discarded/populated status of guest memory but also the
> shared/private status. RamDiscardManager users like VFIO will be
> notified with one more argument indicating what change is happening and
> can take action accordingly. It also has challenges, e.g. QEMU allows
> only one RamDiscardManager per memory region, so how to support
> virtio-mem for confidential VMs would be a problem. And some APIs
> exposed by RamDiscardManager, like .is_populated(), are meaningless for
> shared/private memory, so they may need some adjustments.
Think of all of that in terms of "shared memory is populated, private
memory is some inaccessible stuff that needs very special handling and
other means for device assignment, live migration, etc.". Then it
actually makes quite a lot of sense to use RamDiscardManager (AFAIKS :) ).
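To illustrate what I mean (rough sketch only, not a proposal for the
final interface): VFIO already consumes a RamDiscardManager through a
RamDiscardListener in hw/vfio/common.c, and with the "shared ==
populated" view those very same callbacks map naturally onto the
conversions. The vfio_gmem_* names below are made up; the existing
listener (vfio_ram_discard_notify_populate/discard) looks very similar:

#include "exec/memory.h"

/*
 * Sketch: treat "populate" as private -> shared and "discard" as
 * shared -> private. All vfio_gmem_* names are hypothetical.
 */
static int vfio_gmem_notify_shared(RamDiscardListener *rdl,
                                   MemoryRegionSection *section)
{
    /* Pages were converted private -> shared: establish the IOMMU
     * mapping for the section (via the container's dma_map path). */
    return 0;
}

static void vfio_gmem_notify_private(RamDiscardListener *rdl,
                                     MemoryRegionSection *section)
{
    /* Pages were converted shared -> private: tear the IOMMU mapping
     * for the section down again (dma_unmap). */
}

static void vfio_gmem_register_listener(RamDiscardManager *rdm,
                                        RamDiscardListener *rdl,
                                        MemoryRegionSection *section)
{
    ram_discard_listener_init(rdl, vfio_gmem_notify_shared,
                              vfio_gmem_notify_private, true);
    ram_discard_manager_register_listener(rdm, rdl, section);
}

IOW, from the VFIO side nothing conceptually new is needed; the open
questions are on the guest_memfd/manager side.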
>
> Testing
> =======
> This patch series is tested based on the internal TDX KVM/QEMU tree.
>
> To facilitate shared device assignment with the NIC, employ the legacy
> type1 VFIO with the QEMU command:
>
> qemu-system-x86_64 [...]
> -device vfio-pci,host=XX:XX.X
>
> The dma_entry_limit parameter needs to be adjusted. For example, a
> 16GB guest needs something like
> vfio_iommu_type1.dma_entry_limit=4194304.
But here you note the biggest real issue I see (not related to
RamDiscardManager, but to the fact that we have to prepare for a
conversion of each possible private page to shared and back): we need a
separate IOMMU mapping for each 4 KiB page.
Doesn't that mean that we limit shared memory to 4194304*4096 == 16 GiB?
Does it even scale then?
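Back-of-the-envelope, just so we're looking at the same numbers (please
correct me if I'm miscounting): with legacy type1 and one DMA mapping
per 4 KiB page, the worst-case mapping count for all-shared memory is

    16 GiB / 4 KiB = 17179869184 / 4096 = 4194304 mappings

which is exactly the dma_entry_limit from your test setup. So the limit
(and the per-mapping tracking behind it) has to grow linearly with the
amount of memory that may ever be shared.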
There is the alternative of having in-place private/shared conversion
when we also let guest_memfd manage some shared memory. It has plenty of
downsides, but for the problem at hand it would mean that we don't
discard on shared/private conversion.
But whenever we want to convert memory shared->private we would
similarly have to remove it from the IOMMU page tables via VFIO. (The
in-place conversion will only be allowed once any additional references
on a page are gone -- i.e., when it is inaccessible by userspace/kernel.)
Again, if the IOMMU page tables were managed by KVM in the kernel
without user space/VFIO intervention, this would just work with device
assignment. But I guess it will take a while until we actually have
that option.
--
Cheers,
David / dhildenb