qemu-devel.nongnu.org archive mirror
From: Chenyi Qiang <chenyi.qiang@intel.com>
To: "Alexey Kardashevskiy" <aik@amd.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Peter Xu" <peterx@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Michael Roth" <michael.roth@amd.com>
Cc: <qemu-devel@nongnu.org>, <kvm@vger.kernel.org>,
	Williams Dan J <dan.j.williams@intel.com>,
	Peng Chao P <chao.p.peng@intel.com>,
	Gao Chao <chao.gao@intel.com>, Xu Yilun <yilun.xu@intel.com>
Subject: Re: [PATCH 0/7] Enable shared device assignment
Date: Thu, 9 Jan 2025 16:49:36 +0800
Message-ID: <57a3869d-f3d1-4125-aaa5-e529fb659421@intel.com>
In-Reply-To: <d4b57eb8-03f1-40f3-bc7a-23b24294e3d7@amd.com>



On 1/9/2025 4:18 PM, Alexey Kardashevskiy wrote:
> 
> 
> On 9/1/25 18:52, Chenyi Qiang wrote:
>>
>>
>> On 1/8/2025 7:38 PM, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 8/1/25 17:28, Chenyi Qiang wrote:
>>>> Thanks Alexey for your review!
>>>>
>>>> On 1/8/2025 12:47 PM, Alexey Kardashevskiy wrote:
>>>>> On 13/12/24 18:08, Chenyi Qiang wrote:
>>>>>> Commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated
>>>>>> discard") effectively disables device assignment when using guest_memfd.
>>>>>> This poses a significant challenge as guest_memfd is essential for
>>>>>> confidential guests, thereby blocking device assignment to these VMs.
>>>>>> The initial rationale for disabling device assignment was due to stale
>>>>>> IOMMU mappings (see Problem section) and the assumption that TEE I/O
>>>>>> (SEV-TIO, TDX Connect, COVE-IO, etc.) would solve the device-assignment
>>>>>> problem for confidential guests [1]. However, this assumption has proven
>>>>>> to be incorrect. TEE I/O relies on the ability to operate devices against
>>>>>> "shared" or untrusted memory, which is crucial for device initialization
>>>>>> and error recovery scenarios. As a result, the current implementation
>>>>>> does not adequately support device assignment for confidential guests,
>>>>>> necessitating a reevaluation of the approach to ensure compatibility and
>>>>>> functionality.
>>>>>>
>>>>>> This series enables shared device assignment by notifying VFIO of page
>>>>>> conversions using an existing framework named RamDiscardListener.
>>>>>> Additionally, there is an ongoing patch set [2] that aims to add 1G page
>>>>>> support for guest_memfd. This patch set introduces in-place page
>>>>>> conversion, where private and shared memory share the same physical
>>>>>> pages as the backend. This development may impact our solution.
>>>>>>
>>>>>> We presented our solution in the guest_memfd meeting to discuss its
>>>>>> compatibility with the new changes and potential future directions (see
>>>>>> [3] for more details). The conclusion was that, although our solution
>>>>>> may not be the most elegant (see the Limitation section), it is
>>>>>> sufficient for now and can be easily adapted to future changes.
>>>>>>
>>>>>> We are re-posting the patch series with some cleanup and have removed
>>>>>> the RFC label for the main enabling patches (1-6). The newly-added
>>>>>> patch 7 is still marked as RFC as it tries to resolve some extension
>>>>>> concerns related to RamDiscardManager for future usage.
>>>>>>
>>>>>> The overview of the patches:
>>>>>> - Patch 1: Export a helper to get intersection of a MemoryRegionSection
>>>>>>      with a given range.
>>>>>> - Patch 2-6: Introduce a new object to manage the guest-memfd with
>>>>>>      RamDiscardManager, and notify the shared/private state change during
>>>>>>      conversion.
>>>>>> - Patch 7: Try to resolve a semantics concern related to RamDiscardManager,
>>>>>>      i.e. RamDiscardManager is used to manage memory plug/unplug state
>>>>>>      instead of shared/private state. It would affect future users of
>>>>>>      RamDiscardManager in confidential VMs. It is attached at the end as
>>>>>>      an RFC patch [4].
>>>>>>
>>>>>> Changes since last version:
>>>>>> - Add a patch to export some generic helper functions from virtio-mem
>>>>>>      code.
>>>>>> - Change the bitmap in guest_memfd_manager from default shared to default
>>>>>>      private. This keeps alignment with virtio-mem, where a set bit in the
>>>>>>      bitmap represents the populated state, and may help to export more
>>>>>>      generic code if necessary.
>>>>>> - Add the helpers to initialize/uninitialize the guest_memfd_manager
>>>>>>      instance to make it clearer.
>>>>>> - Add a patch to distinguish between the shared/private state change and
>>>>>>      the memory plug/unplug state change in RamDiscardManager.
>>>>>> - RFC: https://lore.kernel.org/qemu-devel/20240725072118.358923-1-chenyi.qiang@intel.com/
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> Background
>>>>>> ==========
>>>>>> Confidential VMs have two classes of memory: shared and private memory.
>>>>>> Shared memory is accessible from the host/VMM while private memory is
>>>>>> not. Confidential VMs can decide which memory is shared/private and
>>>>>> convert memory between shared/private at runtime.
>>>>>>
>>>>>> "guest_memfd" is a new kind of fd whose primary goal is to serve guest
>>>>>> private memory. The key differences between guest_memfd and normal memfd
>>>>>> are that guest_memfd is spawned by a KVM ioctl, bound to its owner VM and
>>>>>> cannot be mapped, read or written by userspace.
>>>>>
>>>>> The "cannot be mapped" part seems likely to stop being true soon (if it
>>>>> is not already).
>>>>>
>>>>> https://lore.kernel.org/all/20240801090117.3841080-1-tabba@google.com/T/
>>>>
>>>> Exactly, allowing mmap() on guest_memfd is the direction. I mentioned it
>>>> below together with in-place page conversion. Maybe I should move it here
>>>> to make it clearer.
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> In QEMU's implementation, shared memory is allocated with normal methods
>>>>>> (e.g. mmap or fallocate) while private memory is allocated from
>>>>>> guest_memfd. When a VM performs memory conversions, QEMU frees pages via
>>>>>> madvise() or via PUNCH_HOLE on memfd or guest_memfd from one side and
>>>>>> allocates new pages from the other side.
>>>>>>
>>>>
>>>> [...]
>>>>
>>>>>>
>>>>>> One limitation (also discussed in the guest_memfd meeting) is that VFIO
>>>>>> expects the DMA mapping for a specific IOVA to be mapped and unmapped
>>>>>> with the same granularity. The guest may perform partial conversions,
>>>>>> such as converting a small region within a larger region. To prevent
>>>>>> such invalid cases, all operations are performed with 4K granularity.
>>>>>> The possible solutions we can think of are either to enable VFIO to
>>>>>> support partial unmap
>>>
>>> btw the old VFIO does not split mappings but iommufd seems to be capable
>>> of it - there is iopt_area_split(). What happens if you try unmapping a
>>> smaller chunk that does not exactly match any mapped chunk? thanks,
>>
>> iopt_cut_iova() is used in iommufd's vfio_compat.c to keep iommufd
>> compatible with the legacy VFIO_TYPE1 IOMMU. IIUC, this only happens with
>> disable_large_page=true, which means large IOPTEs are also disabled in the
>> IOMMU, so the split is easy. See the comment in iommufd_vfio_set_iommu().
>>
>> The iommufd VFIO compatibility mode is a transition path from legacy VFIO
>> to iommufd. Native iommufd requires the iova/length being unmapped to be a
>> superset of a previously mapped range; if it does not match, an error is
>> returned.
> 
> 
> This is all true but this also means that "The former requires complex
> changes in VFIO" is not entirely true - some code is already there. Thanks,

Hmm, my statement was a little confusing. The bottleneck is that the
IOMMU driver doesn't support splitting large pages. So enabling large
pages while also supporting partial unmap would require complex changes.


Thread overview: 98+ messages
2024-12-13  7:08 [PATCH 0/7] Enable shared device assignment Chenyi Qiang
2024-12-13  7:08 ` [PATCH 1/7] memory: Export a helper to get intersection of a MemoryRegionSection with a given range Chenyi Qiang
2024-12-18 12:33   ` David Hildenbrand
2025-01-08  4:47   ` Alexey Kardashevskiy
2025-01-08  6:41     ` Chenyi Qiang
2024-12-13  7:08 ` [PATCH 2/7] guest_memfd: Introduce an object to manage the guest-memfd with RamDiscardManager Chenyi Qiang
2024-12-18  6:45   ` Chenyi Qiang
2025-01-08  4:48   ` Alexey Kardashevskiy
2025-01-08 10:56     ` Chenyi Qiang
2025-01-08 11:20       ` Alexey Kardashevskiy
2025-01-09  2:11         ` Chenyi Qiang
2025-01-09  2:55           ` Alexey Kardashevskiy
2025-01-09  4:29             ` Chenyi Qiang
2025-01-10  0:58               ` Alexey Kardashevskiy
2025-01-10  6:38                 ` Chenyi Qiang
2025-01-09 21:00                   ` Xu Yilun
2025-01-09 21:50                     ` Xu Yilun
2025-01-13  3:34                       ` Chenyi Qiang
2025-01-12 22:23                         ` Xu Yilun
2025-01-14  1:14                           ` Chenyi Qiang
2025-01-15  4:06                   ` Alexey Kardashevskiy
2025-01-15  6:15                     ` Chenyi Qiang
     [not found]                       ` <2b2730f3-6e1a-4def-b126-078cf6249759@amd.com>
2025-01-20 20:46                         ` Peter Xu
2024-06-24 16:31                           ` Xu Yilun
2025-01-21 15:18                             ` Peter Xu
2025-01-22  4:30                               ` Alexey Kardashevskiy
2025-01-22  9:41                                 ` Xu Yilun
2025-01-22 16:43                                   ` Peter Xu
2025-01-23  9:33                                     ` Xu Yilun
2025-01-23 16:47                                       ` Peter Xu
2025-01-24  9:47                                         ` Xu Yilun
2025-01-24 15:55                                           ` Peter Xu
2025-01-24 18:17                                             ` David Hildenbrand
2025-01-26  3:34                                             ` Xu Yilun
2025-01-30 16:28                                               ` Peter Xu
2025-01-30 16:51                                                 ` David Hildenbrand
2025-02-06 10:41                                                 ` Xu Yilun
2025-02-06 20:03                                                   ` Peter Xu
2025-01-14  6:45               ` Chenyi Qiang
2025-01-13 10:54       ` David Hildenbrand
2025-01-14  1:10         ` Chenyi Qiang
2025-01-15  4:05         ` Alexey Kardashevskiy
     [not found]           ` <f3aaffe7-7045-4288-8675-349115a867ce@redhat.com>
2025-01-20 17:21             ` Peter Xu
2025-01-20 17:54               ` David Hildenbrand
2025-01-20 18:33                 ` Peter Xu
2025-01-20 18:47                   ` David Hildenbrand
2025-01-20 20:19                     ` Peter Xu
2025-01-20 20:25                       ` David Hildenbrand
2025-01-20 20:43                         ` Peter Xu
2025-01-21  1:35                   ` Chenyi Qiang
2025-01-21 16:35                     ` Peter Xu
2025-01-22  3:28                       ` Chenyi Qiang
2025-01-22  5:38                         ` Xiaoyao Li
2025-01-24  0:15                           ` Alexey Kardashevskiy
2025-01-24  3:09                             ` Chenyi Qiang
2025-01-24  5:56                               ` Alexey Kardashevskiy
2025-01-24 16:12                                 ` Peter Xu
2025-01-20 18:09   ` Peter Xu
2025-01-21  9:00     ` Chenyi Qiang
2025-01-21  9:26       ` David Hildenbrand
2025-01-21 10:16         ` Chenyi Qiang
2025-01-21 10:26           ` David Hildenbrand
2025-01-22  6:43             ` Chenyi Qiang
2025-01-21 15:38       ` Peter Xu
2025-01-24  3:40         ` Chenyi Qiang
2024-12-13  7:08 ` [PATCH 3/7] guest_memfd: Introduce a callback to notify the shared/private state change Chenyi Qiang
2024-12-13  7:08 ` [PATCH 4/7] KVM: Notify the state change event during shared/private conversion Chenyi Qiang
2024-12-13  7:08 ` [PATCH 5/7] memory: Register the RamDiscardManager instance upon guest_memfd creation Chenyi Qiang
2025-01-08  4:47   ` Alexey Kardashevskiy
2025-01-09  5:34     ` Chenyi Qiang
2025-01-09  9:32       ` Alexey Kardashevskiy
2025-01-10  5:13         ` Chenyi Qiang
     [not found]           ` <59bd0e82-f269-4567-8f75-a32c9c997ca9@redhat.com>
2025-01-24  3:27             ` Alexey Kardashevskiy
2025-01-24  5:36               ` Chenyi Qiang
2025-01-09  8:14   ` Zhao Liu
2025-01-09  8:17     ` Chenyi Qiang
2024-12-13  7:08 ` [PATCH 6/7] RAMBlock: make guest_memfd require coordinate discard Chenyi Qiang
2025-01-13 10:56   ` David Hildenbrand
2025-01-14  1:38     ` Chenyi Qiang
     [not found]       ` <e1141052-1dec-435b-8635-a41881fedd4c@redhat.com>
2025-01-21  6:26         ` Chenyi Qiang
2025-01-21  8:05           ` David Hildenbrand
2024-12-13  7:08 ` [RFC PATCH 7/7] memory: Add a new argument to indicate the request attribute in RamDismcardManager helpers Chenyi Qiang
2025-01-08  4:47 ` [PATCH 0/7] Enable shared device assignment Alexey Kardashevskiy
2025-01-08  6:28   ` Chenyi Qiang
2025-01-08 11:38     ` Alexey Kardashevskiy
2025-01-09  7:52       ` Chenyi Qiang
2025-01-09  8:18         ` Alexey Kardashevskiy
2025-01-09  8:49           ` Chenyi Qiang [this message]
2025-01-10  1:42             ` Alexey Kardashevskiy
2025-01-10  7:06               ` Chenyi Qiang
2025-01-10  8:26                 ` David Hildenbrand
2025-01-10 13:20                   ` Jason Gunthorpe
2025-01-10 13:45                     ` David Hildenbrand
2025-01-10 14:14                       ` Jason Gunthorpe
2025-01-10 14:50                         ` David Hildenbrand
2025-01-15  3:39                         ` Alexey Kardashevskiy
2025-01-15 12:49                           ` Jason Gunthorpe
     [not found]                             ` <cc3428b1-22b7-432a-9c74-12b7e36b6cc6@redhat.com>
2025-01-20 18:39                               ` Jason Gunthorpe
