Re: [RFC 1/2] system/memory: Allow creating IOMMU mappings from RAM discard populate notifiers

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Chenyi Qiang <chenyi.qiang@intel.com>
To: David Hildenbrand <david@redhat.com>,
	Jean-Philippe Brucker <jean-philippe@linaro.org>,
	<philmd@linaro.org>, <peterx@redhat.com>, <pbonzini@redhat.com>,
	<peter.maydell@linaro.org>
Cc: <qemu-arm@nongnu.org>, <qemu-devel@nongnu.org>
Subject: Re: [RFC 1/2] system/memory: Allow creating IOMMU mappings from RAM discard populate notifiers
Date: Fri, 21 Feb 2025 18:04:00 +0800	[thread overview]
Message-ID: <108be3a5-cfc1-4268-a913-3a41d1e4451f@intel.com> (raw)
In-Reply-To: <ce2306f9-19a4-4979-80e6-29b1e8a92318@redhat.com>



On 2/21/2025 4:09 PM, David Hildenbrand wrote:
> On 21.02.25 03:25, Chenyi Qiang wrote:
>>
>>
>> On 2/21/2025 3:39 AM, David Hildenbrand wrote:
>>> On 20.02.25 17:13, Jean-Philippe Brucker wrote:
>>>> For Arm CCA we'd like the guest_memfd discard notifier to call the
>>>> IOMMU
>>>> notifiers and create e.g. VFIO mappings. The default VFIO discard
>>>> notifier isn't sufficient for CCA because the DMA addresses need a
>>>> translation (even without vIOMMU).
>>>>
>>>> At the moment:
>>>> * guest_memfd_state_change() calls the populate() notifier
>>>> * the populate notifier() calls IOMMU notifiers
>>>> * the IOMMU notifier handler calls memory_get_xlat_addr() to get a VA
>>>> * it calls ram_discard_manager_is_populated() which fails.
>>>>
>>>> guest_memfd_state_change() only changes the section's state after
>>>> calling the populate() notifier. We can't easily invert the order of
>>>> operation because it uses the old state bitmap to know which pages need
>>>> the populate() notifier.
>>>
>>> I assume we talk about this code: [1]
>>>
>>> [1] https://lkml.kernel.org/r/20250217081833.21568-1-
>>> chenyi.qiang@intel.com
>>>
>>>
>>> +static int memory_attribute_state_change(MemoryAttributeManager *mgr,
>>> uint64_t offset,
>>> +                                         uint64_t size, bool
>>> shared_to_private)
>>> +{
>>> +    int block_size = memory_attribute_manager_get_block_size(mgr);
>>> +    int ret = 0;
>>> +
>>> +    if (!memory_attribute_is_valid_range(mgr, offset, size)) {
>>> +        error_report("%s, invalid range: offset 0x%lx, size 0x%lx",
>>> +                     __func__, offset, size);
>>> +        return -1;
>>> +    }
>>> +
>>> +    if ((shared_to_private && memory_attribute_is_range_discarded(mgr,
>>> offset, size)) ||
>>> +        (!shared_to_private && memory_attribute_is_range_populated(mgr,
>>> offset, size))) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    if (shared_to_private) {
>>> +        memory_attribute_notify_discard(mgr, offset, size);
>>> +    } else {
>>> +        ret = memory_attribute_notify_populate(mgr, offset, size);
>>> +    }
>>> +
>>> +    if (!ret) {
>>> +        unsigned long first_bit = offset / block_size;
>>> +        unsigned long nbits = size / block_size;
>>> +
>>> +        g_assert((first_bit + nbits) <= mgr->bitmap_size);
>>> +
>>> +        if (shared_to_private) {
>>> +            bitmap_clear(mgr->shared_bitmap, first_bit, nbits);
>>> +        } else {
>>> +            bitmap_set(mgr->shared_bitmap, first_bit, nbits);
>>> +        }
>>> +
>>> +        return 0;
>>> +    }
>>> +
>>> +    return ret;
>>> +}
>>>
>>> Then, in memory_attribute_notify_populate(), we walk the bitmap again.
>>>
>>> Why?
>>>
>>> We just checked that it's all in the expected state, no?
>>>
>>>
>>> virtio-mem doesn't handle it that way, so I'm curious why we would have
>>> to do it here?
>>
>> I was concerned about the case where the guest issues a request that
>> only partial of the range is in the desired state.
>> I think the main problem is the policy for the guest conversion request.
>> My current handling is:
>>
>> 1. When a conversion request is made for a range already in the desired
>>    state, the helper simply returns success.
> 
> Yes.
> 
>> 2. For requests involving a range partially in the desired state, only
>>    the necessary segments are converted, ensuring the entire range
>>    complies with the request efficiently.
> 
> 
> Ah, now I get:
> 
> +    if ((shared_to_private && memory_attribute_is_range_discarded(mgr,
> offset, size)) ||
> +        (!shared_to_private && memory_attribute_is_range_populated(mgr,
> offset, size))) {
> +        return 0;
> +    }
> +
> 
> We're not failing if it might already partially be in the other state.
> 
>> 3. In scenarios where a conversion request is declined by other systems,
>>    such as a failure from VFIO during notify_populate(), the helper will
>>    roll back the request, maintaining consistency.
>>
>> And the policy of virtio-mem is to refuse the state change if not all
>> blocks are in the opposite state.
> 
> Yes.
> 
>>
>> Actually, this part is still a uncertain to me.
>>
> 
> IIUC, the problem does not exist if we only convert a single page at a
> time.
> 
> Is there a known use case where such partial conversions could happen?

I don't see such case yet. Actually, I'm trying to follow the behavior
of KVM_SET_MEMORY_ATTRIBUTES ioctl during page conversion. In KVM, it
doesn't reject the request if the whole range isn't in the opposite
state. It just uses xa_store() to update it. Also, I don't see the spec
says how to handle such case. To be robust, I just allow this special case.

> 
>> BTW, per the status/bitmap track, the virtio-mem also changes the bitmap
>> after the plug/unplug notifier. This is the same, correct?
> Right. But because we reject these partial requests, we don't have to
> traverse the bitmap and could just adjust the bitmap operations.

Yes, If we treat it as a guest error/bug, we can adjust it.

>

next prev parent reply	other threads:[~2025-02-21 10:05 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-20 16:13 [RFC 0/2] arm: Add DMA remapping for CCA Jean-Philippe Brucker
2025-02-20 16:13 ` [RFC 1/2] system/memory: Allow creating IOMMU mappings from RAM discard populate notifiers Jean-Philippe Brucker
2025-02-20 19:39   ` David Hildenbrand
2025-02-21  2:25     ` Chenyi Qiang
2025-02-21  8:09       ` David Hildenbrand
2025-02-21 10:04         ` Chenyi Qiang [this message]
2025-02-25  2:00           ` Chenyi Qiang
2025-02-25  9:41             ` David Hildenbrand
2025-02-26 12:43               ` Chenyi Qiang
2025-02-27  3:26                 ` Chenyi Qiang
2025-02-27 11:27                   ` David Hildenbrand
2025-02-28  5:39                     ` Chenyi Qiang
2025-02-20 16:13 ` [RFC 2/2] target/arm/kvm-rme: Add DMA remapping for the shared memory region Jean-Philippe Brucker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=108be3a5-cfc1-4268-a913-3a41d1e4451f@intel.com \
    --to=chenyi.qiang@intel.com \
    --cc=david@redhat.com \
    --cc=jean-philippe@linaro.org \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=peterx@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).