From: David Hildenbrand <david@redhat.com>
To: Matthew Rosato <mjrosato@linux.ibm.com>, qemu-s390x@nongnu.org
Cc: farman@linux.ibm.com, schnelle@linux.ibm.com, thuth@redhat.com,
pasic@linux.ibm.com, borntraeger@linux.ibm.com,
richard.henderson@linaro.org, iii@linux.ibm.com,
clegoate@redhat.com, qemu-devel@nongnu.org,
Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [PATCH 1/2] s390x/pci: add support for guests that request direct mapping
Date: Tue, 10 Dec 2024 09:58:36 +0100
Message-ID: <f38ed904-d235-43b5-a351-ea64e54984b4@redhat.com>
In-Reply-To: <a6bfbf15-cbe4-4023-9a8e-d172cf8c8128@linux.ibm.com>
On 10.12.24 00:22, Matthew Rosato wrote:
> On 12/9/24 5:09 PM, David Hildenbrand wrote:
>> On 09.12.24 22:45, Matthew Rosato wrote:
>>> On 12/9/24 4:01 PM, David Hildenbrand wrote:
>>>> On 09.12.24 20:29, Matthew Rosato wrote:
>>>>
>>>> Hi,
>>>>
>>>> Trying to wrap my head around that ... you mention that "pin the entirety of guest memory".
>>>>
>>>> Do you mean that we will actually end up longterm pinning all guest RAM in the kernel, similar to what vfio ends up doing?
>>>
>>> Yes. Actually, the usecase here is specifically PCI passthrough via vfio-pci on s390. Unlike other platforms, the default s390 approach only pins on-demand and doesn't longterm-pin all guest RAM, which is nice from a memory footprint perspective but pays a price via all those guest-2 RPCIT instructions. The goal here is now to provide an optional alternative: longterm pinning, like other platforms do.
>>
>> Okay, thanks for confirming. One more question: who will trigger this longterm-pinning? Is it vfio?
>>
>> (the code flow from your code to the pinning code would be nice)
>>
Thanks for the details.
>
> Yes, the vfio IOMMU code triggers it. My s390_pci_setup_stage2_map, added by this patch, calls memory_region_notify_iommu in a loop, triggering iommu notifications that map iova X+pba -> GPA X for all GPA, where pba is a "base address" offset that has to be applied when mapping on s390.
Ah, now I see that you use "ms->ram_size", so indeed, this will map
initial RAM only.
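For illustration, a minimal sketch of the replay loop described above,
assuming QEMU's memory_region_notify_iommu() API; the function name,
chunk size, and batching strategy here are hypothetical and are not
taken from the actual patch:

#include "qemu/osdep.h"
#include "qemu/units.h"
#include "exec/memory.h"

/* Replay MAP notifications for iova = GPA + pba -> GPA across all of
 * initial guest RAM.  Real code would batch each range as large as its
 * natural alignment allows; a fixed chunk keeps the sketch simple and
 * assumes ram_size and pba are multiples of the chunk size. */
static void stage2_map_sketch(IOMMUMemoryRegion *iommu_mr, hwaddr pba,
                              uint64_t ram_size)
{
    uint64_t gpa;

    for (gpa = 0; gpa < ram_size; gpa += GiB) {
        IOMMUTLBEvent event = {
            .type = IOMMU_NOTIFIER_MAP,
            .entry = {
                .target_as = &address_space_memory,
                .iova = gpa + pba,
                .translated_addr = gpa,
                .addr_mask = GiB - 1,   /* range size - 1; keep aligned */
                .perm = IOMMU_RW,
            },
        };
        memory_region_notify_iommu(iommu_mr, 0, event);
    }
}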
The more-future-proof approach would indeed be to register a memory
listener on &address_space_memory, to then map/unmap whenever notified
about map/unmap.
See "struct vfio_memory_listener" with its region_add/region_del
callbacks that do that, and also implement the RAMDiscardManager plumbing.
And now I wonder if there would be a way to just reuse "struct
vfio_memory_listener" here some way? Or at least reuse the map/unmap
functions, because ...
> The notifications are sent in the largest batches possible to minimize
> vfio ioctls / use the least number of vfio dma mappings.
>
> The iommu notifications get picked up in vfio_iommu_map_notify where we will follow the container DMA path down to vfio_legacy_dma_map; then ultimately vfio is handling the pinning via the VFIO_IOMMU_MAP_DMA ioctl.
... vfio_listener_region_add() will also just call vfio_container_dma_map().
Maybe there is a reason s390x needs to handle this using
memory_region_notify_iommu() callbacks instead of doing it similarly to
"struct vfio_memory_listener" when registered on &address_space_memory
without a viommu.
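To make that alternative concrete, a hedged sketch of such a listener
on &address_space_memory, loosely modeled on vfio's; all names here are
illustrative, and a real version would also add the RAMDiscardManager
plumbing mentioned above:

#include "qemu/osdep.h"
#include "exec/memory.h"
#include "exec/address-spaces.h"

static void s390_pci_dma_region_add(MemoryListener *listener,
                                    MemoryRegionSection *section)
{
    if (!memory_region_is_ram(section->mr)) {
        return;
    }
    /* Map iova = section offset + pba -> the section's host memory,
     * e.g. by calling down the same path as vfio_container_dma_map(). */
}

static void s390_pci_dma_region_del(MemoryListener *listener,
                                    MemoryRegionSection *section)
{
    /* Unmap the corresponding iova range again. */
}

static MemoryListener s390_pci_dma_listener = {
    .name = "s390-pci-dma",
    .region_add = s390_pci_dma_region_add,
    .region_del = s390_pci_dma_region_del,
};

/* Registered once the guest requests direct mapping: */
static void s390_pci_dma_listener_register(void)
{
    memory_listener_register(&s390_pci_dma_listener,
                             &address_space_memory);
}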
>
>>>
>>>>
>>>> In that case, it would be incompatible with virtio-balloon (and without modifications with upcoming virtio-mem). Is there already a mechanism in place to handle that -- a call to ram_block_discard_disable() -- or even a way to support coordinated discarding of RAM (e.g., virtio-mem + vfio)?
>>>
>>> Good point, we should be calling ram_block_discard_disable(true) on register + a corresponding (false) on deregister... Will add for v2.
>>>
>>> As for supporting coordinated discard, I was hoping to subsequently look at virtio-mem for this.
>>
>> As long as discarding is blocked for now, we're good. To support it, the RAMDiscardManager would have to be wired up, similar to vfio.
>>
>> I think the current way of handling it via
>>
>> +    IOMMUTLBEvent event = {
>> +        .type = IOMMU_NOTIFIER_MAP,
>> +        .entry = {
>> +            .target_as = &address_space_memory,
>> +            .translated_addr = 0,
>> +            .perm = IOMMU_RW,
>> +        },
>> +    };
>>
>>
>> is probably not ideal: it cannot cope with memory holes (which virtio-mem would create).
>>
>> Likely, you'd instead want an address space notifier, and really only map the memory region sections you get notified about.
>>
>> There, you can test for RAMDiscardManager and handle it like vfio does.
>>
>
> I'll start looking into this; for the moment I'll plan on blocking discarding in this series with a follow-on to then enable virtio-mem, but if I get something working sooner I'll add it to this series. Either way I'll put you on CC.
Thanks, exploring whether we can reuse the vfio bits to map/unmap might
be very valuable and simplify things.
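A hedged sketch of the discard blocking agreed on above, using QEMU's
real ram_block_discard_disable() API at hypothetical register/deregister
call sites (the function names are illustrative only):

#include "qemu/osdep.h"
#include "qapi/error.h"
#include "exec/memory.h"

static int s390_pci_direct_map_register(Error **errp)
{
    /* Fails (e.g. -EBUSY) if a discard user such as virtio-balloon is
     * active, in which case the direct-mapping request is rejected. */
    int ret = ram_block_discard_disable(true);

    if (ret) {
        error_setg_errno(errp, -ret, "cannot block RAM discards");
    }
    return ret;
}

static void s390_pci_direct_map_deregister(void)
{
    ram_block_discard_disable(false);   /* allow discards again */
}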
--
Cheers,
David / dhildenb