From: David Hildenbrand <david@redhat.com>
To: Matthew Rosato <mjrosato@linux.ibm.com>, qemu-s390x@nongnu.org
Cc: farman@linux.ibm.com, schnelle@linux.ibm.com, thuth@redhat.com,
	pasic@linux.ibm.com, borntraeger@linux.ibm.com,
	richard.henderson@linaro.org, iii@linux.ibm.com,
	clegoate@redhat.com, qemu-devel@nongnu.org,
	Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [PATCH 1/2] s390x/pci: add support for guests that request direct mapping
Date: Tue, 10 Dec 2024 09:58:36 +0100	[thread overview]
Message-ID: <f38ed904-d235-43b5-a351-ea64e54984b4@redhat.com> (raw)
In-Reply-To: <a6bfbf15-cbe4-4023-9a8e-d172cf8c8128@linux.ibm.com>

On 10.12.24 00:22, Matthew Rosato wrote:
> On 12/9/24 5:09 PM, David Hildenbrand wrote:
>> On 09.12.24 22:45, Matthew Rosato wrote:
>>> On 12/9/24 4:01 PM, David Hildenbrand wrote:
>>>> On 09.12.24 20:29, Matthew Rosato wrote:
>>>>
>>>> Hi,
>>>>
>>>> Trying to wrap my head around that ... you mention "pin the entirety of guest memory".
>>>>
>>>> Do you mean that we will actually end up longterm pinning all guest RAM in the kernel, similar to what vfio ends up doing?
>>>
>>> Yes.  Actually, the use case here is specifically PCI passthrough via vfio-pci on s390.  Unlike other platforms, the default s390 approach only pins on demand and doesn't longterm pin all guest RAM, which is nice from a memory footprint perspective but pays a price via all those guest-2 RPCIT instructions.  The goal here is to now provide an optional alternative: longterm pinning, like on other platforms.
>>
>> Okay, thanks for confirming. One more question: who will trigger this longterm-pinning? Is it vfio?
>>
>> (the code flow from your code to the pinning code would be nice)
>>

Thanks for the details.

> 
> Yes, the vfio IOMMU code triggers it.  s390_pci_setup_stage2_map, added by this patch, calls memory_region_notify_iommu in a loop so that we trigger iommu notifications to map iova X+pba -> GPA X for all GPA, where pba is a "base address" offset that has to be applied when mapping on s390.

Ah, now I see that you use "ms->ram_size", so indeed, this will map 
initial RAM only.
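
So IIUC the loop essentially does something like the following (rough 
sketch, not your actual code; pow2floor/ctz64 from qemu/host-utils.h):

  IOMMUTLBEvent event = {
      .type = IOMMU_NOTIFIER_MAP,
      .entry = {
          .target_as = &address_space_memory,
          .perm = IOMMU_RW,
      },
  };
  hwaddr addr, size;

  /* Map iova = GPA + pba for all of initial RAM, in the largest
   * naturally aligned power-of-two chunks (addr_mask must be
   * size - 1, for a size the iova is aligned to). */
  for (addr = 0; addr < ms->ram_size; addr += size) {
      hwaddr iova = addr + pba;

      size = pow2floor(ms->ram_size - addr);
      if (iova) {
          size = MIN(size, 1ULL << ctz64(iova));
      }

      event.entry.iova = iova;
      event.entry.translated_addr = addr;
      event.entry.addr_mask = size - 1;
      memory_region_notify_iommu(iommu_mr, 0, event);
  }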

The more future-proof approach would indeed be to register a memory 
listener on &address_space_memory, and then map/unmap whenever the 
listener is notified about regions being added/removed.

See "struct vfio_memory_listener" with its region_add/region_del 
callbacks that do that, and also implement the RAMDiscardManager plumbing.
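
Roughly (a bare-bones sketch; names made up, RAMDiscardManager and 
error handling omitted, region_del analogous):

  static void s390_pci_dma_region_add(MemoryListener *listener,
                                      MemoryRegionSection *section)
  {
      if (!memory_region_is_ram(section->mr) ||
          memory_region_is_ram_device(section->mr)) {
          return;
      }
      /* map [offset_within_address_space,
       *      offset_within_address_space + int128_get64(section->size))
       * at that GPA + pba */
  }

  static MemoryListener s390_pci_dma_listener = {
      .name = "s390-pci-dma",
      .region_add = s390_pci_dma_region_add,
      .region_del = s390_pci_dma_region_del,
  };

  memory_listener_register(&s390_pci_dma_listener, &address_space_memory);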


And now I wonder if there would be a way to just reuse 
vfio_memory_listener here somehow? Or at least reuse its map/unmap 
functions, because ...

> The notifications are sent in the largest batches possible to minimize 
> vfio ioctls / use the least number of vfio dma mappings.
> 
> The iommu notifications get picked up in vfio_iommu_map_notify where we will follow the container DMA path down to vfio_legacy_dma_map; then ultimately vfio is handling the pinning via the VFIO_IOMMU_MAP_DMA ioctl.

... vfio_listener_region_add() will also just call vfio_container_dma_map().
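
(Both paths funnel into the same ioctl in the end. Roughly the 
following, with made-up variable names; this ioctl is what 
longterm-pins the pages:)

  struct vfio_iommu_type1_dma_map map = {
      .argsz = sizeof(map),
      .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
      .vaddr = (uintptr_t)host_vaddr,  /* host VA backing the guest RAM */
      .iova  = gpa + pba,              /* device view, incl. base offset */
      .size  = len,
  };

  ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);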

Maybe there is a reason s390x needs to handle this using 
memory_region_notify_iommu() callbacks instead of doing it the way 
vfio_memory_listener does when registered on &address_space_memory 
without a vIOMMU.

> 
>>>
>>>>
>>>> In that case, it would be incompatible with virtio-balloon (and, without modifications, with the upcoming virtio-mem). Is there already a mechanism in place to handle that -- a call to ram_block_discard_disable() -- or even a way to support coordinated discarding of RAM (e.g., virtio-mem + vfio)?
>>>
>>> Good point, should be calling ram_block_discard_disable(true) during register + a corresponding ram_block_discard_disable(false) during deregister...  Will add for v2.
>>>
>>> As for supporting coordinated discard, I was hoping to subsequently look at virtio-mem for this.
>>
>> As long as discarding is blocked for now, we're good. To support it, the RAMDiscardManager would have to be wired up, similar to vfio.
>>
>> I think the current way of handling it via
>>
>> +    IOMMUTLBEvent event = {
>> +        .type = IOMMU_NOTIFIER_MAP,
>> +        .entry = {
>> +            .target_as = &address_space_memory,
>> +            .translated_addr = 0,
>> +            .perm = IOMMU_RW,
>> +        },
>> +    };
>>
>>
>> ... is probably not ideal: it cannot cope with memory holes (which virtio-mem would create).
>>
>> Likely, you'd instead want an address space notifier, and really only map the memory region sections you get notified about.
>>
>> There, you can test for RAMDiscardManager and handle it like vfio does.
>>
> 
> I'll start looking into this; for the moment I'll plan on blocking discarding in this series, with a follow-on to enable virtio-mem, but if I get something working sooner I'll add it to this series.  Either way I'll put you on CC.

Thanks. Exploring whether we can reuse the vfio bits to map/unmap might 
be very valuable and simplify things.
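
FWIW, the RAMDiscardManager hookup in a region_add callback would look 
roughly like this (sketch modeled on the vfio code; the two callbacks 
are made-up names):

  RamDiscardManager *rdm =
      memory_region_get_ram_discard_manager(section->mr);

  if (rdm) {
      RamDiscardListener *rdl = g_new0(RamDiscardListener, 1);

      /* Don't map anything up front; map/unmap only what the manager
       * reports as populated, via the two callbacks. */
      ram_discard_listener_init(rdl, s390_pci_ram_populate,
                                s390_pci_ram_discard, true);
      ram_discard_manager_register_listener(rdm, rdl, section);
      return;
  }

  /* No RAMDiscardManager: map the whole section as before. */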

-- 
Cheers,

David / dhildenb



Thread overview: 17+ messages
2024-12-09 19:29 [PATCH 0/2] s390x/pci: relax I/O address translation requirement Matthew Rosato
2024-12-09 19:29 ` [PATCH 1/2] s390x/pci: add support for guests that request direct mapping Matthew Rosato
2024-12-09 21:01   ` David Hildenbrand
2024-12-09 21:45     ` Matthew Rosato
2024-12-09 22:09       ` David Hildenbrand
2024-12-09 23:22         ` Matthew Rosato
2024-12-10  8:58           ` David Hildenbrand [this message]
2024-12-13 22:46             ` Matthew Rosato
2024-12-11 11:34   ` Thomas Huth
2024-12-11 14:40   ` Cédric Le Goater
2024-12-11 15:17     ` Matthew Rosato
2024-12-09 19:29 ` [PATCH 2/2] s390x/pci: indicate QEMU supports relaxed translation for passthrough Matthew Rosato
2024-12-11 11:40   ` Thomas Huth
2024-12-12  9:10 ` [PATCH 0/2] s390x/pci: relax I/O address translation requirement Thomas Huth
2024-12-12 14:42   ` Matthew Rosato
2024-12-13  9:07     ` Cédric Le Goater
2024-12-13  9:24     ` Thomas Huth
