From: Robin Murphy <robin.murphy@arm.com>
To: Niklas Schnelle <schnelle@linux.ibm.com>,
Jason Gunthorpe <jgg@nvidia.com>
Cc: Matthew Rosato <mjrosato@linux.ibm.com>,
linux-s390@vger.kernel.org, alex.williamson@redhat.com,
cohuck@redhat.com, farman@linux.ibm.com, pmorel@linux.ibm.com,
borntraeger@linux.ibm.com, hca@linux.ibm.com, gor@linux.ibm.com,
gerald.schaefer@linux.ibm.com, agordeev@linux.ibm.com,
svens@linux.ibm.com, frankja@linux.ibm.com, david@redhat.com,
imbrenda@linux.ibm.com, vneethv@linux.ibm.com,
oberpar@linux.ibm.com, freude@linux.ibm.com, thuth@redhat.com,
pasic@linux.ibm.com
Subject: Re: s390-iommu.c default domain conversion
Date: Fri, 20 May 2022 16:51:11 +0100
Message-ID: <db3ebceb-6a1b-1986-c774-45b425923467@arm.com>
In-Reply-To: <6271dd24bfcf82b0c1b911a163ae9549c24691a4.camel@linux.ibm.com>

On 2022-05-20 16:17, Niklas Schnelle wrote:
> On Fri, 2022-05-20 at 10:44 -0300, Jason Gunthorpe wrote:
>> On Fri, May 20, 2022 at 03:05:46PM +0200, Niklas Schnelle wrote:
>>
>>> I did some testing and created a prototype that gets rid of
>>> arch/s390/pci_dma.c and works solely via dma-iommu on top of our IOMMU
>>> driver. It looks like the existing dma-iommu code allows us to do this
>>> with relatively simple changes to the IOMMU driver only, mostly just
>>> implementing iotlb_sync(), iotlb_sync_map() and flush_iotlb_all() so
>>> that's great. They also do seem to map quite well to our RPCIT I/O TLB
>>> flush so that's great. For now the prototype still uses 4k pages only.
>>
>> You are going to want to improve the page sizes in the iommu driver
>> anyhow for VFIO.
>
> Ok, we'll look into this.
>
>>
>>> With that the performance on the LPAR machine hypervisor (no paging) is
>>> on par with our existing code. On paging hypervisors (z/VM and KVM)
>>> i.e. with the hypervisor shadowing the I/O translation tables, it's
>>> still slower than our existing code and interestingly strict mode seems
>>> to be better than lazy here. One thing I haven't done yet is implement
>>> the map_pages() operation or adding larger page sizes.
>>
>> map_pages() speeds things up if there is contiguous memory, I'm not
>> sure what work load you are testing with so hard to guess if that is
>> interesting or not.
>
> Our most important driver is mlx5 with both IP and RDMA traffic on
> ConnectX-4/5/6 but we also support NVMes.

Since you already have the loop in s390_iommu_update_trans(), updating
to map_pages()/unmap_pages() should be trivial, and well worth it.
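
To illustrate why the per-page loop carries over, here is a minimal userspace sketch of the idea, not driver code. Everything prefixed toy_ is hypothetical, and the argument lists only loosely follow the map_pages()/unmap_pages() callbacks (prot, gfp and the gather argument are left out). The point is that a single call covers pgcount pages, so the loop runs inside the driver instead of the core calling in once per page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NR_PAGES  64u	/* size of the toy aperture, in pages */

/* Hypothetical stand-in for a real I/O translation table. */
struct toy_domain {
	uint64_t pte[TOY_NR_PAGES];	/* 0 means "not mapped" */
};

/*
 * Map pgcount pages of pgsize bytes, iova -> paddr, in one call.
 * *mapped reports how many bytes were mapped before any failure.
 */
static int toy_map_pages(struct toy_domain *d, uint64_t iova, uint64_t paddr,
			 size_t pgsize, size_t pgcount, size_t *mapped)
{
	size_t i;

	assert(d && mapped);
	*mapped = 0;
	if (pgsize != TOY_PAGE_SIZE || iova % pgsize || paddr % pgsize)
		return -1;

	for (i = 0; i < pgcount; i++) {
		uint64_t idx = iova / pgsize + i;

		if (idx >= TOY_NR_PAGES || d->pte[idx])
			return -1;	/* out of range or already mapped */
		d->pte[idx] = (paddr + i * pgsize) | 1;	/* low bit = valid */
		*mapped += pgsize;
	}
	return 0;
}

/* Unmap up to pgcount pages; returns the number of bytes unmapped. */
static size_t toy_unmap_pages(struct toy_domain *d, uint64_t iova,
			      size_t pgsize, size_t pgcount)
{
	size_t i, unmapped = 0;

	assert(d);
	if (pgsize != TOY_PAGE_SIZE || iova % pgsize)
		return 0;

	for (i = 0; i < pgcount; i++) {
		uint64_t idx = iova / pgsize + i;

		if (idx >= TOY_NR_PAGES || !d->pte[idx])
			break;	/* stop at the first hole */
		d->pte[idx] = 0;
		unmapped += pgsize;
	}
	return unmapped;
}
```

In this model, mapping three contiguous pages is one toy_map_pages() call with pgcount = 3 rather than three separate calls.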
>>> Maybe you have some tips what you'd expect to be most beneficial?
>>> Either way we're optimistic this can be solved and this conversion
>>> will be a high ranking item on my backlog going forward.
>>
>> I'm not really sure I understand the differences, do you have a sense
>> what is making it slower? Maybe there is some small feature that can
>> be added to the core code? It is very strange that strict is faster,
>> that should not be, strict requires synchronous flush in the unmap
>> case, lazy does not. Are you sure you are getting the lazy flushes
>> enabled?
>
> The lazy flushes are the timer triggered flush_iotlb_all() in
> fq_flush_iotlb(), right? I definitely see that when tracing my
> flush_iotlb_all() implementation via that path. That flush_iotlb_all()
> in my prototype is basically the same as the global RPCIT we did once
> we wrapped around our IOVA address space. I suspect that this just
> happens much more often with the timer than our wrap around and
> flushing the entire aperture is somewhat slow because it causes the
> hypervisor to re-examine the entire I/O translation table. On the other
> hand in strict mode the iommu_iotlb_sync() call in __iommu_unmap()
> always flushes a relatively small contiguous range as I'm using the
> following construct to extend gather:
>
> 	if (iommu_iotlb_gather_is_disjoint(gather, iova, size))
> 		iommu_iotlb_sync(domain, gather);
>
> 	iommu_iotlb_gather_add_range(gather, iova, size);
>
> Maybe the smaller contiguous ranges just help with locality/caching
> because the flushed range in the guest's I/O tables was just updated.

That's entirely believable - both the AMD and Intel drivers force strict
mode when virtualised for similar reasons, so feel free to do the same.
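
For reference, the gather construct quoted above can be modelled in userspace as below. This is a rough sketch under stated assumptions: the toy_ names and the flush counters are hypothetical, and the range logic is only my reading of how the iommu_iotlb_gather helpers behave (an inclusive [start, end] range that grows while new ranges touch or overlap it). It shows that each strict-mode flush covers one small contiguous range, never the whole aperture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a gather: one pending inclusive range. */
struct toy_gather {
	uint64_t start;		/* first pending byte */
	uint64_t end;		/* last pending byte, 0 means "empty" */
	unsigned int flushes;	/* number of ranged flushes issued */
	uint64_t flushed_bytes;	/* total bytes covered by those flushes */
};

static void toy_gather_init(struct toy_gather *g)
{
	assert(g);
	g->start = UINT64_MAX;
	g->end = 0;
	g->flushes = 0;
	g->flushed_bytes = 0;
}

/* True if [iova, iova + size) neither overlaps nor touches the pending range. */
static bool toy_gather_is_disjoint(const struct toy_gather *g,
				   uint64_t iova, uint64_t size)
{
	uint64_t start = iova, end = iova + size - 1;

	return g->end != 0 && (end + 1 < g->start || start > g->end + 1);
}

/* Grow the pending range so that it also covers [iova, iova + size). */
static void toy_gather_add_range(struct toy_gather *g,
				 uint64_t iova, uint64_t size)
{
	uint64_t end = iova + size - 1;

	if (g->start > iova)
		g->start = iova;
	if (g->end < end)
		g->end = end;
}

/* Stand-in for a ranged I/O TLB flush: account for it and reset the range. */
static void toy_sync(struct toy_gather *g)
{
	if (g->end == 0)
		return;		/* nothing pending */
	g->flushes++;
	g->flushed_bytes += g->end - g->start + 1;
	g->start = UINT64_MAX;
	g->end = 0;
}

/* The quoted construct: flush first if the new range is disjoint. */
static void toy_unmap(struct toy_gather *g, uint64_t iova, uint64_t size)
{
	if (toy_gather_is_disjoint(g, iova, size))
		toy_sync(g);
	toy_gather_add_range(g, iova, size);
}
```

Two adjacent unmaps merge into a single pending range, while an unmap somewhere else in the aperture flushes that range first and then starts a new one.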
Robin.