From: Matthew Rosato <mjrosato@linux.ibm.com>
To: Niklas Schnelle <schnelle@linux.ibm.com>,
	Pierre Morel <pmorel@linux.ibm.com>,
	iommu@lists.linux.dev
Cc: linux-s390@vger.kernel.org, borntraeger@linux.ibm.com,
	hca@linux.ibm.com, gor@linux.ibm.com,
	gerald.schaefer@linux.ibm.com, agordeev@linux.ibm.com,
	svens@linux.ibm.com, joro@8bytes.org, will@kernel.org,
	robin.murphy@arm.com, jgg@nvidia.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 1/2] iommu/s390: Fix race with release_device ops
Date: Thu, 1 Sep 2022 16:28:35 -0400
Message-ID: <6be7b0ff-63d4-0352-a7de-e66a93411c2b@linux.ibm.com>
In-Reply-To: <52d3fe0b86bdc04fdbf3aae095b2f71f4ea12d44.camel@linux.ibm.com>

On 9/1/22 5:37 AM, Niklas Schnelle wrote:
> On Thu, 2022-09-01 at 09:56 +0200, Pierre Morel wrote:
>>
>> On 8/31/22 22:12, Matthew Rosato wrote:
>>> With commit fa7e9ecc5e1c ("iommu/s390: Tolerate repeat attach_dev
>>> calls") s390-iommu is supposed to handle dynamic switching between IOMMU
>>> domains and the DMA API handling.  However, this commit does not
>>> sufficiently handle the case where the device is released via a call
>>> to the release_device op as it may occur at the same time as an opposing
>>> attach_dev or detach_dev since the group mutex is not held over
>>> release_device.  This was observed if the device is deconfigured during a
>>> small window during vfio-pci initialization and can result in WARNs and
>>> potential kernel panics.
>>>
>>> Handle this by tracking when the device is probed/released via
>>> dev_iommu_priv_set/get().  Ensure that once the device is released only
>>> release_device handles the re-init of the device DMA.
>>>
>>> Fixes: fa7e9ecc5e1c ("iommu/s390: Tolerate repeat attach_dev calls")
>>> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
>>> ---
>>>   arch/s390/include/asm/pci.h |  1 +
>>>   arch/s390/pci/pci.c         |  1 +
>>>   drivers/iommu/s390-iommu.c  | 39 ++++++++++++++++++++++++++++++++++---
>>>   3 files changed, 38 insertions(+), 3 deletions(-)
>>>
>>>
> ---8<---
>>>   
>>> @@ -206,10 +221,28 @@ static void s390_iommu_release_device(struct device *dev)
>>>
>>
> ---8<---
>>> +		/* Make sure this device is removed from the domain list */
>>>   		domain = iommu_get_domain_for_dev(dev);
>>>   		if (domain)
>>>   			s390_iommu_detach_device(domain, dev);
>>> +		/* Now ensure DMA is initialized from here */
>>> +		mutex_lock(&zdev->dma_domain_lock);
>>> +		if (zdev->s390_domain) {
>>> +			zdev->s390_domain = NULL;
>>> +			zpci_unregister_ioat(zdev, 0);
>>> +			zpci_dma_init_device(zdev);
>>
>> Sorry if it is a stupid question, but two things look strange to me:
>>
>> - having DMA initialized just after having unregistered the IOAT
>> Is that really all we need to unregister before calling dma_init_device?

This is also how s390-iommu has been handling detach_dev (and still does).

>>
>> - having DMA initialized inside the release_device callback:
>> Why isn't it done in the device_probe ?
> 
> As I understand it iommu_release_device() which calls this code is only
> used when a device goes away. So, I think you're right in that it makes
> little sense to re-initialize DMA at this point, it's going to be torn
> down immediately after anyway. I do wonder if it would be an acceptably
> small change to just set zdev->s390_domain = NULL here and leave DMA
> uninitialized while making zpci_dma_exit_device() deal with that e.g.
> by doing nothing if zdev->dma_table is NULL but I'm not sure.

Right -- since it's a fix, I was trying to keep the changes minimal, and
this behavior (re-init DMA even on release_device) already existed; it
was just always done within s390_iommu_detach_device before.

If you want, I could experiment with setting zdev->dma_table = NULL on the release path only (and checking it in zpci_dma_exit_device()).
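
To make the idea concrete, here is a rough userspace sketch of that
experiment. It is not kernel code and not a proposed patch: struct
zdev_model, model_release_device() and model_dma_exit_device() are
hypothetical stand-ins for struct zpci_dev, the release path and
zpci_dma_exit_device(), and the sketch assumes the DMA tables were
already torn down when the device was attached to an IOMMU domain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct zpci_dev; only the fields discussed here. */
struct zdev_model {
	void *s390_domain;        /* non-NULL while attached to an IOMMU domain */
	unsigned long *dma_table; /* non-NULL while DMA API state is set up */
	int exit_teardowns;       /* model-only counter of real teardowns */
};

/*
 * Models the release path: drop the domain and leave DMA uninitialized
 * instead of re-initializing it right before the device goes away.
 */
void model_release_device(struct zdev_model *zdev)
{
	if (zdev->s390_domain) {
		zdev->s390_domain = NULL;
		/* Assumed to be NULL already while attached; made explicit here. */
		zdev->dma_table = NULL;
	}
}

/*
 * Models zpci_dma_exit_device(): nothing to tear down if DMA was never
 * re-initialized, i.e. dma_table is NULL.
 */
void model_dma_exit_device(struct zdev_model *zdev)
{
	assert(!zdev->s390_domain); /* must not still be attached to a domain */
	if (!zdev->dma_table)
		return;
	free(zdev->dma_table);
	zdev->dma_table = NULL;
	zdev->exit_teardowns++;
}
```

With this shape, a device released while attached to a domain reaches
the exit path with dma_table == NULL and the teardown becomes a no-op,
while a device that was using the DMA API is torn down as before.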

> 
> Either way I fear this mess really is just a symptom of our current
> design oddity of driving the same IOMMU hardware through both our DMA
> API implementation (arch/s390/pci/pci_dma.c) and the IOMMU driver
> (drivers/iommu/s390-iommu.c) and trying to hand off between them
> smoothly where common code instead just layers one atop the other when
> using an IOMMU at all.
> 
> I think the correct medium term solution is to use the common DMA API
> implementation (drivers/iommu/dma-iommu.c) like everyone else. But that
> isn't the minimal fix we need now. 

Agree

> 
> I do have a working prototype of using the common implementation but
> the big problem that I'm still searching for a solution to is its
> performance with a virtualized IOMMU where IOTLB flushes (RPCIT on
> s390) are used for shadowing and are expensive and serialized. The
> optimization we used so far for unmap, only doing one global IOTLB
> flush once we run out of IOVA space, is just too much better in that
> scenario to just ignore. As one data point, on an NVMe I get about
> _twice_ the IOPS when using our existing scheme compared to strict
> mode. Which makes sense as IOTLB flushes are known as the bottleneck
> and optimizing unmap like that reduces them by almost half. Queued
> flushing is still much worse likely due to serialization of the
> shadowing, though again it works great on LPAR. To make sure it's not
> due to some bug in the IOMMU driver I even tried converting our
> existing DMA driver to layer on top of the IOMMU driver with the same
> result.
> 
> 
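
For anyone following along, the "almost half" figure above can be
illustrated with a toy userspace model that only counts flushes. It is
purely illustrative: the names are made up, each operation is assumed to
be a single-page map immediately followed by its unmap, and a flush on
every map is assumed to be required so the hypervisor can shadow the new
translation.

```c
#include <stddef.h>

/* Toy flush counters; not kernel code. */
struct flush_count {
	unsigned long per_op; /* flushes issued directly by map or unmap */
	unsigned long global; /* global flushes issued on IOVA wrap-around */
};

/* Strict mode: one flush for every map and one for every unmap. */
struct flush_count count_strict(unsigned long n_ops)
{
	struct flush_count c = { 0, 0 };
	unsigned long i;

	for (i = 0; i < n_ops; i++) {
		c.per_op++; /* map */
		c.per_op++; /* unmap */
	}
	return c;
}

/*
 * Existing scheme as described above: unmap issues no flush; IOVAs are
 * handed out by a simple bump allocator and one global flush is issued
 * only when the IOVA space of iova_pages pages is exhausted.
 */
struct flush_count count_lazy(unsigned long n_ops, unsigned long iova_pages)
{
	struct flush_count c = { 0, 0 };
	unsigned long next = 0;
	unsigned long i;

	for (i = 0; i < n_ops; i++) {
		if (next == iova_pages) {
			c.global++; /* ran out of IOVA space: flush everything */
			next = 0;
		}
		next++;
		c.per_op++; /* map still flushes (assumed needed for shadowing) */
		/* unmap: no flush here */
	}
	return c;
}
```

Under these assumptions, strict mode issues two flushes per operation,
while the existing scheme issues one per operation plus one global flush
per pass through the IOVA space, which approaches half as the IOVA space
grows.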

