public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Matthew Rosato <mjrosato@linux.ibm.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Robin Murphy <robin.murphy@arm.com>,
	iommu@lists.linux.dev,
	Alex Williamson <alex.williamson@redhat.com>,
	linux-s390@vger.kernel.org, schnelle@linux.ibm.com,
	pmorel@linux.ibm.com, borntraeger@linux.ibm.com,
	hca@linux.ibm.com, gor@linux.ibm.com,
	gerald.schaefer@linux.ibm.com, agordeev@linux.ibm.com,
	svens@linux.ibm.com, joro@8bytes.org, will@kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 1/2] iommu/s390: Fix race with release_device ops
Date: Fri, 2 Sep 2022 14:20:56 -0400	[thread overview]
Message-ID: <62637e13-7680-28ff-d395-10e6232fd3e6@linux.ibm.com> (raw)
In-Reply-To: <YxI7kzuchcJz8sRX@nvidia.com>

On 9/2/22 1:21 PM, Jason Gunthorpe wrote:
> On Fri, Sep 02, 2022 at 01:11:09PM -0400, Matthew Rosato wrote:
>> On 9/1/22 4:37 PM, Jason Gunthorpe wrote:
>>> On Thu, Sep 01, 2022 at 12:14:24PM -0400, Matthew Rosato wrote:
>>>> On 9/1/22 6:25 AM, Robin Murphy wrote:
>>>>> On 2022-08-31 21:12, Matthew Rosato wrote:
>>>>>> With commit fa7e9ecc5e1c ("iommu/s390: Tolerate repeat attach_dev
>>>>>> calls") s390-iommu is supposed to handle dynamic switching between IOMMU
>>>>>> domains and the DMA API handling.  However, this commit does not
>>>>>> sufficiently handle the case where the device is released via a call
>>>>>> to the release_device op as it may occur at the same time as an opposing
>>>>>> attach_dev or detach_dev since the group mutex is not held over
>>>>>> release_device.  This was observed if the device is deconfigured during a
>>>>>> small window during vfio-pci initialization and can result in WARNs and
>>>>>> potential kernel panics.
>>>>>
>>>>> Hmm, the more I think about it, something doesn't sit right about this whole situation... release_device is called via the notifier from device_del() after the device has been removed from its parent bus and largely dismantled; it should definitely not still have a driver bound by that point, so how is VFIO doing things that manage to race at all?
>>>>>
>>>>> Robin.
>>>>
>>>> So, I generally have seen the issue manifest as one of the calls
>>>> into the iommu core from __vfio_group_unset_container
>>>> (e.g. iommu_deatch_group via vfio_type1_iommu) failing with a WARN.
>>>> This happens when the vfio group fd is released, which could be
>>>> coming e.g. from a userspace ioctl VFIO_GROUP_UNSET_CONTAINER.
>>>> AFAICT there's nothing serializing the notion of calling into the
>>>> iommu core here against a device that is simultaneously going
>>>> through release_device (because we don't enter release_device with
>>>> the group mutex held), resulting in unpredictable behavior between
>>>> the dueling attach_dev/detach_dev and the release_device for
>>>> s390-iommu at least.
>>>
>>> Oh, this is a vfio bug.
>>
>> I've been running with your diff applied today on s390 and this
>> indeed fixes the issue by preventing the detach-after-release coming
>> out of vfio. 
> 
> Heh, I'm shocked it worked at all
> 
> I've been trying to understand Robin's latest remarks because maybe I
> don't really understand your situation right.
> 
> IMHO this is definately a VFIO bug, because in a single-device group
> we must not allow the domain to remain attached past remove(). Or more
> broadly we shouldn't be holding ownership of a group without also
> having a driver attached.> 
> But this dicussion with Robin about multi-device groups and hotplug
> makes me wonder what your situation is? There is certainly something
> interesting there too, and this can't be a solution to that problem.
> 

It's just a single-device group, that's all we do in s390 today.  Of course, as we move forward to consume s390-iommu for other use-cases (e.g. Niklas DMA conversion) then yeah I think we will still need something within s390-iommu to handle competing e.g. attach/detach and release_device calls.

For this particular issue, the scenario is triggered by deconfiguring that one host device (effectively, pull the plug) while the device is passed to QEMU via vfio-pci.  

>> Can you send as a patch for review?
> 
> After I wrote this I had a better idea, to avoid the completion and
> just fully orphan the group fd.
> 
> And the patch is kind of messy
> 
> Can you forward me the backtrace you hit also?

It manifests in a few different ways (depends on the timing of the iommu_detach_group vs release_device), but this is by far the most common:

[  402.718355] iommu driver failed to attach the default/blocking domain
[  402.718380] WARNING: CPU: 0 PID: 5082 at drivers/iommu/iommu.c:1961 iommu_detach_group+0x6c/0x80
[  402.718389] Modules linked in: macvtap macvlan tap vfio_pci vfio_pci_core irqbypass vfio_virqfd kvm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink mlx5_ib sunrpc ib_uverbs ism smc uvdevice ib_core s390_trng eadm_sch tape_3590 tape tape_class vfio_ccw mdev vfio_iommu_type1 vfio zcrypt_cex4 sch_fq_codel configfs ghash_s390 prng chacha_s390 libchacha aes_s390 mlx5_core des_s390 libdes sha3_512_s390 nvme sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common nvme_core zfcp scsi_transport_fc pkey zcrypt rng_core autofs4
[  402.718439] CPU: 0 PID: 5082 Comm: qemu-system-s39 Tainted: G        W          6.0.0-rc3 #5
[  402.718442] Hardware name: IBM 3931 A01 782 (LPAR)
[  402.718443] Krnl PSW : 0704c00180000000 000000095bb10d28 (iommu_detach_group+0x70/0x80)
[  402.718447]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[  402.718450] Krnl GPRS: 0000000000000001 0000000900000027 0000000000000039 000000095c97ffe0
[  402.718453]            00000000fffeffff 00000009fc290000 00000000af1fda50 00000000af590b58
[  402.718455]            00000000af1fdaf0 0000000135c7a320 0000000135e52258 0000000135e52200
[  402.718457]            00000000a29e8000 00000000af590b40 000000095bb10d24 0000038004b13c98
[  402.718464] Krnl Code: 000000095bb10d18: c020003d56fc	larl	%r2,000000095c2bbb10
                          000000095bb10d1e: c0e50019d901	brasl	%r14,000000095be4bf20
                         #000000095bb10d24: af000000		mc	0,0
                         >000000095bb10d28: b904002a		lgr	%r2,%r10
                          000000095bb10d2c: ebaff0a00004	lmg	%r10,%r15,160(%r15)
                          000000095bb10d32: c0f4001aa867	brcl	15,000000095be65e00
                          000000095bb10d38: c004002168e0	brcl	0,000000095bf3def8
                          000000095bb10d3e: eb6ff0480024	stmg	%r6,%r15,72(%r15)
[  402.718532] Call Trace:
[  402.718534]  [<000000095bb10d28>] iommu_detach_group+0x70/0x80 
[  402.718538] ([<000000095bb10d24>] iommu_detach_group+0x6c/0x80)
[  402.718540]  [<000003ff80243b0e>] vfio_iommu_type1_detach_group+0x136/0x6c8 [vfio_iommu_type1] 
[  402.718546]  [<000003ff80137780>] __vfio_group_unset_container+0x58/0x158 [vfio] 
[  402.718552]  [<000003ff80138a16>] vfio_group_fops_unl_ioctl+0x1b6/0x210 [vfio] 
[  402.718557] pci 0004:00:00.0: Removing from iommu group 4
[  402.718555]  [<000000095b5b62e8>] __s390x_sys_ioctl+0xc0/0x100 
[  402.718562]  [<000000095be5d3b4>] __do_syscall+0x1d4/0x200 
[  402.718566]  [<000000095be6c072>] system_call+0x82/0xb0 
[  402.718570] Last Breaking-Event-Address:
[  402.718571]  [<000000095be4bf80>] __warn_printk+0x60/0x68


The WARN is hit because the attach fails due to the simultaneously in-flight release_device.

The subject patch was basically attempting to serialize the value that was being stomped over within s390-iommu (zdev->s390_domain) as a result of the competing iommu_detach_group + release_device.  By ensuring the vfio group is cleaned up before the release, I no longer see the competing threads, with release_device happening after the group is cleaned up. 

> 
> (Though I'm not sure I can get to this promptly, I have only 4 working
> days before LPC and still many things to do)
> 
> Thanks,
> Jason


  reply	other threads:[~2022-09-02 18:29 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-31 20:12 [PATCH v4 0/2] iommu/s390: fixes related to repeat attach_dev calls Matthew Rosato
2022-08-31 20:12 ` [PATCH v4 1/2] iommu/s390: Fix race with release_device ops Matthew Rosato
2022-09-01  7:56   ` Pierre Morel
2022-09-01  9:37     ` Niklas Schnelle
2022-09-01 11:01       ` Robin Murphy
2022-09-01 13:42         ` Niklas Schnelle
2022-09-01 14:17           ` Niklas Schnelle
2022-09-01 14:29           ` Robin Murphy
2022-09-01 14:34             ` Jason Gunthorpe
2022-09-01 15:03               ` Robin Murphy
2022-09-01 15:49                 ` Jason Gunthorpe
2022-09-01 17:00                   ` Robin Murphy
2022-09-01 20:28       ` Matthew Rosato
2022-09-02  7:49         ` Niklas Schnelle
2022-09-01 10:25   ` Robin Murphy
2022-09-01 16:14     ` Matthew Rosato
2022-09-01 20:37       ` Jason Gunthorpe
2022-09-02 17:11         ` Matthew Rosato
2022-09-02 17:21           ` Jason Gunthorpe
2022-09-02 18:20             ` Matthew Rosato [this message]
2022-09-05  9:46             ` Robin Murphy
2022-09-06 13:36               ` Jason Gunthorpe
2022-09-02 10:48       ` Robin Murphy
2022-08-31 20:12 ` [PATCH v4 2/2] iommu/s390: fix leak of s390_domain_device Matthew Rosato

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=62637e13-7680-28ff-d395-10e6232fd3e6@linux.ibm.com \
    --to=mjrosato@linux.ibm.com \
    --cc=agordeev@linux.ibm.com \
    --cc=alex.williamson@redhat.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@nvidia.com \
    --cc=joro@8bytes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=pmorel@linux.ibm.com \
    --cc=robin.murphy@arm.com \
    --cc=schnelle@linux.ibm.com \
    --cc=svens@linux.ibm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox