From: Niklas Schnelle <schnelle@linux.ibm.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Matthew Rosato <mjrosato@linux.ibm.com>,
Pierre Morel <pmorel@linux.ibm.com>,
iommu@lists.linux.dev, linux-s390@vger.kernel.org,
borntraeger@linux.ibm.com, hca@linux.ibm.com, gor@linux.ibm.com,
gerald.schaefer@linux.ibm.com, agordeev@linux.ibm.com,
svens@linux.ibm.com, joro@8bytes.org, will@kernel.org,
robin.murphy@arm.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/3] iommu/s390: Fix duplicate domain attachments
Date: Tue, 27 Sep 2022 18:24:54 +0200 [thread overview]
Message-ID: <07d5527984912ef4c9174fad038ae951a79fd4dc.camel@linux.ibm.com> (raw)
In-Reply-To: <YzGuc7jVSvE2g91T@nvidia.com>
On Mon, 2022-09-26 at 10:51 -0300, Jason Gunthorpe wrote:
> On Mon, Sep 26, 2022 at 03:29:49PM +0200, Niklas Schnelle wrote:
> > I did miss a problem in my initial answer. While zpci_register_ioat()
> > is indeed the actual "attach" operation, it assumes that at that point
> > no DMA address translations are registered. In that state DMA is
> > blocked of course. With that zpci_register_ioat() needs to come after
> > the zpci_unregister_ioat() that is part of __s390_iommu_detach_device()
> > and zpci_dma_exit_device(). If we do call those though we fundamentally
> > need to restore the previous domain / DMA API state on any subsequent
> > failure. If we don't restore we would leave the device detached from
> > any domain with DMA blocked. I wonder if this could be an acceptable
> > failure state though? It's safe as no DMA is possible and we could get
> > out of it with a successful attach.
>
> Is this because of that FW call it does?
>
> It seems like an FW API misdesign to not allow an unfailable change of
> translation, if the FW forces an unregister first then there are
> always error cases you can't correctly recover from.
>
> IMHO if the FW fails an attach you are just busted, there is no reason
> to think it would suddenly accept attaching the old domain just
> because it has a different physical address, right?
While I can't come up with a case where an immediate retry would
actually help, there is at least one error case that one should be able
to recover from. The attach can fail if a firmware driven PCI device
recovery is in progress. Now if that happens during a switch between
domains I think one will have to do the equivalent of
echo 0 > /sys/bus/pci/slots/<dev>/power; echo 1 > /.../power
That of course tears down the whole PCI device so we don't have to
answer the question if the old or new domain is the active one.
So I think in the consequences you're still right, attempting to re-
attach is a lot of hassle for little or no gain.
>
> So make it simple, leave it DMA blocked and throw a WARN_ON..
To me a WARN_ON() isn't appropriate here, as stated above there is at
least one error case that is recoverable and doesn't necessarily
indicate a progamming bug. Also while not recoverable the device having
been surprise unplugged is another case where the attach failing is not
necessarily a bug. It would also thus cause false panics for e.g. CI
systems with PANIC_ON_WARN.
That said yes I think leaving DMA blocked and the device detached from
any domain is reasonable. That way the above recover scenario will work
and the device is in a defined and isolated state. Maybe in the future
we could even make this explicit and attach to the blocking domain on
failed attach, does that make sense?
next prev parent reply other threads:[~2022-09-27 16:25 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-22 9:52 [PATCH v2 0/3] iommu/s390: Fixes related to repeat attach_dev calls Niklas Schnelle
2022-09-22 9:52 ` [PATCH v2 1/3] iommu/s390: Fix duplicate domain attachments Niklas Schnelle
2022-09-22 14:33 ` Jason Gunthorpe
2022-09-26 9:00 ` Niklas Schnelle
2022-09-26 13:46 ` Jason Gunthorpe
2022-09-27 16:33 ` Niklas Schnelle
2022-09-27 16:56 ` Jason Gunthorpe
2022-09-28 8:58 ` Niklas Schnelle
2022-09-28 13:32 ` Jason Gunthorpe
2022-09-29 7:47 ` Niklas Schnelle
2022-09-26 13:29 ` Niklas Schnelle
2022-09-26 13:51 ` Jason Gunthorpe
2022-09-27 16:24 ` Niklas Schnelle [this message]
2022-09-27 16:40 ` Jason Gunthorpe
2022-09-22 9:52 ` [PATCH v2 2/3] s390/pci: remove unused bus_next field from struct zpci_dev Niklas Schnelle
2022-09-26 9:17 ` Pierre Morel
2022-09-26 9:23 ` Pierre Morel
2022-09-26 13:41 ` Niklas Schnelle
2022-09-22 9:52 ` [PATCH v2 3/3] iommu/s390: Get rid of s390_domain_device Niklas Schnelle
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=07d5527984912ef4c9174fad038ae951a79fd4dc.camel@linux.ibm.com \
--to=schnelle@linux.ibm.com \
--cc=agordeev@linux.ibm.com \
--cc=borntraeger@linux.ibm.com \
--cc=gerald.schaefer@linux.ibm.com \
--cc=gor@linux.ibm.com \
--cc=hca@linux.ibm.com \
--cc=iommu@lists.linux.dev \
--cc=jgg@nvidia.com \
--cc=joro@8bytes.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=mjrosato@linux.ibm.com \
--cc=pmorel@linux.ibm.com \
--cc=robin.murphy@arm.com \
--cc=svens@linux.ibm.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).