public inbox for linux-s390@vger.kernel.org
 help / color / mirror / Atom feed
From: Niklas Schnelle <schnelle@linux.ibm.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Matthew Rosato <mjrosato@linux.ibm.com>,
	Pierre Morel <pmorel@linux.ibm.com>,
	iommu@lists.linux.dev, linux-s390@vger.kernel.org,
	borntraeger@linux.ibm.com, hca@linux.ibm.com, gor@linux.ibm.com,
	gerald.schaefer@linux.ibm.com, agordeev@linux.ibm.com,
	svens@linux.ibm.com, joro@8bytes.org, will@kernel.org,
	robin.murphy@arm.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 1/5] iommu/s390: Fix duplicate domain attachments
Date: Thu, 06 Oct 2022 13:52:44 +0200	[thread overview]
Message-ID: <8c01ad419e91c0ce06bec8700d960c57f1a7c436.camel@linux.ibm.com> (raw)
In-Reply-To: <Yz1vF7B0FLvLVvE0@nvidia.com>

On Wed, 2022-10-05 at 08:48 -0300, Jason Gunthorpe wrote:
> On Wed, Oct 05, 2022 at 09:58:58AM +0200, Niklas Schnelle wrote:
> 
> > A failed aperture test leaving the IOAT registered would indeed be bad.
> > I guess I focused too much on the failure scenarios at the state after
> > these patches where this can't happen. I think this would leave us in a
> > bad state because zpci_register_ioat() succeeded with the domain's DMA
> > table but we won't have attached leading to the wrong decisions in
> > recovery paths (see below).
> 
> Domain attach should either completely move to the new domain and
> succeed, or it should leave everything as is and fail.
> 
> So it looks OK to me.
> 
> > Recovery (via zpci_hot_reset_device()) should then be able to deal with
> > these situations as long as zdev->dma_table matches the IOAT
> > registration state.
> 
> If you are doing reset the s390 driver should keep track of what
> domain is supposed to be attached and fix it when the reset is
> completed. In this case it should not fail attach here for the
> mandatory success domain types.

Our reset/recovery code won't do a detach/attach, it directly re-
establishes the DMA table that was previously in use with firmware. If
that fails the reset fails and one will have to "power cyle"* the
device.

Also automatic recovery is blocked while the IOMMU API is in use.
Though "echo 1 > /sys/bus/pci/../reset" is available and does re-
register the DMA table if the device was in an error state.

> 
> The core code does not reasonably handle failures from this routine,
> it must be avoided if you want it to be robust.
> 
> Jason

Makes sense. I see the following failure cases:

1. After patch 3 failure in the aperture check leaves
   everything as it is. Before that my proposal would
   leave it with DMA blocked and no domain attached
   so it will need to be "power cycled"*.

2. If zpci_register_ioat() fails the device is left detached
   from all domains. This however only happens in one of two cases:

   2a. The device was surprise unplugged. This seems fine as
       we tear things down and the calling code just needs to
       back off which from what I can see it does.
   2b. The device has entered an error state.

In case 2b the device is going to need recovery and will not be usable
until that succeeded (DMA and MMIO access is blocked). In automatic
recovery if zdev->dma_table == NULL the device will re-initialized for
use with the DMA API while if the IOMMU API is in use we currently
don't attempt recovery and the user needs to "power cycle"* the device
manually. The "re-initalized for DMA API" part of course doesn't work
for the upcoming DMA API conversion.

One option I see would be to ignore the error return from
zpci_register_ioat() if it indicates case 2b. Then we would still add
the device to the IOMMU's devices list and return success despite
knowing that the device is inaccessible (DMA and MMIO blocked).

Then the recovery/reset code will register the new domain once the
device comes out of the error state. At least from an IOMMU API point
of view that would make the attachment always succeed for all
zpci_register_ioat() error cases that aren't programming bugs and can
conceivably be recovered from.

If you agree I would propose adding this as a robustness improvement as
part of my upcoming series of IOMMU improvements needed for the DMA API
conversion. As stated above before the DMA API conversion any error
that would cause zpci_register_ioat() to fail while the IOMMU API is
being used will need a "power cycle" anyway so postponing this doesn't
hurt.

* I say "power cycle" but this isn't usually a real power cycle rather
an architecture specific low level disabled/enable but from Linux
driver point of view the device is completely unplugged and re-plugged.

Niklas


  reply	other threads:[~2022-10-06 11:53 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-04 12:07 [PATCH v4 0/5] iommu/s390: Fixes related to attach and aperture handling Niklas Schnelle
2022-10-04 12:07 ` [PATCH v4 1/5] iommu/s390: Fix duplicate domain attachments Niklas Schnelle
2022-10-04 12:43   ` Jason Gunthorpe
2022-10-04 16:18   ` Matthew Rosato
2022-10-05  7:58     ` Niklas Schnelle
2022-10-05 11:48       ` Jason Gunthorpe
2022-10-06 11:52         ` Niklas Schnelle [this message]
2022-10-06 12:02           ` Jason Gunthorpe
2022-10-06 13:01             ` Niklas Schnelle
2022-10-04 12:07 ` [PATCH v4 2/5] iommu/s390: Get rid of s390_domain_device Niklas Schnelle
2022-10-04 16:20   ` Matthew Rosato
2022-10-04 12:07 ` [PATCH v4 3/5] iommu/s390: Fix potential s390_domain aperture shrinking Niklas Schnelle
2022-10-04 21:12   ` Matthew Rosato
2022-10-04 12:07 ` [PATCH v4 4/5] iommu/s390: Fix incorrect aperture check Niklas Schnelle
2022-10-04 12:07 ` [PATCH v4 5/5] iommu/s390: Fix incorrect pgsize_bitmap Niklas Schnelle
2022-10-04 14:38   ` Matthew Rosato
2022-10-04 15:02   ` Robin Murphy
2022-10-04 15:12     ` Matthew Rosato
2022-10-04 15:31       ` Robin Murphy
2022-10-04 16:13         ` Niklas Schnelle
2022-10-05  9:53           ` Robin Murphy
2022-10-05 11:03             ` Niklas Schnelle

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8c01ad419e91c0ce06bec8700d960c57f1a7c436.camel@linux.ibm.com \
    --to=schnelle@linux.ibm.com \
    --cc=agordeev@linux.ibm.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@nvidia.com \
    --cc=joro@8bytes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=mjrosato@linux.ibm.com \
    --cc=pmorel@linux.ibm.com \
    --cc=robin.murphy@arm.com \
    --cc=svens@linux.ibm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox