From: Bjorn Helgaas <helgaas@kernel.org>
To: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: linux-pci@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
bhelgaas@google.com, benh@au1.ibm.com, benh@kernel.crashing.org,
mpe@ellerman.id.au, clsoto@us.ibm.com
Subject: Re: [PATCH v2 2/2] PCI: Don't disable PF's memory decoding when enabling SRIOV
Date: Mon, 24 Oct 2016 22:51:13 -0500 [thread overview]
Message-ID: <20161025035113.GA21423@localhost> (raw)
In-Reply-To: <20161025014728.GA32507@gwshan>
On Tue, Oct 25, 2016 at 12:47:28PM +1100, Gavin Shan wrote:
> On Mon, Oct 24, 2016 at 09:03:16AM -0500, Bjorn Helgaas wrote:
> >On Mon, Oct 24, 2016 at 10:28:02AM +1100, Gavin Shan wrote:
> >> On Fri, Oct 21, 2016 at 11:50:34AM -0500, Bjorn Helgaas wrote:
> >> >Hi Gavin,
> >> >
> >> >On Fri, Sep 30, 2016 at 09:47:50AM +1000, Gavin Shan wrote:
> >> >> pci_update_resource() might be called to update (shift) IOV BARs
> >> >> in PPC PowerNV specific pcibios_sriov_enable() when enabling PF's
> >> >> SRIOV capability. At that point, the PF have been functional if
> >> >> the SRIOV is enabled through sysfs entry "sriov_numvfs". The PF's
> >> >> memory decoding (0x2 in PCI_COMMAND) shouldn't be disabled when
> >> >> updating its IOV BARs with pci_update_resource(). Otherwise, we
> >> >> receives EEH error caused by MMIO access to PF's memory BARs during
> >> >> the window when PF's memory decoding is disabled.
> >> >
> >> >The fact that you get EEH errors is irrelevant. We can't write code
> >> >simply to avoid errors -- we have to write code to make the system
> >> >work correctly.
> >> >
> >> >I do not want to add a "mmio_force_on" parameter to
> >> >pci_update_resource(). That puts the burden on the caller to
> >> >understand this subtle issue. If the caller passes mmio_force_on=1
> >> >when it shouldn't, things will almost always work, but once in a blue
> >> >moon a half-updated BAR will conflict with some other device in the
> >> >system, and we'll have an unreproducible, undebuggable crash.
> >> >
> >>
> >> Bjorn, thanks for your comments. Yes, the EEH error was caused by MMIO
> >> access to PF's normal BARs, not VF BARs. Yeah, I also had the conern
> >> that adding parameter to pci_update_resource() increases the visible
> >> complexity to the caller of the function.
> >>
> >> >What you do need is an explanation of why it's safe to non-atomically
> >> >update a VF BARx in the SR-IOV capability. I think this probably
> >> >involves the VF MSE bit, and you probably have to either disable VFs
> >> >completely or clear the MSE bit while you're updating the VF BARx. We
> >> >should be able to do this inside pci_update_resource() without
> >> >changing the interface.
> >> >
> >>
> >> Yes, It's what PATCH[1/2] does: (PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE)
> >> are set after VF BARs are updated with pci_update_resource() in this PPC
> >> specific scenario. There are other two situations where the IOV BARs are
> >> updated: PCI resource resizing and allocation during system booting or hot
> >> adding PF. The VF shouldn't be enabled in both cases when updating IOV BARs.
> >>
> >> I think you suggest to add check in pci_update_resource(): Don't disable
> >> PF's memory decoding when updating VF BARs. I will send updated revision
> >> if it's what you want.
> >>
> >> /*
> >> * The PF might be functional when updating its IOV BARs. So PF's
> >> * memory decoding shouldn't be disabled when updating its IOV BARs.
> >> */
> >> disable = (res->flags & IORESOURCE_MEM_64) && !dev->mmio_always_on;
> >> #ifdef CONFIG_PCI_IOV
> >> disable &= !(resno >= PCI_IOV_RESOURCES &&
> >> resno <= PCI_IOV_RESOURCE_END);
> >> #endif
> >> if (disable) {
> >> pci_read_config_word(dev, PCI_COMMAND, &cmd);
> >> pci_write_config_word(dev, PCI_COMMAND,
> >> cmd & ~PCI_COMMAND_MEMORY);
> >> }
> >
> >Not exactly. A 64-bit BAR cannot be updated atomically. The whole
> >point of this exercise is to make sure that when we update such a BAR,
> >whether it is a normal PCI BAR or an SR-IOV BAR, the transient state
> >does not conflict with anything else in the system.
> >
> >For example, consider two devices that do not conflict:
> >
> > device A BAR 0: 0x00000000_20000000
> > device B BAR 0: 0x00000000_40000000
> >
> >We want to update A's BAR 0 to 0x00000001_40000000. We can't do the
> >update atomically, so we have this sequence:
> >
> > before update: device A BAR 0: 0x00000000_20000000
> > after writing lower half: device A BAR 0: 0x00000000_40000000
> > after writing upper half: device A BAR 0: 0x00000001_40000000
> >
> >If the driver for device B accesses B between the writes, both A and B
> >may respond, which is a fatal error.
> >
>
> Thanks for the detailed explanation. Apart from pdev->mmio_always_on,
> the normal BARs are updated with PCI_COMMAND_MEMORY cleared. With
> PATCH[1/2], The IOV BARs are updated with PCI_SRIOV_CTRL_MSE and
> PCI_SRIOV_CTRL_VFE cleared in the problematic path the patch tried
> to address initially. However, I prefer your suggestion at end of
> the reply.
>
> >Probably the *best* thing would be to make pci_update_resource()
> >return an error if it's asked to update a BAR that is enabled, but I
> >haven't looked at all the callers to see whether that's practical.
> >
>
> It arguably enforces users to tackle the limitation: the memory
> decoding (PCI_COMMAND_MEMORY or PCI_SRIOV_CTRL_VFE) should be
> disabled before updating the BARs with pci_update_resource().
> It means user cannot call APIs in relatively relaxed order
> as before. For example, pci_enable_device() followed by
> pci_update_resource(), which is allowed previously, won't
> work.
That specific case (pci_enable_device() followed by
pci_update_resource()) should *not* work. pci_enable_device() is
normally called by a driver's .probe() method, and after we call a
.probe() method, the PCI core shouldn't touch the device at all
because there's no means of mutual exclusion between the driver and
the PCI core.
I think pci_update_resource() should only be called in situations
where the caller already knows that nobody is using the device. For
regular PCI BARs, that doesn't necessarily mean PCI_COMMAND_MEMORY is
turned off, because firmware leaves PCI_COMMAND_MEMORY enabled for
many devices, even though nobody is using them.
Anyway, I think that's a project for another day. That's too much to
tackle for the limited problem you're trying to solve.
> We can hide the limitation inside pci_update_resource() because
> nobody accesses the device's memory space that is being updated
> by pci_update_resource().
>
> >The current strategy in pci_update_resource() is to clear
> >PCI_COMMAND_MEMORY when updating a 64-bit memory BAR. This only
> >applies to the regular PCI BARs 0-5.
> >
> >Extending that strategy to SR-IOV would mean clearing
> >PCI_SRIOV_CTRL_MSE when updating a 64-bit VF BAR. Obviously you
> >wouldn't clear PCI_COMMAND_MEMORY in this case because
> >PCI_COMMAND_MEMORY doesn't affect the VF BARs.
> >
>
> Yeah, it would be the solution to have. If you agree, I will post
> updatd version according to this: Clearing PCI_SRIOV_CTRL_MSE when
> updating IOV BARs. The bit won't be touched if pdev->mmio_always_on
> is true.
I think you should ignore pdev->mmio_always_on for IOV BARs.
mmio_always_on is basically a workaround for devices that either don't
follow the spec or where we didn't completely understand the problem.
I don't think there's any reason to set mmio_always_on for SR-IOV
devices.
Bjorn
next prev parent reply other threads:[~2016-10-25 3:51 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-09-29 23:47 [PATCH v2 1/2] pci: Call pcibios_sriov_enable() before IOV BARs are enabled Gavin Shan
2016-09-29 23:47 ` [PATCH v2 2/2] PCI: Don't disable PF's memory decoding when enabling SRIOV Gavin Shan
2016-10-21 16:50 ` Bjorn Helgaas
2016-10-23 23:28 ` Gavin Shan
2016-10-24 14:03 ` Bjorn Helgaas
2016-10-25 1:47 ` Gavin Shan
2016-10-25 3:51 ` Bjorn Helgaas [this message]
2016-10-26 1:02 ` Gavin Shan
2016-10-11 5:33 ` [PATCH v2 1/2] pci: Call pcibios_sriov_enable() before IOV BARs are enabled Gavin Shan
2016-10-21 20:23 ` Bjorn Helgaas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161025035113.GA21423@localhost \
--to=helgaas@kernel.org \
--cc=benh@au1.ibm.com \
--cc=benh@kernel.crashing.org \
--cc=bhelgaas@google.com \
--cc=clsoto@us.ibm.com \
--cc=gwshan@linux.vnet.ibm.com \
--cc=linux-pci@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mpe@ellerman.id.au \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.