linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Gavin Shan <gwshan@linux.vnet.ibm.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>,
	linux-pci@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	bhelgaas@google.com, benh@au1.ibm.com, benh@kernel.crashing.org,
	mpe@ellerman.id.au, clsoto@us.ibm.com
Subject: Re: [PATCH v2 2/2] PCI: Don't disable PF's memory decoding when enabling SRIOV
Date: Tue, 25 Oct 2016 12:47:28 +1100	[thread overview]
Message-ID: <20161025014728.GA32507@gwshan> (raw)
In-Reply-To: <20161024140316.GA14177@localhost>

On Mon, Oct 24, 2016 at 09:03:16AM -0500, Bjorn Helgaas wrote:
>On Mon, Oct 24, 2016 at 10:28:02AM +1100, Gavin Shan wrote:
>> On Fri, Oct 21, 2016 at 11:50:34AM -0500, Bjorn Helgaas wrote:
>> >Hi Gavin,
>> >
>> >On Fri, Sep 30, 2016 at 09:47:50AM +1000, Gavin Shan wrote:
>> >> pci_update_resource() might be called to update (shift) IOV BARs
>> >> in PPC PowerNV specific pcibios_sriov_enable() when enabling PF's
>> >> SRIOV capability. At that point, the PF have been functional if
>> >> the SRIOV is enabled through sysfs entry "sriov_numvfs". The PF's
>> >> memory decoding (0x2 in PCI_COMMAND) shouldn't be disabled when
>> >> updating its IOV BARs with pci_update_resource(). Otherwise, we
>> >> receives EEH error caused by MMIO access to PF's memory BARs during
>> >> the window when PF's memory decoding is disabled.
>> >
>> >The fact that you get EEH errors is irrelevant.  We can't write code
>> >simply to avoid errors -- we have to write code to make the system
>> >work correctly.
>> >
>> >I do not want to add a "mmio_force_on" parameter to
>> >pci_update_resource().  That puts the burden on the caller to
>> >understand this subtle issue.  If the caller passes mmio_force_on=1
>> >when it shouldn't, things will almost always work, but once in a blue
>> >moon a half-updated BAR will conflict with some other device in the
>> >system, and we'll have an unreproducible, undebuggable crash.
>> >
>> 
>> Bjorn, thanks for your comments. Yes, the EEH error was caused by MMIO
>> access to PF's normal BARs, not VF BARs. Yeah, I also had the conern
>> that adding parameter to pci_update_resource() increases the visible
>> complexity to the caller of the function.
>> 
>> >What you do need is an explanation of why it's safe to non-atomically
>> >update a VF BARx in the SR-IOV capability.  I think this probably
>> >involves the VF MSE bit, and you probably have to either disable VFs
>> >completely or clear the MSE bit while you're updating the VF BARx.  We
>> >should be able to do this inside pci_update_resource() without
>> >changing the interface.
>> >
>> 
>> Yes, It's what PATCH[1/2] does: (PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE)
>> are set after VF BARs are updated with pci_update_resource() in this PPC
>> specific scenario. There are other two situations where the IOV BARs are
>> updated: PCI resource resizing and allocation during system booting or hot
>> adding PF. The VF shouldn't be enabled in both cases when updating IOV BARs.
>> 
>> I think you suggest to add check in pci_update_resource(): Don't disable
>> PF's memory decoding when updating VF BARs. I will send updated revision
>> if it's what you want.
>> 
>> 	/*
>> 	 * The PF might be functional when updating its IOV BARs. So PF's
>> 	 * memory decoding shouldn't be disabled when updating its IOV BARs.
>> 	 */
>> 	disable = (res->flags & IORESOURCE_MEM_64) && !dev->mmio_always_on;
>> #ifdef CONFIG_PCI_IOV
>> 	disable &= !(resno >= PCI_IOV_RESOURCES &&
>> 		     resno <= PCI_IOV_RESOURCE_END);
>> #endif
>> 	if (disable) {
>> 		pci_read_config_word(dev, PCI_COMMAND, &cmd);
>> 		pci_write_config_word(dev, PCI_COMMAND,
>> 				      cmd & ~PCI_COMMAND_MEMORY);
>> 	}
>
>Not exactly.  A 64-bit BAR cannot be updated atomically.  The whole
>point of this exercise is to make sure that when we update such a BAR,
>whether it is a normal PCI BAR or an SR-IOV BAR, the transient state
>does not conflict with anything else in the system.
>
>For example, consider two devices that do not conflict:
>
>  device A BAR 0: 0x00000000_20000000
>  device B BAR 0: 0x00000000_40000000
>
>We want to update A's BAR 0 to 0x00000001_40000000.  We can't do the
>update atomically, so we have this sequence:
>
>  before update:            device A BAR 0: 0x00000000_20000000
>  after writing lower half: device A BAR 0: 0x00000000_40000000
>  after writing upper half: device A BAR 0: 0x00000001_40000000
>
>If the driver for device B accesses B between the writes, both A and B
>may respond, which is a fatal error.
>

Thanks for the detailed explanation. Apart from pdev->mmio_always_on,
the normal BARs are updated with PCI_COMMAND_MEMORY cleared. With
PATCH[1/2], The IOV BARs are updated with PCI_SRIOV_CTRL_MSE and
PCI_SRIOV_CTRL_VFE cleared in the problematic path the patch tried
to address initially. However, I prefer your suggestion at end of
the reply.

>Probably the *best* thing would be to make pci_update_resource()
>return an error if it's asked to update a BAR that is enabled, but I
>haven't looked at all the callers to see whether that's practical.
>

It arguably enforces users to tackle the limitation: the memory
decoding (PCI_COMMAND_MEMORY or PCI_SRIOV_CTRL_VFE) should be
disabled before updating the BARs with pci_update_resource().
It means user cannot call APIs in relatively relaxed order
as before. For example, pci_enable_device() followed by
pci_update_resource(), which is allowed previously, won't
work.

We can hide the limitation inside pci_update_resource() because
nobody accesses the device's memory space that is being updated
by pci_update_resource(). 

>The current strategy in pci_update_resource() is to clear
>PCI_COMMAND_MEMORY when updating a 64-bit memory BAR.  This only
>applies to the regular PCI BARs 0-5.
>
>Extending that strategy to SR-IOV would mean clearing
>PCI_SRIOV_CTRL_MSE when updating a 64-bit VF BAR.  Obviously you
>wouldn't clear PCI_COMMAND_MEMORY in this case because
>PCI_COMMAND_MEMORY doesn't affect the VF BARs.
>

Yeah, it would be the solution to have. If you agree, I will post
updatd version according to this: Clearing PCI_SRIOV_CTRL_MSE when
updating IOV BARs. The bit won't be touched if pdev->mmio_always_on
is true.

Thanks,
Gavin

  reply	other threads:[~2016-10-25  1:46 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-29 23:47 [PATCH v2 1/2] pci: Call pcibios_sriov_enable() before IOV BARs are enabled Gavin Shan
2016-09-29 23:47 ` [PATCH v2 2/2] PCI: Don't disable PF's memory decoding when enabling SRIOV Gavin Shan
2016-10-21 16:50   ` Bjorn Helgaas
2016-10-23 23:28     ` Gavin Shan
2016-10-24 14:03       ` Bjorn Helgaas
2016-10-25  1:47         ` Gavin Shan [this message]
2016-10-25  3:51           ` Bjorn Helgaas
2016-10-26  1:02             ` Gavin Shan
2016-10-11  5:33 ` [PATCH v2 1/2] pci: Call pcibios_sriov_enable() before IOV BARs are enabled Gavin Shan
2016-10-21 20:23 ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161025014728.GA32507@gwshan \
    --to=gwshan@linux.vnet.ibm.com \
    --cc=benh@au1.ibm.com \
    --cc=benh@kernel.crashing.org \
    --cc=bhelgaas@google.com \
    --cc=clsoto@us.ibm.com \
    --cc=helgaas@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).