All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gavin Shan <gwshan@linux.vnet.ibm.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>,
	linux-pci@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	bhelgaas@google.com, benh@au1.ibm.com, benh@kernel.crashing.org,
	mpe@ellerman.id.au, clsoto@us.ibm.com
Subject: Re: [PATCH v2 2/2] PCI: Don't disable PF's memory decoding when enabling SRIOV
Date: Tue, 25 Oct 2016 12:47:28 +1100	[thread overview]
Message-ID: <20161025014728.GA32507@gwshan> (raw)
In-Reply-To: <20161024140316.GA14177@localhost>

On Mon, Oct 24, 2016 at 09:03:16AM -0500, Bjorn Helgaas wrote:
>On Mon, Oct 24, 2016 at 10:28:02AM +1100, Gavin Shan wrote:
>> On Fri, Oct 21, 2016 at 11:50:34AM -0500, Bjorn Helgaas wrote:
>> >Hi Gavin,
>> >
>> >On Fri, Sep 30, 2016 at 09:47:50AM +1000, Gavin Shan wrote:
>> >> pci_update_resource() might be called to update (shift) IOV BARs
>> >> in PPC PowerNV specific pcibios_sriov_enable() when enabling PF's
>> >> SRIOV capability. At that point, the PF have been functional if
>> >> the SRIOV is enabled through sysfs entry "sriov_numvfs". The PF's
>> >> memory decoding (0x2 in PCI_COMMAND) shouldn't be disabled when
>> >> updating its IOV BARs with pci_update_resource(). Otherwise, we
>> >> receives EEH error caused by MMIO access to PF's memory BARs during
>> >> the window when PF's memory decoding is disabled.
>> >
>> >The fact that you get EEH errors is irrelevant.  We can't write code
>> >simply to avoid errors -- we have to write code to make the system
>> >work correctly.
>> >
>> >I do not want to add a "mmio_force_on" parameter to
>> >pci_update_resource().  That puts the burden on the caller to
>> >understand this subtle issue.  If the caller passes mmio_force_on=1
>> >when it shouldn't, things will almost always work, but once in a blue
>> >moon a half-updated BAR will conflict with some other device in the
>> >system, and we'll have an unreproducible, undebuggable crash.
>> >
>> 
>> Bjorn, thanks for your comments. Yes, the EEH error was caused by MMIO
>> access to PF's normal BARs, not VF BARs. Yeah, I also had the conern
>> that adding parameter to pci_update_resource() increases the visible
>> complexity to the caller of the function.
>> 
>> >What you do need is an explanation of why it's safe to non-atomically
>> >update a VF BARx in the SR-IOV capability.  I think this probably
>> >involves the VF MSE bit, and you probably have to either disable VFs
>> >completely or clear the MSE bit while you're updating the VF BARx.  We
>> >should be able to do this inside pci_update_resource() without
>> >changing the interface.
>> >
>> 
>> Yes, It's what PATCH[1/2] does: (PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE)
>> are set after VF BARs are updated with pci_update_resource() in this PPC
>> specific scenario. There are other two situations where the IOV BARs are
>> updated: PCI resource resizing and allocation during system booting or hot
>> adding PF. The VF shouldn't be enabled in both cases when updating IOV BARs.
>> 
>> I think you suggest to add check in pci_update_resource(): Don't disable
>> PF's memory decoding when updating VF BARs. I will send updated revision
>> if it's what you want.
>> 
>> 	/*
>> 	 * The PF might be functional when updating its IOV BARs. So PF's
>> 	 * memory decoding shouldn't be disabled when updating its IOV BARs.
>> 	 */
>> 	disable = (res->flags & IORESOURCE_MEM_64) && !dev->mmio_always_on;
>> #ifdef CONFIG_PCI_IOV
>> 	disable &= !(resno >= PCI_IOV_RESOURCES &&
>> 		     resno <= PCI_IOV_RESOURCE_END);
>> #endif
>> 	if (disable) {
>> 		pci_read_config_word(dev, PCI_COMMAND, &cmd);
>> 		pci_write_config_word(dev, PCI_COMMAND,
>> 				      cmd & ~PCI_COMMAND_MEMORY);
>> 	}
>
>Not exactly.  A 64-bit BAR cannot be updated atomically.  The whole
>point of this exercise is to make sure that when we update such a BAR,
>whether it is a normal PCI BAR or an SR-IOV BAR, the transient state
>does not conflict with anything else in the system.
>
>For example, consider two devices that do not conflict:
>
>  device A BAR 0: 0x00000000_20000000
>  device B BAR 0: 0x00000000_40000000
>
>We want to update A's BAR 0 to 0x00000001_40000000.  We can't do the
>update atomically, so we have this sequence:
>
>  before update:            device A BAR 0: 0x00000000_20000000
>  after writing lower half: device A BAR 0: 0x00000000_40000000
>  after writing upper half: device A BAR 0: 0x00000001_40000000
>
>If the driver for device B accesses B between the writes, both A and B
>may respond, which is a fatal error.
>

Thanks for the detailed explanation. Apart from pdev->mmio_always_on,
the normal BARs are updated with PCI_COMMAND_MEMORY cleared. With
PATCH[1/2], The IOV BARs are updated with PCI_SRIOV_CTRL_MSE and
PCI_SRIOV_CTRL_VFE cleared in the problematic path the patch tried
to address initially. However, I prefer your suggestion at end of
the reply.

>Probably the *best* thing would be to make pci_update_resource()
>return an error if it's asked to update a BAR that is enabled, but I
>haven't looked at all the callers to see whether that's practical.
>

It arguably enforces users to tackle the limitation: the memory
decoding (PCI_COMMAND_MEMORY or PCI_SRIOV_CTRL_VFE) should be
disabled before updating the BARs with pci_update_resource().
It means user cannot call APIs in relatively relaxed order
as before. For example, pci_enable_device() followed by
pci_update_resource(), which is allowed previously, won't
work.

We can hide the limitation inside pci_update_resource() because
nobody accesses the device's memory space that is being updated
by pci_update_resource(). 

>The current strategy in pci_update_resource() is to clear
>PCI_COMMAND_MEMORY when updating a 64-bit memory BAR.  This only
>applies to the regular PCI BARs 0-5.
>
>Extending that strategy to SR-IOV would mean clearing
>PCI_SRIOV_CTRL_MSE when updating a 64-bit VF BAR.  Obviously you
>wouldn't clear PCI_COMMAND_MEMORY in this case because
>PCI_COMMAND_MEMORY doesn't affect the VF BARs.
>

Yeah, it would be the solution to have. If you agree, I will post
updatd version according to this: Clearing PCI_SRIOV_CTRL_MSE when
updating IOV BARs. The bit won't be touched if pdev->mmio_always_on
is true.

Thanks,
Gavin

  reply	other threads:[~2016-10-25  1:47 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-29 23:47 [PATCH v2 1/2] pci: Call pcibios_sriov_enable() before IOV BARs are enabled Gavin Shan
2016-09-29 23:47 ` [PATCH v2 2/2] PCI: Don't disable PF's memory decoding when enabling SRIOV Gavin Shan
2016-10-21 16:50   ` Bjorn Helgaas
2016-10-23 23:28     ` Gavin Shan
2016-10-24 14:03       ` Bjorn Helgaas
2016-10-25  1:47         ` Gavin Shan [this message]
2016-10-25  3:51           ` Bjorn Helgaas
2016-10-26  1:02             ` Gavin Shan
2016-10-11  5:33 ` [PATCH v2 1/2] pci: Call pcibios_sriov_enable() before IOV BARs are enabled Gavin Shan
2016-10-21 20:23 ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161025014728.GA32507@gwshan \
    --to=gwshan@linux.vnet.ibm.com \
    --cc=benh@au1.ibm.com \
    --cc=benh@kernel.crashing.org \
    --cc=bhelgaas@google.com \
    --cc=clsoto@us.ibm.com \
    --cc=helgaas@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.