public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Ordering between PCI config space writes and MMIO reads?
@ 2006-10-24 19:13 Roland Dreier
  2006-10-24 19:22 ` Jeff Garzik
                   ` (3 more replies)
  0 siblings, 4 replies; 38+ messages in thread
From: Roland Dreier @ 2006-10-24 19:13 UTC (permalink / raw)
  To: linux-pci, linux-ia64; +Cc: linux-kernel, openib-general, John Partridge

John Partridge found an interesting bug involving mthca (Mellanox
InfiniBand HCA driver) on IA64/Altix systems.  Basically, during
initialization, mthca does:

    - do some config writes, including enabling BARs
    - then start a firmware command
      - read an MMIO register from a BAR (to check if FW is busy)

However, John found that the Altix PCI-X bridge was allowing the MMIO
read to start before the config write was done (which is allowed by
the PCI spec).  The PCI trace looked like:

23454:    Config Write     REG = 01 TYPE = 1    BE = 0000  Req = (0,0,0)  Tag = 1  Bus = 1 Device = 0 Function = 0     WAIT = 2
23462:    Memory Rd DW     A = 00280698  BE = 0000  Req = (0,0,0)  Tag = 0      WAIT = 2
23470:    Split compl.     Lower A = 00  Req = (0,0,0)  Tag = 0  Comp = (0,2,0)     WAIT = 1   (Error completion)
23476:    Split compl.     Lower A = 00  Req = (0,0,0)  Tag = 1  Comp = (0,2,0)     WAIT = 1   (Normal completion of WRITE)

and that "Error completion" leads to a crash.

John proposed the following patch to fix this, which looks good to
me.  However, I have a couple of questions about this situation:

 1) Is this something that should be fixed in the driver?  The PCI
    spec allows MMIO cycles to start before an earlier config cycle
    completed, but do we want to expose this fact to drivers?  Would
    it be better for ia64 to use some sort of barrier to make sure
    pci_write_config_xxx() is strongly ordered with MMIO?

 2) Is this issue lurking in other drivers?

Thanks,
  Roland

commit 424b50b6360b325ce642ece687756a600c25d28a
Author: John Partridge <johnip@sgi.com>
Date:   Tue Oct 24 11:54:16 2006 -0700

    IB/mthca: Make sure all PCI config writes reach device before doing MMIO
    
    During initialization, mthca writes some PCI config space registers
    and then does an MMIO read from one of the BARs it just enabled.  This
    MMIO read sometimes failed and caused a crash on SGI Altix machines,
    because the PCI-X host bridge (legitimately, according to the PCI
    spec) allowed the MMIO read to start before the config write completed.
    
    To fix this, add a config read after all config writes to make sure
    they are all done before starting the MMIO read.
    
    Signed-off-by: John Partridge <johnip@sgi.com>
    Signed-off-by: Roland Dreier <rolandd@cisco.com>

diff --git a/drivers/infiniband/hw/mthca/mthca_reset.c b/drivers/infiniband/hw/mthca/mthca_reset.c
index 91934f2..578dc7c 100644
--- a/drivers/infiniband/hw/mthca/mthca_reset.c
+++ b/drivers/infiniband/hw/mthca/mthca_reset.c
@@ -281,6 +281,20 @@ good:
 		goto out;
 	}
 
+	/*
+	 * Perform a "flush" of the PCI config writes here by reading
+	 * the PCI_COMMAND register.  This is needed to make sure that
+	 * we don't try to touch other PCI BARs before the config
+	 * writes are done -- otherwise an MMIO cycle could start
+	 * before the config writes are done and reach the HCA before
+	 * the BAR is actually enabled.
+	 */
+	if (pci_read_config_dword(mdev->pdev, PCI_COMMAND, hca_header)) {
+		err = -ENODEV;
+		mthca_err(mdev, "Couldn't access HCA memory after restoring, "
+			  "aborting.\n");
+	}
+
 out:
 	if (bridge)
 		pci_dev_put(bridge);

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-24 19:13 Ordering between PCI config space writes and MMIO reads? Roland Dreier
@ 2006-10-24 19:22 ` Jeff Garzik
  2006-10-24 21:47   ` Matthew Wilcox
  2006-10-24 21:01 ` [openib-general] " JWM
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 38+ messages in thread
From: Jeff Garzik @ 2006-10-24 19:22 UTC (permalink / raw)
  To: Roland Dreier
  Cc: linux-pci, linux-ia64, linux-kernel, openib-general,
	John Partridge

On Tue, Oct 24, 2006 at 12:13:19PM -0700, Roland Dreier wrote:
>  1) Is this something that should be fixed in the driver?  The PCI
>     spec allows MMIO cycles to start before an earlier config cycle
>     completed, but do we want to expose this fact to drivers?  Would
>     it be better for ia64 to use some sort of barrier to make sure
>     pci_write_config_xxx() is strongly ordered with MMIO?

The PCI config APIs have traditionally enforced very strong ordering.
Heck, the PCI config APIs often take a spinlock on each read or write;
so they are definitely not intended to be as fast as MMIO.

	Jeff




^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [openib-general] Ordering between PCI config space writes and MMIO reads?
  2006-10-24 19:13 Ordering between PCI config space writes and MMIO reads? Roland Dreier
  2006-10-24 19:22 ` Jeff Garzik
@ 2006-10-24 21:01 ` JWM
  2006-10-24 21:24 ` Alan Cox
  2006-10-25  6:30 ` Grant Grundler
  3 siblings, 0 replies; 38+ messages in thread
From: JWM @ 2006-10-24 21:01 UTC (permalink / raw)
  To: linux-pci, linux-ia64, Roland Dreier; +Cc: linux-kernel, openib-general

    On IA64 there are two memory barriers mf and mf.a. To protect against 
the scenario below mf.a (slower of course) would be required.
    ....JW
----- Original Message ----- 
From: "Roland Dreier" <rdreier@cisco.com>
To: <linux-pci@atrey.karlin.mff.cuni.cz>; <linux-ia64@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>; <openib-general@openib.org>
Sent: Tuesday, October 24, 2006 2:13 PM
Subject: [openib-general] Ordering between PCI config space writes and MMIO 
reads?


> John Partridge found an interesting bug involving mthca (Mellanox
> InfiniBand HCA driver) on IA64/Altix systems.  Basically, during
> initialization, mthca does:
>
>    - do some config writes, including enabling BARs
>    - then start a firmware command
>      - read an MMIO register from a BAR (to check if FW is busy)
>
> However, John found that the Altix PCI-X bridge was allowing the MMIO
> read to start before the config write was done (which is allowed by
> the PCI spec).  The PCI trace looked like:
>
> 23454:    Config Write     REG = 01 TYPE = 1    BE = 0000  Req = (0,0,0) 
> Tag = 1  Bus = 1 Device = 0 Function = 0     WAIT = 2
> 23462:    Memory Rd DW     A = 00280698  BE = 0000  Req = (0,0,0)  Tag = 0 
> WAIT = 2
> 23470:    Split compl.     Lower A = 00  Req = (0,0,0)  Tag = 0  Comp = 
> (0,2,0)     WAIT = 1   (Error completion)
> 23476:    Split compl.     Lower A = 00  Req = (0,0,0)  Tag = 1  Comp = 
> (0,2,0)     WAIT = 1   (Normal completion of WRITE)
>
> and that "Error completion" leads to a crash.
>
> John proposed the following patch to fix this, which looks good to
> me.  However, I have a couple of questions about this situation:
>
> 1) Is this something that should be fixed in the driver?  The PCI
>    spec allows MMIO cycles to start before an earlier config cycle
>    completed, but do we want to expose this fact to drivers?  Would
>    it be better for ia64 to use some sort of barrier to make sure
>    pci_write_config_xxx() is strongly ordered with MMIO?
>
> 2) Is this issue lurking in other drivers?
>
> Thanks,
>  Roland
>
> commit 424b50b6360b325ce642ece687756a600c25d28a
> Author: John Partridge <johnip@sgi.com>
> Date:   Tue Oct 24 11:54:16 2006 -0700
>
>    IB/mthca: Make sure all PCI config writes reach device before doing 
> MMIO
>
>    During initialization, mthca writes some PCI config space registers
>    and then does an MMIO read from one of the BARs it just enabled.  This
>    MMIO read sometimes failed and caused a crash on SGI Altix machines,
>    because the PCI-X host bridge (legitimately, according to the PCI
>    spec) allowed the MMIO read to start before the config write completed.
>
>    To fix this, add a config read after all config writes to make sure
>    they are all done before starting the MMIO read.
>
>    Signed-off-by: John Partridge <johnip@sgi.com>
>    Signed-off-by: Roland Dreier <rolandd@cisco.com>
>
> diff --git a/drivers/infiniband/hw/mthca/mthca_reset.c 
> b/drivers/infiniband/hw/mthca/mthca_reset.c
> index 91934f2..578dc7c 100644
> --- a/drivers/infiniband/hw/mthca/mthca_reset.c
> +++ b/drivers/infiniband/hw/mthca/mthca_reset.c
> @@ -281,6 +281,20 @@ good:
>  goto out;
>  }
>
> + /*
> + * Perform a "flush" of the PCI config writes here by reading
> + * the PCI_COMMAND register.  This is needed to make sure that
> + * we don't try to touch other PCI BARs before the config
> + * writes are done -- otherwise an MMIO cycle could start
> + * before the config writes are done and reach the HCA before
> + * the BAR is actually enabled.
> + */
> + if (pci_read_config_dword(mdev->pdev, PCI_COMMAND, hca_header)) {
> + err = -ENODEV;
> + mthca_err(mdev, "Couldn't access HCA memory after restoring, "
> +   "aborting.\n");
> + }
> +
> out:
>  if (bridge)
>  pci_dev_put(bridge);
>
> _______________________________________________
> openib-general mailing list
> openib-general@openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit 
> http://openib.org/mailman/listinfo/openib-general
>
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-24 19:13 Ordering between PCI config space writes and MMIO reads? Roland Dreier
  2006-10-24 19:22 ` Jeff Garzik
  2006-10-24 21:01 ` [openib-general] " JWM
@ 2006-10-24 21:24 ` Alan Cox
  2006-10-24 21:29   ` Roland Dreier
  2006-10-25  6:30 ` Grant Grundler
  3 siblings, 1 reply; 38+ messages in thread
From: Alan Cox @ 2006-10-24 21:24 UTC (permalink / raw)
  To: Roland Dreier
  Cc: linux-pci, linux-ia64, linux-kernel, openib-general,
	John Partridge

Ar Maw, 2006-10-24 am 12:13 -0700, ysgrifennodd Roland Dreier:
>  1) Is this something that should be fixed in the driver?  The PCI
>     spec allows MMIO cycles to start before an earlier config cycle
>     completed, but do we want to expose this fact to drivers?  Would
>     it be better for ia64 to use some sort of barrier to make sure
>     pci_write_config_xxx() is strongly ordered with MMIO?

It is good to be conservative in this area. Some AMD chipsets at least
had ordering problems with some configurations in the K7 era.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-24 21:24 ` Alan Cox
@ 2006-10-24 21:29   ` Roland Dreier
  2006-10-24 21:37     ` Jeff Garzik
  0 siblings, 1 reply; 38+ messages in thread
From: Roland Dreier @ 2006-10-24 21:29 UTC (permalink / raw)
  To: Alan Cox
  Cc: linux-pci, linux-ia64, linux-kernel, openib-general,
	John Partridge

 > It is good to be conservative in this area. Some AMD chipsets at least
 > had ordering problems with some configurations in the K7 era.

Could you expand a little?  Do you mean that the arch implementation
of pci_write_config_xxx() should have extra barriers, or that drivers
should do belt-and-suspenders flushes to make sure config writes are
really done properly?

 - R.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-24 21:29   ` Roland Dreier
@ 2006-10-24 21:37     ` Jeff Garzik
  0 siblings, 0 replies; 38+ messages in thread
From: Jeff Garzik @ 2006-10-24 21:37 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Alan Cox, linux-pci, linux-ia64, linux-kernel, openib-general,
	John Partridge

On Tue, Oct 24, 2006 at 02:29:47PM -0700, Roland Dreier wrote:
>  > It is good to be conservative in this area. Some AMD chipsets at least
>  > had ordering problems with some configurations in the K7 era.
> 
> Could you expand a little?  Do you mean that the arch implementation
> of pci_write_config_xxx() should have extra barriers, or that drivers
> should do belt-and-suspenders flushes to make sure config writes are
> really done properly?

Drivers are -already- written to assume the pci_write_config_xxx() has
the requisite barriers.  The fix doesn't belong in the drivers.

	Jeff




^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-24 19:22 ` Jeff Garzik
@ 2006-10-24 21:47   ` Matthew Wilcox
  2006-10-24 21:51     ` Roland Dreier
  0 siblings, 1 reply; 38+ messages in thread
From: Matthew Wilcox @ 2006-10-24 21:47 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Roland Dreier, linux-pci, linux-ia64, linux-kernel,
	openib-general, John Partridge

On Tue, Oct 24, 2006 at 03:22:10PM -0400, Jeff Garzik wrote:
> The PCI config APIs have traditionally enforced very strong ordering.
> Heck, the PCI config APIs often take a spinlock on each read or write;
> so they are definitely not intended to be as fast as MMIO.

s/often/always/.  It's implemented in drivers/pci/access.c.

I think the right way to fix this is to ensure mmio write ordering in
the pci_write_config_*() implementations.  Like this.

Signed-off-by: Matthew Wilcox <matthew@wil.cx>

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index ea16805..c80f1ba 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -1,6 +1,6 @@
 #include <linux/pci.h>
 #include <linux/module.h>
-#include <linux/ioport.h>
+#include <linux/io.h>
 
 #include "pci.h"
 
@@ -45,6 +45,7 @@ int pci_bus_write_config_##size \
 	if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER;	\
 	spin_lock_irqsave(&pci_lock, flags);				\
 	res = bus->ops->write(bus, devfn, pos, len, value);		\
+	mmiowb();							\
 	spin_unlock_irqrestore(&pci_lock, flags);			\
 	return res;							\
 }
@@ -102,6 +103,7 @@ int pci_user_write_config_##size					\
 	if (likely(!dev->block_ucfg_access))				\
 		ret = dev->bus->ops->write(dev->bus, dev->devfn,	\
 					pos, sizeof(type), val);	\
+	mmiowb();							\
 	spin_unlock_irqrestore(&pci_lock, flags);			\
 	return ret;							\
 }

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-24 21:47   ` Matthew Wilcox
@ 2006-10-24 21:51     ` Roland Dreier
  2006-10-24 22:12       ` John Partridge
  2006-10-24 22:36       ` Matthew Wilcox
  0 siblings, 2 replies; 38+ messages in thread
From: Roland Dreier @ 2006-10-24 21:51 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jeff Garzik, linux-pci, linux-ia64, linux-kernel, openib-general,
	John Partridge

 > I think the right way to fix this is to ensure mmio write ordering in
 > the pci_write_config_*() implementations.  Like this.

I'm happy to fix this in the PCI core and not force drivers to worry
about this.

John, can you confirm that this patch fixes the issue for you?

Thanks,
  Roland

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-24 21:51     ` Roland Dreier
@ 2006-10-24 22:12       ` John Partridge
  2006-10-24 22:36       ` Matthew Wilcox
  1 sibling, 0 replies; 38+ messages in thread
From: John Partridge @ 2006-10-24 22:12 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Matthew Wilcox, Jeff Garzik, linux-pci, linux-ia64, linux-kernel,
	openib-general

Roland Dreier wrote:
>  > I think the right way to fix this is to ensure mmio write ordering in
>  > the pci_write_config_*() implementations.  Like this.
> 
> I'm happy to fix this in the PCI core and not force drivers to worry
> about this.
> 
> John, can you confirm that this patch fixes the issue for you?
> 
> Thanks,
>   Roland

I'll give it a try and get back to you.

John

-- 
John Partridge

Silicon Graphics Inc
Tel:  651-683-3428
Vnet: 233-3428
E-Mail: johnip@sgi.com

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-24 21:51     ` Roland Dreier
  2006-10-24 22:12       ` John Partridge
@ 2006-10-24 22:36       ` Matthew Wilcox
  2006-10-24 22:43         ` David Miller
                           ` (3 more replies)
  1 sibling, 4 replies; 38+ messages in thread
From: Matthew Wilcox @ 2006-10-24 22:36 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Jeff Garzik, linux-pci, linux-ia64, linux-kernel, openib-general,
	John Partridge

On Tue, Oct 24, 2006 at 02:51:30PM -0700, Roland Dreier wrote:
>  > I think the right way to fix this is to ensure mmio write ordering in
>  > the pci_write_config_*() implementations.  Like this.
> 
> I'm happy to fix this in the PCI core and not force drivers to worry
> about this.
> 
> John, can you confirm that this patch fixes the issue for you?

Hang on.  I wasn't thinking clearly.  mmiowb() only ensures the write
has got as far as the shub.  There's no way to fix this in the pci core
-- any PCI-PCI bridge can reorder the two.

This is only really a problem for setup (when we program the BARs), so
it seems silly to enforce an ordering at any other time.  Reluctantly, I
must disagree with Jeff -- drivers need to fix this.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-24 22:36       ` Matthew Wilcox
@ 2006-10-24 22:43         ` David Miller
  2006-10-25 14:15           ` Roland Dreier
  2006-10-24 22:59         ` [openib-general] " Jason Gunthorpe
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 38+ messages in thread
From: David Miller @ 2006-10-24 22:43 UTC (permalink / raw)
  To: matthew
  Cc: rdreier, jeff, linux-pci, linux-ia64, linux-kernel,
	openib-general, johnip

From: Matthew Wilcox <matthew@wil.cx>
Date: Tue, 24 Oct 2006 16:36:32 -0600

> This is only really a problem for setup (when we program the BARs), so
> it seems silly to enforce an ordering at any other time.  Reluctantly, I
> must disagree with Jeff -- drivers need to fix this.

One thing is that we definitely don't want to fix this by,
for example, reading back the PCI_COMMAND register or something
like that.  That causes two problems:

1) Some PCI config writes shut the device down and make it
   no respond to some kinds of PCI config transactions.
   One example is putting the device into D3 or similar
   power state, another is performing a device reset.

2) Several drivers use PCI config space accesses to touch the
   main registers in order to workaround bugs in the PCI-X
   implementation of their chip or similar (tg3 has a few
   cases like this), doing a PCI config space readback will
   kill performance quite a bit for an already slow situation.

In fact, I do recall that one of the x86 PCI config space access
implementations did a readback like this, and we had to remove
it because it caused problems when doing a reset on tg3 chips
when using PCI config space register write to do the reset.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [openib-general] Ordering between PCI config space writes and MMIO reads?
  2006-10-24 22:36       ` Matthew Wilcox
  2006-10-24 22:43         ` David Miller
@ 2006-10-24 22:59         ` Jason Gunthorpe
  2006-10-25 14:04           ` Roland Dreier
  2006-10-24 23:09         ` Michael S. Tsirkin
  2006-10-24 23:27         ` Jack Steiner
  3 siblings, 1 reply; 38+ messages in thread
From: Jason Gunthorpe @ 2006-10-24 22:59 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Roland Dreier, linux-ia64, Jeff Garzik, linux-kernel,
	openib-general, linux-pci

On Tue, Oct 24, 2006 at 04:36:32PM -0600, Matthew Wilcox wrote:
> On Tue, Oct 24, 2006 at 02:51:30PM -0700, Roland Dreier wrote:
> >  > I think the right way to fix this is to ensure mmio write ordering in
> >  > the pci_write_config_*() implementations.  Like this.
> > 
> > I'm happy to fix this in the PCI core and not force drivers to worry
> > about this.
> > 
> > John, can you confirm that this patch fixes the issue for you?
 
> Hang on.  I wasn't thinking clearly.  mmiowb() only ensures the write
> has got as far as the shub.  There's no way to fix this in the pci core

What about shifting the requirement down to the platform? Ie on ia64
it would seem that inb/outb already solve this problem via mf.a.

All platforms that support inb/outb correctly must have a
synchronizing primitive for outb..

> This is only really a problem for setup (when we program the BARs), so
> it seems silly to enforce an ordering at any other time.  Reluctantly, I
> must disagree with Jeff -- drivers need to fix this.

I'm not sure that can work either. The PCI-X spec is very clear, you
must wait for a non-posted completion if you care about order. Doing a
config read in the driver as a surrogate flush is not good enough in
the general case. Like you say, a pci bridge is free to reorder all
in flight non-posted operations.

Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-24 22:36       ` Matthew Wilcox
  2006-10-24 22:43         ` David Miller
  2006-10-24 22:59         ` [openib-general] " Jason Gunthorpe
@ 2006-10-24 23:09         ` Michael S. Tsirkin
  2006-10-24 23:27         ` Jack Steiner
  3 siblings, 0 replies; 38+ messages in thread
From: Michael S. Tsirkin @ 2006-10-24 23:09 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Roland Dreier, linux-ia64, Jeff Garzik, linux-kernel,
	openib-general, linux-pci

Quoting r. Matthew Wilcox <matthew@wil.cx>:
> Subject: Re: Ordering between PCI config space writes and MMIO reads?
> 
> On Tue, Oct 24, 2006 at 02:51:30PM -0700, Roland Dreier wrote:
> >  > I think the right way to fix this is to ensure mmio write ordering in
> >  > the pci_write_config_*() implementations.  Like this.
> > 
> > I'm happy to fix this in the PCI core and not force drivers to worry
> > about this.
> > 
> > John, can you confirm that this patch fixes the issue for you?
> 
> Hang on.  I wasn't thinking clearly.  mmiowb() only ensures the write
> has got as far as the shub.  There's no way to fix this in the pci core
> -- any PCI-PCI bridge can reorder the two.
> 
> This is only really a problem for setup (when we program the BARs), so
> it seems silly to enforce an ordering at any other time.  Reluctantly, I
> must disagree with Jeff -- drivers need to fix this.

This can be true for any bridge.
Most arches, however, simply block until config write completes - this is why
driver doesn't issue any MMIO writes - and this is what we are looking for here
- a way to block the CPU until split completion for config write arrives.

By the way, e.g. the PCI Express spec says:
"Read Requests and I/O or Configuration Write Requests are permitted to be
blocked by or to pass other Read Requests and I/O or Configuration Write Requests."
so it is not clear that doing a config read will always flush all config writes
as you want.

-- 
MST

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-24 22:36       ` Matthew Wilcox
                           ` (2 preceding siblings ...)
  2006-10-24 23:09         ` Michael S. Tsirkin
@ 2006-10-24 23:27         ` Jack Steiner
  2006-10-25 14:05           ` Roland Dreier
  2006-11-02  3:05           ` Jeremy Higdon
  3 siblings, 2 replies; 38+ messages in thread
From: Jack Steiner @ 2006-10-24 23:27 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Roland Dreier, Jeff Garzik, linux-pci, linux-ia64, linux-kernel,
	openib-general, John Partridge

On Tue, Oct 24, 2006 at 04:36:32PM -0600, Matthew Wilcox wrote:
> On Tue, Oct 24, 2006 at 02:51:30PM -0700, Roland Dreier wrote:
> >  > I think the right way to fix this is to ensure mmio write ordering in
> >  > the pci_write_config_*() implementations.  Like this.
> > 
> > I'm happy to fix this in the PCI core and not force drivers to worry
> > about this.
> > 
> > John, can you confirm that this patch fixes the issue for you?
> 
> Hang on.  I wasn't thinking clearly.  mmiowb() only ensures the write
> has got as far as the shub.  

I think mmiowb() should work on SN hardware. mmiowb() delays until shub
reports that all previously issued PIO writes have completed. 

The processor "mf.a" guarantees "platform acceptance" which on SN means
that shub has accepted the write - not that it has actually completed (or
even forwarded anywhere by shub). That makes "mf.a" more-or-less useless
on SN. However, shub has an additional MMR register (PIO_WRITE_COUNT) that
counts actual outstanding PIOs.  mmiob() delays until that count goes to
zero.

I'll check if there is any additional reordering that can occur AFTER the
PIO_WRITE_COUNT goes to zero.  If so, it would be at bus level - not in
shub or routers.


> There's no way to fix this in the pci core
> -- any PCI-PCI bridge can reorder the two.
> 
> This is only really a problem for setup (when we program the BARs), so
> it seems silly to enforce an ordering at any other time.  Reluctantly, I
> must disagree with Jeff -- drivers need to fix this.

-- jack

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-24 19:13 Ordering between PCI config space writes and MMIO reads? Roland Dreier
                   ` (2 preceding siblings ...)
  2006-10-24 21:24 ` Alan Cox
@ 2006-10-25  6:30 ` Grant Grundler
  2006-10-25 14:11   ` Roland Dreier
  2006-10-25 14:18   ` Matthew Wilcox
  3 siblings, 2 replies; 38+ messages in thread
From: Grant Grundler @ 2006-10-25  6:30 UTC (permalink / raw)
  To: Roland Dreier
  Cc: linux-pci, linux-ia64, linux-kernel, openib-general,
	John Partridge

On Tue, Oct 24, 2006 at 12:13:19PM -0700, Roland Dreier wrote:
> John Partridge found an interesting bug involving mthca (Mellanox
> InfiniBand HCA driver) on IA64/Altix systems.  Basically, during
> initialization, mthca does:
> 
>     - do some config writes, including enabling BARs
>     - then start a firmware command
>       - read an MMIO register from a BAR (to check if FW is busy)
> 
> However, John found that the Altix PCI-X bridge was allowing the MMIO
> read to start before the config write was done (which is allowed by
> the PCI spec).

Can someone provide a quote of the PCI Local bus spec that allows this?
(Or at least a reference to a spec version and section number)

>   The PCI trace looked like:
> 
> 23454:    Config Write     REG = 01 TYPE = 1    BE = 0000  Req = (0,0,0)  Tag = 1  Bus = 1 Device = 0 Function = 0     WAIT = 2
> 23462:    Memory Rd DW     A = 00280698  BE = 0000  Req = (0,0,0)  Tag = 0      WAIT = 2
> 23470:    Split compl.     Lower A = 00  Req = (0,0,0)  Tag = 0  Comp = (0,2,0)     WAIT = 1   (Error completion)
> 23476:    Split compl.     Lower A = 00  Req = (0,0,0)  Tag = 1  Comp = (0,2,0)     WAIT = 1   (Normal completion of WRITE)
> 
> and that "Error completion" leads to a crash.
> 
> John proposed the following patch to fix this, which looks good to
> me.  However, I have a couple of questions about this situation:
> 
>  1) Is this something that should be fixed in the driver?  The PCI
>     spec allows MMIO cycles to start before an earlier config cycle
>     completed, but do we want to expose this fact to drivers?

I would prefer we did not.

>     Would
>     it be better for ia64 to use some sort of barrier to make sure
>     pci_write_config_xxx() is strongly ordered with MMIO?

That would be my preference.

>  2) Is this issue lurking in other drivers?
> 
> Thanks,
>   Roland
> 
> commit 424b50b6360b325ce642ece687756a600c25d28a
> Author: John Partridge <johnip@sgi.com>
> Date:   Tue Oct 24 11:54:16 2006 -0700
> 
>     IB/mthca: Make sure all PCI config writes reach device before doing MMIO
>     
>     During initialization, mthca writes some PCI config space registers
>     and then does an MMIO read from one of the BARs it just enabled.  This
>     MMIO read sometimes failed and caused a crash on SGI Altix machines,
>     because the PCI-X host bridge (legitimately, according to the PCI
>     spec) allowed the MMIO read to start before the config write completed.

Because of this past discussion with jesse barnes, I'm leary of
any kind of writes traveling through SN2 fabric. The issue is
described pretty well here:
	http://www.usenetlinux.com/archive/topic.php/t-49141.html

I don't know that this is the same (or similar) problem.

>     To fix this, add a config read after all config writes to make sure
>     they are all done before starting the MMIO read.
>     
>     Signed-off-by: John Partridge <johnip@sgi.com>
>     Signed-off-by: Roland Dreier <rolandd@cisco.com>
> 
> diff --git a/drivers/infiniband/hw/mthca/mthca_reset.c b/drivers/infiniband/hw/mthca/mthca_reset.c
> index 91934f2..578dc7c 100644
> --- a/drivers/infiniband/hw/mthca/mthca_reset.c
> +++ b/drivers/infiniband/hw/mthca/mthca_reset.c
> @@ -281,6 +281,20 @@ good:
>  		goto out;
>  	}
>  
> +	/*
> +	 * Perform a "flush" of the PCI config writes here by reading
> +	 * the PCI_COMMAND register.  This is needed to make sure that
> +	 * we don't try to touch other PCI BARs before the config
> +	 * writes are done -- otherwise an MMIO cycle could start
> +	 * before the config writes are done and reach the HCA before
> +	 * the BAR is actually enabled.
> +	 */

If this code is accepted, the comment should provide a specific reference
(PCI Version + section) to the PCI spec that allows the out-of-order.

I agree with jgarzik that the drivers already expect config cycles
to be ordered with respect to MMIO cycles.

I'm looking at arch/ia64/pci/pci.c.
Wouldn't it be reasonable to include memory barriers around calls
to SAL config space access functions?

thanks,
grant

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [openib-general] Ordering between PCI config space writes and MMIO reads?
  2006-10-24 22:59         ` [openib-general] " Jason Gunthorpe
@ 2006-10-25 14:04           ` Roland Dreier
  0 siblings, 0 replies; 38+ messages in thread
From: Roland Dreier @ 2006-10-25 14:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matthew Wilcox, linux-ia64, Jeff Garzik, linux-kernel,
	openib-general, linux-pci

 > I'm not sure that can work either. The PCI-X spec is very clear, you
 > must wait for a non-posted completion if you care about order. Doing a
 > config read in the driver as a surrogate flush is not good enough in
 > the general case. Like you say, a pci bridge is free to reorder all
 > in flight non-posted operations.

No, hang on.  Nothing can reorder a dependent read to start after a
write that it depends on, can it?  So a config read of PCI_COMMAND
can't start until the completion of a config write of the same
register, right?

 - R.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-24 23:27         ` Jack Steiner
@ 2006-10-25 14:05           ` Roland Dreier
  2006-11-02  3:05           ` Jeremy Higdon
  1 sibling, 0 replies; 38+ messages in thread
From: Roland Dreier @ 2006-10-25 14:05 UTC (permalink / raw)
  To: Jack Steiner
  Cc: Matthew Wilcox, Jeff Garzik, linux-pci, linux-ia64, linux-kernel,
	openib-general, John Partridge

 > I'll check if there is any additional reordering that can occur AFTER the
 > PIO_WRITE_COUNT goes to zero.  If so, it would be at bus level - not in
 > shub or routers.

Unfortunately, at least in theory, the reordering can occur.  For
example a bridge on some card plugged into an SN slot is allowed to
reorder things too.

 - R.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-25  6:30 ` Grant Grundler
@ 2006-10-25 14:11   ` Roland Dreier
  2006-10-25 14:18   ` Matthew Wilcox
  1 sibling, 0 replies; 38+ messages in thread
From: Roland Dreier @ 2006-10-25 14:11 UTC (permalink / raw)
  To: Grant Grundler
  Cc: linux-pci, linux-ia64, linux-kernel, openib-general,
	John Partridge

 > I'm looking at arch/ia64/pci/pci.c.
 > Wouldn't it be reasonable to include memory barriers around calls
 > to SAL config space access functions?

It's reasonable, but is there a memory barrier strong enough to
guarantee that a config write has actually completed?

 - R.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-24 22:43         ` David Miller
@ 2006-10-25 14:15           ` Roland Dreier
  2006-10-31 19:02             ` Roland Dreier
  0 siblings, 1 reply; 38+ messages in thread
From: Roland Dreier @ 2006-10-25 14:15 UTC (permalink / raw)
  To: David Miller
  Cc: matthew, jeff, linux-pci, linux-ia64, linux-kernel,
	openib-general, johnip

 > One thing is that we definitely don't want to fix this by,
 > for example, reading back the PCI_COMMAND register or something
 > like that.  That causes two problems:
 > 
 > 1) Some PCI config writes shut the device down and make it
 >    no respond to some kinds of PCI config transactions.
 >    One example is putting the device into D3 or similar
 >    power state, another is performing a device reset.

Hmm... it seems there is no other guaranteed way to make sure a config
write has really completed except doing a config read.  And only the
driver knows what the config access it's doing means.  So the
conclusion we seem to be forced into is that drivers need to include
these reads in the cases where they are needed.

 - R.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-25  6:30 ` Grant Grundler
  2006-10-25 14:11   ` Roland Dreier
@ 2006-10-25 14:18   ` Matthew Wilcox
  2006-10-25 17:15     ` [openib-general] " Jason Gunthorpe
  2006-10-25 18:22     ` Michael S. Tsirkin
  1 sibling, 2 replies; 38+ messages in thread
From: Matthew Wilcox @ 2006-10-25 14:18 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Roland Dreier, linux-pci, linux-ia64, linux-kernel,
	openib-general, John Partridge

On Wed, Oct 25, 2006 at 12:30:22AM -0600, Grant Grundler wrote:
> Can someone provide a quote of the PCI Local bus spec that allows this?
> (Or at least a reference to a spec version and section number)

PCI-PCI bridges are allowed to do it.  If you look in table E-1 of PCI
2.3, or table 8-3 of PCI-X 2.0, you'll see that a Posted Memory Write
can pass a Delayed Write Request (or in PCI-X, a Memory Write can pass a
Split Write Request).

So mmiowb() will solve the problem for Altix, but leave everybody else
vulnerable.  I actually don't see a way of forcing the config write to
complete before a memory write -- everything is allowed to pass a config
write, even a config read.  I initially thought "But only a crack monkey
would implement a system where a config read could pass a config write",
but the spec explains that:

  In most PCI-X implementations, Split Requests are managed in separate
  buffers from Split Completions, so Split Requests naturally pass Split
  Completions. However, no deadlocks occur if Split Completions block
  Split Requests.

So all this code that checks to see if a write had an effect is unsafe.
I'm a little perturbed by this.  It means the only way to reliably
distinguish between a write that hasn't taken effect yet and a bit (say,
MWI) the device hasn't implemented is to do a memory access to the
device.  Which is hard when you're trying to program the BARs.

I suppose this hasn't bitten us before in, what, 7 years of PCI-X, so
it can't be *that* common a thing for bridges to do.  And we would have
noticed the BAR sizing code going wrong (as it does config write
followed immediately by config read), so maybe implementations aren't as
crackful as the PCI spec seems to permit them to be.

I find it really hard to believe the PCI committee have done something
this stupid.  There must be another rule somewhere that I'm missing.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [openib-general] Ordering between PCI config space writes and MMIO reads?
  2006-10-25 14:18   ` Matthew Wilcox
@ 2006-10-25 17:15     ` Jason Gunthorpe
  2006-10-25 18:22     ` Michael S. Tsirkin
  1 sibling, 0 replies; 38+ messages in thread
From: Jason Gunthorpe @ 2006-10-25 17:15 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Grant Grundler, linux-ia64, Roland Dreier, linux-kernel,
	openib-general, linux-pci

On Wed, Oct 25, 2006 at 08:18:59AM -0600, Matthew Wilcox wrote:

> PCI-PCI bridges are allowed to do it.  If you look in table E-1 of PCI
> 2.3, or table 8-3 of PCI-X 2.0, you'll see that a Posted Memory Write
> can pass a Delayed Write Request (or in PCI-X, a Memory Write can pass a
> Split Write Request).

Carefull here.. Only MMIO writes are of the posted variety. Non posted
transactions (config write, IO write, all reads) are not ever allowed
to pass a posted write, but they can be re-ordered.

Table 8-3 shows that Split Write vs Split Read are all Y/N
meaning they could be re-ordered with respect to each other.

The problem with mthca is exactly this, although the operations were
issued in-order by the bridges the end device (mthca) is free to
complete them in any order, and it choose to complete the MMIO read
before the config write.

>   In most PCI-X implementations, Split Requests are managed in separate
>   buffers from Split Completions, so Split Requests naturally pass Split
>   Completions. However, no deadlocks occur if Split Completions block
>   Split Requests.

Again, this is only for posted writes. All these rules are designed to
prevent the bus from deadlocking due to buffer starvation under
certain situations. [Basically split completions and posted writes are
given a seperate queue that can advance if the request queue is
stalled. Otherwise you can deadlock a bridge]

> So all this code that checks to see if a write had an effect is unsafe.
> I'm a little perturbed by this.  It means the only way to reliably

MMIO based code that does this is correct and reliable.. PIO code that
does this is only safe if the platform is waiting for the PIO write
completion before starting the PIO read.

PCI-X (pg 80) says this about non-posted transaction ordering requirements:
  As in convention PCI, if a requester requires one non-posted
  transaction to complete before another, it must not initiate the
  second transaction until the first one compeltes.

IMHO, all sane hardware implementations of config ops and PIO should
block the host bridge until the completion is generated by the end
device..I'm sure that most x86 platforms do this. (For instance I've
observed this kind of behavior with a Hyper Transport probe on
Opterons)

This is more than just worrying about ordering, it is about how to
engage a platform specific way to know that the completion has been
generated. The person who suggested polling the PIO_OUTSTANDING
register on SN2 seems to have the right idea (if that counts all
pending non-posted operations, not just PIO ones) :|

The risk of re-ordering is probably not so much in the bridges since
that would be a fairly strange thing to do - but it is very likely in
end-devices. This is especially true if the accesses are to different
internal resources!

Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-25 14:18   ` Matthew Wilcox
  2006-10-25 17:15     ` [openib-general] " Jason Gunthorpe
@ 2006-10-25 18:22     ` Michael S. Tsirkin
  1 sibling, 0 replies; 38+ messages in thread
From: Michael S. Tsirkin @ 2006-10-25 18:22 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Grant Grundler, linux-ia64, Roland Dreier, linux-kernel,
	openib-general, linux-pci

Quoting r. Matthew Wilcox <matthew@wil.cx>:
> Subject: Re: Ordering between PCI config space writes and MMIO reads?
> 
> On Wed, Oct 25, 2006 at 12:30:22AM -0600, Grant Grundler wrote:
> > Can someone provide a quote of the PCI Local bus spec that allows this?
> > (Or at least a reference to a spec version and section number)
> 
> PCI-PCI bridges are allowed to do it.  If you look in table E-1 of PCI
> 2.3, or table 8-3 of PCI-X 2.0, you'll see that a Posted Memory Write
> can pass a Delayed Write Request (or in PCI-X, a Memory Write can pass a
> Split Write Request).
> 
> So mmiowb() will solve the problem for Altix, but leave everybody else
> vulnerable.  I actually don't see a way of forcing the config write to
> complete before a memory write -- everything is allowed to pass a config
> write, even a config read.  I initially thought "But only a crack monkey
> would implement a system where a config read could pass a config write",
> but the spec explains that:
> 
>   In most PCI-X implementations, Split Requests are managed in separate
>   buffers from Split Completions, so Split Requests naturally pass Split
>   Completions. However, no deadlocks occur if Split Completions block
>   Split Requests.
> 
> So all this code that checks to see if a write had an effect is unsafe.
> I'm a little perturbed by this.  It means the only way to reliably
> distinguish between a write that hasn't taken effect yet and a bit (say,
> MWI) the device hasn't implemented is to do a memory access to the
> device.  Which is hard when you're trying to program the BARs.
> 
> I suppose this hasn't bitten us before in, what, 7 years of PCI-X, so
> it can't be *that* common a thing for bridges to do.  And we would have
> noticed the BAR sizing code going wrong (as it does config write
> followed immediately by config read), so maybe implementations aren't as
> crackful as the PCI spec seems to permit them to be.
> 
> I find it really hard to believe the PCI committee have done something
> this stupid.  There must be another rule somewhere that I'm missing.

I think typically CPUs stall until a non-posted operation completes.
And since config writes are non posted,

pci_config_write_...
write ....

does not *start* the write until config write has completed.
So there's only a single outstanding config operation and that's why
there's never any re-ordering, without any need for flushes.

Your Altix system seems the weird one here in that CPU actually
treats config writes as posted and does not wait for their completion.
I wander whether you can do a bus lock or something and force
waiting till the completion.
This would be much cleaner than trying to fix all drivers.

-- 
MST

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-25 14:15           ` Roland Dreier
@ 2006-10-31 19:02             ` Roland Dreier
  2006-10-31 19:53               ` Michael S. Tsirkin
  0 siblings, 1 reply; 38+ messages in thread
From: Roland Dreier @ 2006-10-31 19:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Miller, matthew, jeff, linux-pci, linux-ia64,
	openib-general, johnip

The discussion fizzled out without really reaching a definitive
answer, so I'm going to apply the original patch (below), since I
pretty much convinced myself that only the driver doing the config
access has enough information to fix this reliably.

 - R.

Author: John Partridge <johnip@sgi.com>
Date:   Tue Oct 31 11:00:04 2006 -0800

    IB/mthca: Make sure all PCI config writes reach device before doing MMIO
    
    During initialization, mthca writes some PCI config space registers
    and then does an MMIO read from one of the BARs it just enabled.  This
    MMIO read sometimes failed and caused a crash on SGI Altix machines,
    because the PCI-X host bridge (legitimately, according to the PCI
    spec) allowed the MMIO read to start before the config write completed.
    
    To fix this, add a config read after all config writes to make sure
    they are all done before starting the MMIO read.
    
    Signed-off-by: John Partridge <johnip@sgi.com>
    Signed-off-by: Roland Dreier <rolandd@cisco.com>

diff --git a/drivers/infiniband/hw/mthca/mthca_reset.c b/drivers/infiniband/hw/mthca/mthca_reset.c
index 91934f2..578dc7c 100644
--- a/drivers/infiniband/hw/mthca/mthca_reset.c
+++ b/drivers/infiniband/hw/mthca/mthca_reset.c
@@ -281,6 +281,20 @@ good:
 		goto out;
 	}
 
+	/*
+	 * Perform a "flush" of the PCI config writes here by reading
+	 * the PCI_COMMAND register.  This is needed to make sure that
+	 * we don't try to touch other PCI BARs before the config
+	 * writes are done -- otherwise an MMIO cycle could start
+	 * before the config writes are done and reach the HCA before
+	 * the BAR is actually enabled.
+	 */
+	if (pci_read_config_dword(mdev->pdev, PCI_COMMAND, hca_header)) {
+		err = -ENODEV;
+		mthca_err(mdev, "Couldn't access HCA memory after restoring, "
+			  "aborting.\n");
+	}
+
 out:
 	if (bridge)
 		pci_dev_put(bridge);

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-31 19:53               ` Michael S. Tsirkin
@ 2006-10-31 19:53                 ` Roland Dreier
  2006-10-31 19:58                   ` Matthew Wilcox
  2006-10-31 20:28                   ` Michael S. Tsirkin
  2006-10-31 20:34                 ` Richard B. Johnson
  1 sibling, 2 replies; 38+ messages in thread
From: Roland Dreier @ 2006-10-31 19:53 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, linux-ia64, jeff, matthew, openib-general,
	linux-pci, David Miller

 > Here's what I don't understand: according to PCI rules, pci config read
 > can bypass pci config write (both are non-posted).
 > So why does doing it help flush the writes as the comment claims?

No, I don't believe a read of a config register can pass a write of
the same register.  (Someone correct me if I'm wrong)

 - R.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-31 19:02             ` Roland Dreier
@ 2006-10-31 19:53               ` Michael S. Tsirkin
  2006-10-31 19:53                 ` Roland Dreier
  2006-10-31 20:34                 ` Richard B. Johnson
  0 siblings, 2 replies; 38+ messages in thread
From: Michael S. Tsirkin @ 2006-10-31 19:53 UTC (permalink / raw)
  To: Roland Dreier
  Cc: linux-kernel, linux-ia64, jeff, matthew, openib-general,
	linux-pci, David Miller

Quoting r. Roland Dreier <rdreier@cisco.com>:
> Subject: Re: Ordering between PCI config space writes and MMIO reads?
> 
> The discussion fizzled out without really reaching a definitive
> answer, so I'm going to apply the original patch (below), since I
> pretty much convinced myself that only the driver doing the config
> access has enough information to fix this reliably.
> 
>  - R.
> 
> Author: John Partridge <johnip@sgi.com>
> Date:   Tue Oct 31 11:00:04 2006 -0800
> 
>     IB/mthca: Make sure all PCI config writes reach device before doing MMIO
>     
>     During initialization, mthca writes some PCI config space registers
>     and then does an MMIO read from one of the BARs it just enabled.  This
>     MMIO read sometimes failed and caused a crash on SGI Altix machines,
>     because the PCI-X host bridge (legitimately, according to the PCI
>     spec) allowed the MMIO read to start before the config write completed.
>     
>     To fix this, add a config read after all config writes to make sure
>     they are all done before starting the MMIO read.
>     
>     Signed-off-by: John Partridge <johnip@sgi.com>
>     Signed-off-by: Roland Dreier <rolandd@cisco.com>
> 
> diff --git a/drivers/infiniband/hw/mthca/mthca_reset.c b/drivers/infiniband/hw/mthca/mthca_reset.c
> index 91934f2..578dc7c 100644
> --- a/drivers/infiniband/hw/mthca/mthca_reset.c
> +++ b/drivers/infiniband/hw/mthca/mthca_reset.c
> @@ -281,6 +281,20 @@ good:
>  		goto out;
>  	}
>  
> +	/*
> +	 * Perform a "flush" of the PCI config writes here by reading
> +	 * the PCI_COMMAND register.  This is needed to make sure that
> +	 * we don't try to touch other PCI BARs before the config
> +	 * writes are done -- otherwise an MMIO cycle could start
> +	 * before the config writes are done and reach the HCA before
> +	 * the BAR is actually enabled.
> +	 */
> +	if (pci_read_config_dword(mdev->pdev, PCI_COMMAND, hca_header)) {
> +		err = -ENODEV;
> +		mthca_err(mdev, "Couldn't access HCA memory after restoring, "
> +			  "aborting.\n");
> +	}
> +
>  out:
>  	if (bridge)
>  		pci_dev_put(bridge);

Here's what I don't understand: according to PCI rules, pci config read
can bypass pci config write (both are non-posted).
So why does doing it help flush the writes as the comment claims?

Isn't this more the case of
/* pci_config_write seems to complete asynchronously on Altix systems.
 * This is probably broken but its not clear what's the best
 * thing to do is - for now, do pci_read_config_dword which seems to flush
 * everything out. */

-- 
MST

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-31 19:53                 ` Roland Dreier
@ 2006-10-31 19:58                   ` Matthew Wilcox
  2006-10-31 20:28                   ` Michael S. Tsirkin
  1 sibling, 0 replies; 38+ messages in thread
From: Matthew Wilcox @ 2006-10-31 19:58 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Michael S. Tsirkin, linux-kernel, linux-ia64, jeff,
	openib-general, linux-pci, David Miller

On Tue, Oct 31, 2006 at 11:53:02AM -0800, Roland Dreier wrote:
>  > Here's what I don't understand: according to PCI rules, pci config read
>  > can bypass pci config write (both are non-posted).
>  > So why does doing it help flush the writes as the comment claims?
> 
> No, I don't believe a read of a config register can pass a write of
> the same register.  (Someone correct me if I'm wrong)

I don't see anything in the PCI spec which forbids it, but I would
expect that hardware designers don't actually do that in practice.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-31 19:53                 ` Roland Dreier
  2006-10-31 19:58                   ` Matthew Wilcox
@ 2006-10-31 20:28                   ` Michael S. Tsirkin
  1 sibling, 0 replies; 38+ messages in thread
From: Michael S. Tsirkin @ 2006-10-31 20:28 UTC (permalink / raw)
  To: Roland Dreier
  Cc: linux-kernel, linux-ia64, jeff, matthew, openib-general,
	linux-pci, David Miller

Quoting r. Roland Dreier <rdreier@cisco.com>:
> Subject: Re: Ordering between PCI config space writes and MMIO reads?
> 
>  > Here's what I don't understand: according to PCI rules, pci config read
>  > can bypass pci config write (both are non-posted).
>  > So why does doing it help flush the writes as the comment claims?
> 
> No, I don't believe a read of a config register can pass a write of
> the same register.  (Someone correct me if I'm wrong)

It can if PCI-X/PCI-Ex spec is anything to go by.

For example, see table 2-23, transaction ordering rules, in the PCI-express
spec: it is marked as "Y/N: there are no requirements.  The second transaction
may optionally pass the first transaction or be blocked by it."

In typical systems the OS should take care not to start a new
non-posted transaction before the previous one completed.
In particular, all intel and ppc systems I've seen simply block the CPU
unti the split completion arrives.

I find it hard to believe that Altix des not supply a way to check
that completion for config write transaction has arrived.


-- 
MST

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-31 19:53               ` Michael S. Tsirkin
  2006-10-31 19:53                 ` Roland Dreier
@ 2006-10-31 20:34                 ` Richard B. Johnson
  2006-10-31 20:47                   ` Matthew Wilcox
  2006-10-31 20:50                   ` Michael S. Tsirkin
  1 sibling, 2 replies; 38+ messages in thread
From: Richard B. Johnson @ 2006-10-31 20:34 UTC (permalink / raw)
  To: Michael S. Tsirkin, Roland Dreier
  Cc: linux-kernel, linux-ia64, jeff, matthew, openib-general,
	linux-pci, David Miller


----- Original Message ----- 
From: "Michael S. Tsirkin" <mst@mellanox.co.il>
To: "Roland Dreier" <rdreier@cisco.com>
Cc: <linux-kernel@vger.kernel.org>; <linux-ia64@vger.kernel.org>; 
<jeff@garzik.org>; <matthew@wil.cx>; <openib-general@openib.org>; 
<linux-pci@atrey.karlin.mff.cuni.cz>; "David Miller" <davem@davemloft.net>
Sent: Tuesday, October 31, 2006 2:53 PM
Subject: Re: Ordering between PCI config space writes and MMIO reads?


> Quoting r. Roland Dreier <rdreier@cisco.com>:
>> Subject: Re: Ordering between PCI config space writes and MMIO reads?
>>
>> The discussion fizzled out without really reaching a definitive
>> answer, so I'm going to apply the original patch (below), since I
>> pretty much convinced myself that only the driver doing the config
>> access has enough information to fix this reliably.
>>
>>  - R.
>>
>> Author: John Partridge <johnip@sgi.com>
>> Date:   Tue Oct 31 11:00:04 2006 -0800
>>
>>     IB/mthca: Make sure all PCI config writes reach device before doing 
>> MMIO
>>
>>     During initialization, mthca writes some PCI config space registers
>>     and then does an MMIO read from one of the BARs it just enabled. 
>> This
>>     MMIO read sometimes failed and caused a crash on SGI Altix machines,
>>     because the PCI-X host bridge (legitimately, according to the PCI
>>     spec) allowed the MMIO read to start before the config write 
>> completed.
>>
>>     To fix this, add a config read after all config writes to make sure
>>     they are all done before starting the MMIO read.
>>
>>     Signed-off-by: John Partridge <johnip@sgi.com>
>>     Signed-off-by: Roland Dreier <rolandd@cisco.com>
>>
>> diff --git a/drivers/infiniband/hw/mthca/mthca_reset.c 
>> b/drivers/infiniband/hw/mthca/mthca_reset.c
>> index 91934f2..578dc7c 100644
>> --- a/drivers/infiniband/hw/mthca/mthca_reset.c
>> +++ b/drivers/infiniband/hw/mthca/mthca_reset.c
>> @@ -281,6 +281,20 @@ good:
>>  goto out;
>>  }
>>
>> + /*
>> + * Perform a "flush" of the PCI config writes here by reading
>> + * the PCI_COMMAND register.  This is needed to make sure that
>> + * we don't try to touch other PCI BARs before the config
>> + * writes are done -- otherwise an MMIO cycle could start
>> + * before the config writes are done and reach the HCA before
>> + * the BAR is actually enabled.
>> + */
>> + if (pci_read_config_dword(mdev->pdev, PCI_COMMAND, hca_header)) {
>> + err = -ENODEV;
>> + mthca_err(mdev, "Couldn't access HCA memory after restoring, "
>> +   "aborting.\n");
>> + }
>> +
>>  out:
>>  if (bridge)
>>  pci_dev_put(bridge);
>
> Here's what I don't understand: according to PCI rules, pci config read
> can bypass pci config write (both are non-posted).
> So why does doing it help flush the writes as the comment claims?
>
> Isn't this more the case of
> /* pci_config_write seems to complete asynchronously on Altix systems.
> * This is probably broken but its not clear what's the best
> * thing to do is - for now, do pci_read_config_dword which seems to flush
> * everything out. */
>

If you write to the PCI bus and then you read the result, the read __might__ 
be the
read that flushes any posted writes rather than the read of device registers 
that
would occur after the BARs were configured (hardware may be slower than
the CPU). So, it's best to do the required configuration cycles first, then 
after
all is done, read  something before you actually need to use data from 
subsequent
read/write cycles.

> -- 
> MST
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.24 (somewhere)
New Book: http://www.AbominableFirebug.com



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-31 20:34                 ` Richard B. Johnson
@ 2006-10-31 20:47                   ` Matthew Wilcox
  2006-10-31 22:30                     ` Roland Dreier
  2006-10-31 20:50                   ` Michael S. Tsirkin
  1 sibling, 1 reply; 38+ messages in thread
From: Matthew Wilcox @ 2006-10-31 20:47 UTC (permalink / raw)
  To: Richard B. Johnson
  Cc: Michael S. Tsirkin, Roland Dreier, linux-kernel, linux-ia64, jeff,
	openib-general, linux-pci, David Miller

On Tue, Oct 31, 2006 at 03:34:47PM -0500, Richard B. Johnson wrote:
> If you write to the PCI bus and then you read the result, the read 
> __might__ be the
> read that flushes any posted writes rather than the read of device 

Config space writes aren't posted, they're delayed.  So, for example,
you can do the config write on the primary bus, then it hits a bridge on
its way to the destination device.  The bridge is entitled (obviously,
it's unlikely to) drop it, and then the config read can pass by the
config write.

I'm beginning to think Michael Tsirkin has the only solution to this
-- architectures need to check that their hardware blocks until the
config write completion has occurred (and if not, simulate that it has
in software).


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-31 20:34                 ` Richard B. Johnson
  2006-10-31 20:47                   ` Matthew Wilcox
@ 2006-10-31 20:50                   ` Michael S. Tsirkin
  1 sibling, 0 replies; 38+ messages in thread
From: Michael S. Tsirkin @ 2006-10-31 20:50 UTC (permalink / raw)
  To: Richard B. Johnson
  Cc: Roland Dreier, linux-kernel, linux-ia64, jeff, matthew,
	openib-general, linux-pci, David Miller

Quoting r. Richard B. Johnson <jmodem@abominablefirebug.com>:
> Subject: Re: Ordering between PCI config space writes and MMIO reads?
> 
> 
> ----- Original Message ----- 
> From: "Michael S. Tsirkin" <mst@mellanox.co.il>
> To: "Roland Dreier" <rdreier@cisco.com>
> Cc: <linux-kernel@vger.kernel.org>; <linux-ia64@vger.kernel.org>; 
> <jeff@garzik.org>; <matthew@wil.cx>; <openib-general@openib.org>; 
> <linux-pci@atrey.karlin.mff.cuni.cz>; "David Miller" <davem@davemloft.net>
> Sent: Tuesday, October 31, 2006 2:53 PM
> Subject: Re: Ordering between PCI config space writes and MMIO reads?
> 
> 
> > Quoting r. Roland Dreier <rdreier@cisco.com>:
> >> Subject: Re: Ordering between PCI config space writes and MMIO reads?
> >>
> >> The discussion fizzled out without really reaching a definitive
> >> answer, so I'm going to apply the original patch (below), since I
> >> pretty much convinced myself that only the driver doing the config
> >> access has enough information to fix this reliably.
> >>
> >>  - R.
> >>
> >> Author: John Partridge <johnip@sgi.com>
> >> Date:   Tue Oct 31 11:00:04 2006 -0800
> >>
> >>     IB/mthca: Make sure all PCI config writes reach device before doing 
> >> MMIO
> >>
> >>     During initialization, mthca writes some PCI config space registers
> >>     and then does an MMIO read from one of the BARs it just enabled. 
> >> This
> >>     MMIO read sometimes failed and caused a crash on SGI Altix machines,
> >>     because the PCI-X host bridge (legitimately, according to the PCI
> >>     spec) allowed the MMIO read to start before the config write 
> >> completed.
> >>
> >>     To fix this, add a config read after all config writes to make sure
> >>     they are all done before starting the MMIO read.
> >>
> >>     Signed-off-by: John Partridge <johnip@sgi.com>
> >>     Signed-off-by: Roland Dreier <rolandd@cisco.com>
> >>
> >> diff --git a/drivers/infiniband/hw/mthca/mthca_reset.c 
> >> b/drivers/infiniband/hw/mthca/mthca_reset.c
> >> index 91934f2..578dc7c 100644
> >> --- a/drivers/infiniband/hw/mthca/mthca_reset.c
> >> +++ b/drivers/infiniband/hw/mthca/mthca_reset.c
> >> @@ -281,6 +281,20 @@ good:
> >>  goto out;
> >>  }
> >>
> >> + /*
> >> + * Perform a "flush" of the PCI config writes here by reading
> >> + * the PCI_COMMAND register.  This is needed to make sure that
> >> + * we don't try to touch other PCI BARs before the config
> >> + * writes are done -- otherwise an MMIO cycle could start
> >> + * before the config writes are done and reach the HCA before
> >> + * the BAR is actually enabled.
> >> + */
> >> + if (pci_read_config_dword(mdev->pdev, PCI_COMMAND, hca_header)) {
> >> + err = -ENODEV;
> >> + mthca_err(mdev, "Couldn't access HCA memory after restoring, "
> >> +   "aborting.\n");
> >> + }
> >> +
> >>  out:
> >>  if (bridge)
> >>  pci_dev_put(bridge);
> >
> > Here's what I don't understand: according to PCI rules, pci config read
> > can bypass pci config write (both are non-posted).
> > So why does doing it help flush the writes as the comment claims?
> >
> > Isn't this more the case of
> > /* pci_config_write seems to complete asynchronously on Altix systems.
> > * This is probably broken but its not clear what's the best
> > * thing to do is - for now, do pci_read_config_dword which seems to flush
> > * everything out. */
> >
> 
> If you write to the PCI bus and then you read the result, the read __might__
> be the read that flushes any posted writes rather than the read of device
> registers that would occur after the BARs were configured (hardware may be
> slower than the CPU). So, it's best to do the required configuration cycles
> first, then after all is done, read  something before you actually need to use
> data from subsequent read/write cycles.

But why should it help? Accordig to the spec, read does not flush configuration
writes (unlike regular writes).

-- 
MST

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-31 20:47                   ` Matthew Wilcox
@ 2006-10-31 22:30                     ` Roland Dreier
  2006-11-01 16:27                       ` John Partridge
  0 siblings, 1 reply; 38+ messages in thread
From: Roland Dreier @ 2006-10-31 22:30 UTC (permalink / raw)
  To: Matthew Wilcox, John Partridge
  Cc: Richard B. Johnson, Michael S. Tsirkin, linux-kernel, linux-ia64,
	jeff, openib-general, linux-pci, David Miller

 > I'm beginning to think Michael Tsirkin has the only solution to this
 > -- architectures need to check that their hardware blocks until the
 > config write completion has occurred (and if not, simulate that it has
 > in software).

OK, I guess I'm convinced.  The vague language in the base PCI 3.0
spec about "dependencies" made me think that a read of a config
register had to wait until all previous writes to the same register
are done.  So I'll drop this patch for now.

John, you'll need to try and come up with a way to solve this in the
Altix implementation of pci_write_config_xxx().

 - R.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-31 22:30                     ` Roland Dreier
@ 2006-11-01 16:27                       ` John Partridge
  2006-11-01 16:46                         ` Matthew Wilcox
  2006-11-01 23:04                         ` David Miller
  0 siblings, 2 replies; 38+ messages in thread
From: John Partridge @ 2006-11-01 16:27 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Matthew Wilcox, Richard B. Johnson, Michael S. Tsirkin,
	linux-kernel, linux-ia64, jeff, openib-general, linux-pci,
	David Miller

Roland Dreier wrote:
>  > I'm beginning to think Michael Tsirkin has the only solution to this
>  > -- architectures need to check that their hardware blocks until the
>  > config write completion has occurred (and if not, simulate that it has
>  > in software).
> 
> OK, I guess I'm convinced.  The vague language in the base PCI 3.0
> spec about "dependencies" made me think that a read of a config
> register had to wait until all previous writes to the same register
> are done.  So I'll drop this patch for now.
> 
> John, you'll need to try and come up with a way to solve this in the
> Altix implementation of pci_write_config_xxx().
> 
>  - R.

Sorry, but I find this change a bit puzzling. The problem is particular to
the PPB on the HCA and not Altix. I can't see anywhere that a PCI Config Write
is required to block until completion, it is the driver and the HCA ,not the
Altix hardware that requires the Config Write to have completed before we
leave mthca_reset() Changing pci_write_config_xxx() will change the behavior
for ALL drivers and the possibility of breaking something else. The fix was
very low risk in mthca_reset(), changing the PCI code to fix this is much
more onerous.

I know you must feel like "piggy in the middle" with this, so I don't mean
to cause you any problems, but I guess I don't understand the reluctance for
the driver fix.

John

-- 
John Partridge

Silicon Graphics Inc
Tel:	651-683-3428
Vnet:	233-3428
E-Mail:	johnip@sgi.com

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-11-01 16:27                       ` John Partridge
@ 2006-11-01 16:46                         ` Matthew Wilcox
  2006-11-01 17:08                           ` John Partridge
  2006-11-01 23:04                         ` David Miller
  1 sibling, 1 reply; 38+ messages in thread
From: Matthew Wilcox @ 2006-11-01 16:46 UTC (permalink / raw)
  To: John Partridge
  Cc: Roland Dreier, Richard B. Johnson, Michael S. Tsirkin,
	linux-kernel, linux-ia64, jeff, openib-general, linux-pci,
	David Miller

On Wed, Nov 01, 2006 at 10:27:19AM -0600, John Partridge wrote:
> Sorry, but I find this change a bit puzzling. The problem is particular to
> the PPB on the HCA and not Altix.

That's not true; it's more likely on Altix, but it's not unique.  *any*
PCI-PCI bridge can reorder pci config reads and writes.  Apparently the
normal PCI host bridge implementation avoids this problem by blocking
until the completion comes back.  If you put a quad-port tulip card into
an Altix, you could experience the same problem (but it would be
massively unlikely.  You'd probably have to bring up three interfaces,
saturate them with traffic, then bring up the fourth to see it.  And
even then it would be rare).

> I can't see anywhere that a PCI Config 
> Write
> is required to block until completion, it is the driver and the HCA ,not the
> Altix hardware that requires the Config Write to have completed before we
> leave mthca_reset()

There's several places in the PCI midlayer that require the config write
to have completed before we do a config read.  The MWI code relies on
this to see if the device supports MWI.  If it gets out of order, we'll
think that the device doesn't support MWI when it thinks it's been told
to use MWI.  Data corruption could result.

> Changing pci_write_config_xxx() will change the behavior
> for ALL drivers and the possibility of breaking something else. The fix was
> very low risk in mthca_reset(), changing the PCI code to fix this is much
> more onerous.

I really don't think so.  At worst you'll be changing the timing.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-11-01 16:46                         ` Matthew Wilcox
@ 2006-11-01 17:08                           ` John Partridge
  2006-11-01 17:14                             ` Matthew Wilcox
  0 siblings, 1 reply; 38+ messages in thread
From: John Partridge @ 2006-11-01 17:08 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Roland Dreier, Richard B. Johnson, Michael S. Tsirkin,
	linux-kernel, linux-ia64, jeff, openib-general, linux-pci,
	David Miller

Matthew,

So, if I understand correctly, you are saying because we cannot guarantee
the "flush" a config write even by doing a config read of the same register
(because the PPB can re-order) we have to make sure we block or spin on the
config write completion at the lowest level of the config write ?

Thanks
John

Matthew Wilcox wrote:
> On Wed, Nov 01, 2006 at 10:27:19AM -0600, John Partridge wrote:
> 
>>Sorry, but I find this change a bit puzzling. The problem is particular to
>>the PPB on the HCA and not Altix.
> 
> 
> That's not true; it's more likely on Altix, but it's not unique.  *any*
> PCI-PCI bridge can reorder pci config reads and writes.  Apparently the
> normal PCI host bridge implementation avoids this problem by blocking
> until the completion comes back.  If you put a quad-port tulip card into
> an Altix, you could experience the same problem (but it would be
> massively unlikely.  You'd probably have to bring up three interfaces,
> saturate them with traffic, then bring up the fourth to see it.  And
> even then it would be rare).
> 
> 
>>I can't see anywhere that a PCI Config 
>>Write
>>is required to block until completion, it is the driver and the HCA ,not the
>>Altix hardware that requires the Config Write to have completed before we
>>leave mthca_reset()
> 
> 
> There's several places in the PCI midlayer that require the config write
> to have completed before we do a config read.  The MWI code relies on
> this to see if the device supports MWI.  If it gets out of order, we'll
> think that the device doesn't support MWI when it thinks it's been told
> to use MWI.  Data corruption could result.
> 
> 
>>Changing pci_write_config_xxx() will change the behavior
>>for ALL drivers and the possibility of breaking something else. The fix was
>>very low risk in mthca_reset(), changing the PCI code to fix this is much
>>more onerous.
> 
> 
> I really don't think so.  At worst you'll be changing the timing.


-- 
John Partridge

Silicon Graphics Inc
Tel:	651-683-3428
Vnet:	233-3428
E-Mail:	johnip@sgi.com

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-11-01 17:08                           ` John Partridge
@ 2006-11-01 17:14                             ` Matthew Wilcox
  0 siblings, 0 replies; 38+ messages in thread
From: Matthew Wilcox @ 2006-11-01 17:14 UTC (permalink / raw)
  To: John Partridge
  Cc: Roland Dreier, Richard B. Johnson, Michael S. Tsirkin,
	linux-kernel, linux-ia64, jeff, openib-general, linux-pci,
	David Miller

On Wed, Nov 01, 2006 at 11:08:08AM -0600, John Partridge wrote:
> So, if I understand correctly, you are saying because we cannot guarantee
> the "flush" a config write even by doing a config read of the same register
> (because the PPB can re-order) we have to make sure we block or spin on the
> config write completion at the lowest level of the config write ?

That's correct.  And I'm also saying that the reason this hasn't
been thought about before is that other root bridges have a mechanism
(implicit on x86, explicit on parisc) for waiting for the config write
completion to come back.

Seems to me that Altix uses the SAL calls to access PCI config space
these days, so you can hide it in your firmware rather than patching
Linux.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-11-01 16:27                       ` John Partridge
  2006-11-01 16:46                         ` Matthew Wilcox
@ 2006-11-01 23:04                         ` David Miller
  2006-11-02  1:08                           ` John Partridge
  1 sibling, 1 reply; 38+ messages in thread
From: David Miller @ 2006-11-01 23:04 UTC (permalink / raw)
  To: johnip
  Cc: rdreier, matthew, jmodem, mst, linux-kernel, linux-ia64, jeff,
	openib-general, linux-pci

From: John Partridge <johnip@sgi.com>
Date: Wed, 01 Nov 2006 10:27:19 -0600

> Sorry, but I find this change a bit puzzling. The problem is
> particular to the PPB on the HCA and not Altix. I can't see anywhere
> that a PCI Config Write is required to block until completion, it is
> the driver and the HCA ,not the Altix hardware that requires the
> Config Write to have completed before we leave mthca_reset()
> Changing pci_write_config_xxx() will change the behavior for ALL
> drivers and the possibility of breaking something else. The fix was
> very low risk in mthca_reset(), changing the PCI code to fix this is
> much more onerous.

The issue is that something as simple as:

	val = pci_read_config(REG);
	val |= bit;
	pci_write_config(REG, val);
	newval = pci_read_config(REG);
	BUG_ON(!(newval & bit));

is not guarenteed by PCI (aparently).

I see no valid reason why every PCI device driver should
be troubled with this lunacy and the ordering should thus
be ensured by the PCI layer.

It just so happens to take care of the original driver
issue too :-)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-11-01 23:04                         ` David Miller
@ 2006-11-02  1:08                           ` John Partridge
  0 siblings, 0 replies; 38+ messages in thread
From: John Partridge @ 2006-11-02  1:08 UTC (permalink / raw)
  To: David Miller
  Cc: rdreier, matthew, jmodem, mst, linux-kernel, linux-ia64, jeff,
	openib-general, linux-pci

David Miller wrote:
> From: John Partridge <johnip@sgi.com>
> Date: Wed, 01 Nov 2006 10:27:19 -0600
> 
> 
>>Sorry, but I find this change a bit puzzling. The problem is
>>particular to the PPB on the HCA and not Altix. I can't see anywhere
>>that a PCI Config Write is required to block until completion, it is
>>the driver and the HCA ,not the Altix hardware that requires the
>>Config Write to have completed before we leave mthca_reset()
>>Changing pci_write_config_xxx() will change the behavior for ALL
>>drivers and the possibility of breaking something else. The fix was
>>very low risk in mthca_reset(), changing the PCI code to fix this is
>>much more onerous.
> 
> 
> The issue is that something as simple as:
> 
> 	val = pci_read_config(REG);
> 	val |= bit;
> 	pci_write_config(REG, val);
> 	newval = pci_read_config(REG);
> 	BUG_ON(!(newval & bit));
> 
> is not guarenteed by PCI (aparently).
> 
> I see no valid reason why every PCI device driver should
> be troubled with this lunacy and the ordering should thus
> be ensured by the PCI layer.
> 
> It just so happens to take care of the original driver
> issue too :-)

Yeah, Matthew has convinced me of that now.

Thanks


-- 
John Partridge

Silicon Graphics Inc
Tel:	651-683-3428
Vnet:	233-3428
E-Mail:	johnip@sgi.com

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Ordering between PCI config space writes and MMIO reads?
  2006-10-24 23:27         ` Jack Steiner
  2006-10-25 14:05           ` Roland Dreier
@ 2006-11-02  3:05           ` Jeremy Higdon
  1 sibling, 0 replies; 38+ messages in thread
From: Jeremy Higdon @ 2006-11-02  3:05 UTC (permalink / raw)
  To: Jack Steiner
  Cc: Matthew Wilcox, Roland Dreier, Jeff Garzik, linux-pci, linux-ia64,
	linux-kernel, openib-general, John Partridge

On Tue, Oct 24, 2006 at 06:27:55PM -0500, Jack Steiner wrote:
> On Tue, Oct 24, 2006 at 04:36:32PM -0600, Matthew Wilcox wrote:
> > On Tue, Oct 24, 2006 at 02:51:30PM -0700, Roland Dreier wrote:
> > >  > I think the right way to fix this is to ensure mmio write ordering in
> > >  > the pci_write_config_*() implementations.  Like this.
> > > 
> > > I'm happy to fix this in the PCI core and not force drivers to worry
> > > about this.
> > > 
> > > John, can you confirm that this patch fixes the issue for you?
> > 
> > Hang on.  I wasn't thinking clearly.  mmiowb() only ensures the write
> > has got as far as the shub.  
> 
> I think mmiowb() should work on SN hardware. mmiowb() delays until shub
> reports that all previously issued PIO writes have completed. 
> 
> The processor "mf.a" guarantees "platform acceptance" which on SN means
> that shub has accepted the write - not that it has actually completed (or
> even forwarded anywhere by shub). That makes "mf.a" more-or-less useless
> on SN. However, shub has an additional MMR register (PIO_WRITE_COUNT) that
> counts actual outstanding PIOs.  mmiob() delays until that count goes to
> zero.
> 
> I'll check if there is any additional reordering that can occur AFTER the
> PIO_WRITE_COUNT goes to zero.  If so, it would be at bus level - not in
> shub or routers.

As I understand it, the mmiowb on the shub waits only for the PIO write
to be accepted by the destination node (shub or tio) that the I/O device
is attached to, thus guaranteeing that no reordering will happen within
the NL.

If the PPB can reorder the write, then mmiowb is not sufficient.  You'd
have to do a readback from a chip register (assuming you can trust the
PPB not to reorder reads and writes), or some other work around I haven't
thought of.

jeremy

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2006-11-02  3:05 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-10-24 19:13 Ordering between PCI config space writes and MMIO reads? Roland Dreier
2006-10-24 19:22 ` Jeff Garzik
2006-10-24 21:47   ` Matthew Wilcox
2006-10-24 21:51     ` Roland Dreier
2006-10-24 22:12       ` John Partridge
2006-10-24 22:36       ` Matthew Wilcox
2006-10-24 22:43         ` David Miller
2006-10-25 14:15           ` Roland Dreier
2006-10-31 19:02             ` Roland Dreier
2006-10-31 19:53               ` Michael S. Tsirkin
2006-10-31 19:53                 ` Roland Dreier
2006-10-31 19:58                   ` Matthew Wilcox
2006-10-31 20:28                   ` Michael S. Tsirkin
2006-10-31 20:34                 ` Richard B. Johnson
2006-10-31 20:47                   ` Matthew Wilcox
2006-10-31 22:30                     ` Roland Dreier
2006-11-01 16:27                       ` John Partridge
2006-11-01 16:46                         ` Matthew Wilcox
2006-11-01 17:08                           ` John Partridge
2006-11-01 17:14                             ` Matthew Wilcox
2006-11-01 23:04                         ` David Miller
2006-11-02  1:08                           ` John Partridge
2006-10-31 20:50                   ` Michael S. Tsirkin
2006-10-24 22:59         ` [openib-general] " Jason Gunthorpe
2006-10-25 14:04           ` Roland Dreier
2006-10-24 23:09         ` Michael S. Tsirkin
2006-10-24 23:27         ` Jack Steiner
2006-10-25 14:05           ` Roland Dreier
2006-11-02  3:05           ` Jeremy Higdon
2006-10-24 21:01 ` [openib-general] " JWM
2006-10-24 21:24 ` Alan Cox
2006-10-24 21:29   ` Roland Dreier
2006-10-24 21:37     ` Jeff Garzik
2006-10-25  6:30 ` Grant Grundler
2006-10-25 14:11   ` Roland Dreier
2006-10-25 14:18   ` Matthew Wilcox
2006-10-25 17:15     ` [openib-general] " Jason Gunthorpe
2006-10-25 18:22     ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox