public inbox for linux-arch@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] I/O space write barrier
@ 2004-09-23 18:48 Jesse Barnes
  2004-09-23 19:03 ` James Bottomley
  2004-09-27  0:45 ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 13+ messages in thread
From: Jesse Barnes @ 2004-09-23 18:48 UTC (permalink / raw)
  To: linux-arch

[-- Attachment #1: Type: text/plain, Size: 2483 bytes --]

[Posting again to linux-arch at Ralf's suggestion.]

On some platforms (e.g. SGI Challenge, Origin, and Altix machines), writes to 
I/O space aren't ordered coming from different CPUs.  For the most part, this 
isn't a problem since drivers generally spinlock around code that does writeX 
calls, but if the last operation a driver does before it releases a lock is a 
write and some other CPU takes the lock and immediately does a write, it's 
possible the second CPU's write could arrive before the first's.

This patch adds a mmiowb() call to deal with this sort of situation, and 
modifies the qla1280 driver to use it.  The idea is to mirror the regular, 
cacheable memory barrier operation, wmb (arg!  I misnamed it in my patch, 
I'll fix that up in the next rev assuming it's ok).  Example:

CPU A:  spin_lock_irqsave(&dev_lock, flags)
CPU A:  ...
CPU A:  writel(newval, ring_ptr);
CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
        ...
CPU B:  spin_lock_irqsave(&dev_lock, flags)
CPU B:  writel(newval2, ring_ptr);
CPU B:  ...
CPU B:  spin_unlock_irqrestore(&dev_lock, flags)

In this case, newval2 could be written to ring_ptr before newval.  Fixing it 
is easy though:

CPU A:  spin_lock_irqsave(&dev_lock, flags)
CPU A:  ...
CPU A:  writel(newval, ring_ptr);
CPU A:  mmiowb(); /* ensure no other writes beat us to the device */
CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
        ...
CPU B:  spin_lock_irqsave(&dev_lock, flags)
CPU B:  writel(newval2, ring_ptr);
CPU B:  ...
CPU B:  mmiowb();
CPU B:  spin_unlock_irqrestore(&dev_lock, flags)

Note that this doesn't address a related case where the driver may want to 
actually make a given write get to the device before proceeding.  This should 
be dealt with by immediately reading a register from the card that has no 
side effects.  According to the PCI spec, that will guarantee that all writes 
have arrived before being sent to the target bus.  If no such register is 
available (in the case of card resets perhaps), reading from config space is 
sufficient (though it may return all ones if the card isn't responding to 
read cycles).

I'm also not quite sure if I got the ppc64 case right (and obviously I need to 
add the macro for other platforms).  The assumption is that *most* platforms 
don't have this issue, and those that do have a chipset register or CPU 
instruction to ensure ordering.  If that turns out not to be the case, we can 
add an address argument to the function.

Thanks,
Jesse

[-- Attachment #2: mmiowb.patch --]
[-- Type: text/x-diff, Size: 10164 bytes --]

===== Documentation/DocBook/deviceiobook.tmpl 1.4 vs edited =====
--- 1.4/Documentation/DocBook/deviceiobook.tmpl	2004-02-03 21:31:10 -08:00
+++ edited/Documentation/DocBook/deviceiobook.tmpl	2004-09-22 07:59:56 -07:00
@@ -147,8 +147,7 @@
 	compiler is not permitted to reorder the I/O sequence. When the 
 	ordering can be compiler optimised, you can use <function>
 	__readb</function> and friends to indicate the relaxed ordering. Use 
-	this with care. The <function>rmb</function> provides a read memory 
-	barrier. The <function>wmb</function> provides a write memory barrier.
+	this with care.
       </para>
 
       <para>
@@ -163,6 +162,15 @@
       </para>
 
       <para>
+	In addition to write posting, some large SMP systems (e.g. SGI
+	Challenge, Origin and Altix machines) won't strongly order writes
+	coming from different CPUs.  Thus it's important to properly
+	protect parts of your driver that do memory-mapped writes with
+	locks and use the <function>mmiob</function> to make sure they
+	arrive in the order intended.
+      </para>
+
+      <para>
 	PCI ordering rules also guarantee that PIO read responses arrive
 	after any outstanding DMA writes on that bus, since for some devices
 	the result of a <function>readb</function> call may signal to the
@@ -171,7 +179,9 @@
 	<function>readb</function> call has no relation to any previous DMA
 	writes performed by the device.  The driver can use
 	<function>readb_relaxed</function> for these cases, although only
-	some platforms will honor the relaxed semantics.
+	some platforms will honor the relaxed semantics.  Using the relaxed
+	read functions will provide significant performance benefits on
+	platforms that support it.
       </para>
     </sect1>
 
===== arch/ia64/sn/io/machvec/iomv.c 1.9 vs edited =====
--- 1.9/arch/ia64/sn/io/machvec/iomv.c	2004-05-26 06:49:19 -07:00
+++ edited/arch/ia64/sn/io/machvec/iomv.c	2004-09-22 08:33:35 -07:00
@@ -54,23 +54,19 @@
 EXPORT_SYMBOL(sn_io_addr);
 
 /**
- * sn_mmiob - I/O space memory barrier
+ * __sn_mmiowb - I/O space write barrier
  *
- * Acts as a memory mapped I/O barrier for platforms that queue writes to 
- * I/O space.  This ensures that subsequent writes to I/O space arrive after
- * all previous writes.  For most ia64 platforms, this is a simple
- * 'mf.a' instruction.  For other platforms, mmiob() may have to read
- * a chipset register to ensure ordering.
+ * See include/asm-ia64/io.h and Documentation/DocBook/deviceiobook.tmpl
+ * for details.
  *
  * On SN2, we wait for the PIO_WRITE_STATUS SHub register to clear.
  * See PV 871084 for details about the WAR about zero value.
  *
  */
-void
-sn_mmiob (void)
+void __sn_mmiowb(void)
 {
 	while ((((volatile unsigned long) (*pda->pio_write_status_addr)) & SH_PIO_WRITE_STATUS_0_PENDING_WRITE_COUNT_MASK) != 
 				SH_PIO_WRITE_STATUS_0_PENDING_WRITE_COUNT_MASK)
 		cpu_relax();
 }
-EXPORT_SYMBOL(sn_mmiob);
+EXPORT_SYMBOL(__sn_mmiowb);
===== drivers/scsi/qla1280.c 1.65 vs edited =====
--- 1.65/drivers/scsi/qla1280.c	2004-07-28 20:59:10 -07:00
+++ edited/drivers/scsi/qla1280.c	2004-09-22 08:31:46 -07:00
@@ -1715,7 +1715,7 @@
 	reg = ha->iobase;
 	/* enable risc and host interrupts */
 	WRT_REG_WORD(&reg->ictrl, (ISP_EN_INT | ISP_EN_RISC));
-	RD_REG_WORD(&reg->ictrl);	/* PCI Posted Write flush */
+	mmiowb(); /* make sure this write arrives before any others */
 	ha->flags.ints_enabled = 1;
 }
 
@@ -1727,7 +1727,7 @@
 	reg = ha->iobase;
 	/* disable risc and host interrupts */
 	WRT_REG_WORD(&reg->ictrl, 0);
-	RD_REG_WORD(&reg->ictrl);	/* PCI Posted Write flush */
+	mmiowb(); /* make sure this write arrives before any others */
 	ha->flags.ints_enabled = 0;
 }
 
@@ -3398,7 +3398,7 @@
 	sp->flags |= SRB_SENT;
 	ha->actthreads++;
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
-	(void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
+	mmiowb(); /* make sure this write arrives before any others */
 
  out:
 	if (status)
===== include/asm-ia64/io.h 1.19 vs edited =====
--- 1.19/include/asm-ia64/io.h	2004-02-03 21:31:10 -08:00
+++ edited/include/asm-ia64/io.h	2004-09-22 08:11:50 -07:00
@@ -91,6 +91,20 @@
  */
 #define __ia64_mf_a()	ia64_mfa()
 
+/**
+ * __ia64_mmiowb - I/O write barrier
+ *
+ * Ensure ordering of I/O space writes.  This will make sure that writes
+ * following the barrier will arrive after all previous writes.  For most
+ * ia64 platforms, this is a simple 'mf.a' instruction.
+ *
+ * See Documentation/DocBook/deviceiobook.tmpl for more information.
+ */
+static inline void __ia64_mmiowb(void)
+{
+	ia64_mfa();
+}
+
 static inline const unsigned long
 __ia64_get_io_port_base (void)
 {
@@ -267,6 +281,7 @@
 #define __outb		platform_outb
 #define __outw		platform_outw
 #define __outl		platform_outl
+#define __mmiowb	platform_mmiowb
 
 #define inb(p)		__inb(p)
 #define inw(p)		__inw(p)
@@ -280,6 +295,7 @@
 #define outsb(p,s,c)	__outsb(p,s,c)
 #define outsw(p,s,c)	__outsw(p,s,c)
 #define outsl(p,s,c)	__outsl(p,s,c)
+#define mmiowb()	__mmiowb()
 
 /*
  * The address passed to these functions are ioremap()ped already.
===== include/asm-ia64/machvec.h 1.26 vs edited =====
--- 1.26/include/asm-ia64/machvec.h	2004-08-03 16:05:22 -07:00
+++ edited/include/asm-ia64/machvec.h	2004-09-22 08:12:33 -07:00
@@ -62,6 +62,7 @@
 typedef void ia64_mv_outb_t (unsigned char, unsigned long);
 typedef void ia64_mv_outw_t (unsigned short, unsigned long);
 typedef void ia64_mv_outl_t (unsigned int, unsigned long);
+typedef void ia64_mv_mmiowb_t (void);
 typedef unsigned char ia64_mv_readb_t (void *);
 typedef unsigned short ia64_mv_readw_t (void *);
 typedef unsigned int ia64_mv_readl_t (void *);
@@ -130,6 +131,7 @@
 #  define platform_outb		ia64_mv.outb
 #  define platform_outw		ia64_mv.outw
 #  define platform_outl		ia64_mv.outl
+#  define platform_mmiowb	ia64_mv.mmiowb
 #  define platform_readb        ia64_mv.readb
 #  define platform_readw        ia64_mv.readw
 #  define platform_readl        ia64_mv.readl
@@ -176,6 +178,7 @@
 	ia64_mv_outb_t *outb;
 	ia64_mv_outw_t *outw;
 	ia64_mv_outl_t *outl;
+	ia64_mv_mmiowb_t *mmiowb;
 	ia64_mv_readb_t *readb;
 	ia64_mv_readw_t *readw;
 	ia64_mv_readl_t *readl;
@@ -218,6 +221,7 @@
 	platform_outb,				\
 	platform_outw,				\
 	platform_outl,				\
+	platform_mmiowb,			\
 	platform_readb,				\
 	platform_readw,				\
 	platform_readl,				\
@@ -343,6 +347,9 @@
 #endif
 #ifndef platform_outl
 # define platform_outl		__ia64_outl
+#endif
+#ifndef platform_mmiowb
+# define platform_mmiowb	__ia64_mmiowb
 #endif
 #ifndef platform_readb
 # define platform_readb		__ia64_readb
===== include/asm-ia64/machvec_init.h 1.7 vs edited =====
--- 1.7/include/asm-ia64/machvec_init.h	2004-02-06 00:30:24 -08:00
+++ edited/include/asm-ia64/machvec_init.h	2004-09-22 08:13:45 -07:00
@@ -12,6 +12,7 @@
 extern ia64_mv_outb_t __ia64_outb;
 extern ia64_mv_outw_t __ia64_outw;
 extern ia64_mv_outl_t __ia64_outl;
+extern ia64_mv_mmiowb_t __ia64_mmiowb;
 extern ia64_mv_readb_t __ia64_readb;
 extern ia64_mv_readw_t __ia64_readw;
 extern ia64_mv_readl_t __ia64_readl;
===== include/asm-ia64/machvec_sn2.h 1.14 vs edited =====
--- 1.14/include/asm-ia64/machvec_sn2.h	2004-07-10 17:14:00 -07:00
+++ edited/include/asm-ia64/machvec_sn2.h	2004-09-22 08:14:03 -07:00
@@ -49,6 +49,7 @@
 extern ia64_mv_outb_t __sn_outb;
 extern ia64_mv_outw_t __sn_outw;
 extern ia64_mv_outl_t __sn_outl;
+extern ia64_mv_mmiowb_t __sn_mmiowb;
 extern ia64_mv_readb_t __sn_readb;
 extern ia64_mv_readw_t __sn_readw;
 extern ia64_mv_readl_t __sn_readl;
@@ -92,6 +93,7 @@
 #define platform_outb			__sn_outb
 #define platform_outw			__sn_outw
 #define platform_outl			__sn_outl
+#define platform_mmiowb			__sn_mmiowb
 #define platform_readb			__sn_readb
 #define platform_readw			__sn_readw
 #define platform_readl			__sn_readl
===== include/asm-ia64/sn/io.h 1.7 vs edited =====
--- 1.7/include/asm-ia64/sn/io.h	2004-02-13 07:00:22 -08:00
+++ edited/include/asm-ia64/sn/io.h	2004-09-22 08:32:52 -07:00
@@ -58,8 +58,8 @@
 #include <asm/sn/sn2/shubio.h>
 
 /*
- * Used to ensure write ordering (like mb(), but for I/O space)
+ * Used to ensure write ordering
  */
-extern void sn_mmiob(void);
+extern void __sn_mmiowb(void);
 
 #endif /* _ASM_IA64_SN_IO_H */
===== include/asm-ia64/sn/sn2/io.h 1.6 vs edited =====
--- 1.6/include/asm-ia64/sn/sn2/io.h	2004-07-22 17:00:00 -07:00
+++ edited/include/asm-ia64/sn/sn2/io.h	2004-09-22 08:15:12 -07:00
@@ -11,8 +11,10 @@
 #include <linux/compiler.h>
 #include <asm/intrinsics.h>
 
-extern void * sn_io_addr(unsigned long port) __attribute_const__; /* Forward definition */
-extern void sn_mmiob(void); /* Forward definition */
+/* Forward declarations */
+struct device;
+extern void *sn_io_addr(unsigned long port) __attribute_const__;
+extern void __sn_mmiowb(void);
 
 #define __sn_mf_a()   ia64_mfa()
 
@@ -91,7 +93,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_mmiowb();
 	}
 }
 
@@ -102,7 +104,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_mmiowb();
 	}
 }
 
@@ -113,7 +115,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_mmiowb();
 	}
 }
 
===== include/asm-mips/io.h 1.7 vs edited =====
--- 1.7/include/asm-mips/io.h	2004-02-19 12:53:02 -08:00
+++ edited/include/asm-mips/io.h	2004-09-22 08:24:53 -07:00
@@ -290,6 +290,9 @@
 #define __raw_writeb(b,addr)	((*(volatile unsigned char *)(addr)) = (b))
 #define __raw_writew(w,addr)	((*(volatile unsigned short *)(addr)) = (w))
 #define __raw_writel(l,addr)	((*(volatile unsigned int *)(addr)) = (l))
+
+#define mmiowb() asm volatile ("sync" ::: "memory")
+
 #ifdef CONFIG_MIPS32
 #define ____raw_writeq(val,addr)						\
 ({									\
===== include/asm-ppc64/io.h 1.21 vs edited =====
--- 1.21/include/asm-ppc64/io.h	2004-09-13 11:31:52 -07:00
+++ edited/include/asm-ppc64/io.h	2004-09-22 08:24:12 -07:00
@@ -152,6 +152,8 @@
 extern void _insl_ns(volatile u32 *port, void *buf, int nl);
 extern void _outsl_ns(volatile u32 *port, const void *buf, int nl);
 
+#define mmiowb() asm volatile ("eieio" ::: "memory")
+
 /*
  * output pause versions need a delay at least for the
  * w83c105 ide controller in a p610.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-09-23 18:48 [PATCH] I/O space write barrier Jesse Barnes
@ 2004-09-23 19:03 ` James Bottomley
  2004-09-23 19:07   ` Jesse Barnes
  2004-09-27  0:45 ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 13+ messages in thread
From: James Bottomley @ 2004-09-23 19:03 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: linux-arch

On Thu, 2004-09-23 at 14:48, Jesse Barnes wrote:
> -	RD_REG_WORD(&reg->ictrl);	/* PCI Posted Write flush */
> +	mmiowb(); /* make sure this write arrives before any others */

What's going on here?  In your docs you say this mmiowb() is explicitly
to prevent write ordering problems and doesn't affect posting.  However
in this patch you're using it to replace a posting flush read.

>  	ha->flags.ints_enabled = 1;

The result will be that we can get to here, but the write that enables
the interrupt may still be posted in a bridge somewhere.

James

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-09-23 19:03 ` James Bottomley
@ 2004-09-23 19:07   ` Jesse Barnes
  2004-09-23 19:27     ` James Bottomley
  2004-09-23 19:55     ` Jeremy Higdon
  0 siblings, 2 replies; 13+ messages in thread
From: Jesse Barnes @ 2004-09-23 19:07 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-arch, jeremy

On Thursday, September 23, 2004 3:03 pm, James Bottomley wrote:
> On Thu, 2004-09-23 at 14:48, Jesse Barnes wrote:
> > - RD_REG_WORD(&reg->ictrl); /* PCI Posted Write flush */
> > + mmiowb(); /* make sure this write arrives before any others */
>
> What's going on here?  In your docs you say this mmiowb() is explicitly
> to prevent write ordering problems and doesn't affect posting.  However
> in this patch you're using it to replace a posting flush read.

Jeremy was the one that added these, I think the comment is misleading and 
only ordering was intended.  If that's not the case I'll drop this bit.

>
> >   ha->flags.ints_enabled = 1;
>
> The result will be that we can get to here, but the write that enables
> the interrupt may still be posted in a bridge somewhere.

If we're waiting for an interrupt here, I don't think it matters if we flush 
or order, we'll wait the same amount of time regardless.

Thanks,
Jesse

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-09-23 19:07   ` Jesse Barnes
@ 2004-09-23 19:27     ` James Bottomley
  2004-09-23 19:41       ` Jesse Barnes
                         ` (2 more replies)
  2004-09-23 19:55     ` Jeremy Higdon
  1 sibling, 3 replies; 13+ messages in thread
From: James Bottomley @ 2004-09-23 19:27 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: linux-arch, jeremy

On Thu, 2004-09-23 at 15:07, Jesse Barnes wrote:
> If we're waiting for an interrupt here, I don't think it matters if we flush 
> or order, we'll wait the same amount of time regardless.

I don't think so.  Your ordering barrier doesn't cause a posted write
flush.  Posted writes have theoretically no upper limit defined in the
spec for the time they may remain posted, so in the former case, you are
guaranteed that by the time you set the flag and exit the function that
interrupts are enabled in the qla1280.  If you apply the patch you sent
in, this guarantee is broken and you don't really know how much longer
after exiting the function it will be before interrupts become enabled
(although in practice it's probably only of the order of ms).

James

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-09-23 19:27     ` James Bottomley
@ 2004-09-23 19:41       ` Jesse Barnes
  2004-09-23 19:57       ` Jeremy Higdon
  2004-09-23 22:22       ` Jeremy Higdon
  2 siblings, 0 replies; 13+ messages in thread
From: Jesse Barnes @ 2004-09-23 19:41 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-arch, jeremy

On Thursday, September 23, 2004 3:27 pm, James Bottomley wrote:
> On Thu, 2004-09-23 at 15:07, Jesse Barnes wrote:
> > If we're waiting for an interrupt here, I don't think it matters if we
> > flush or order, we'll wait the same amount of time regardless.
>
> I don't think so.  Your ordering barrier doesn't cause a posted write
> flush.  Posted writes have theoretically no upper limit defined in the
> spec for the time they may remain posted, so in the former case, you are
> guaranteed that by the time you set the flag and exit the function that
> interrupts are enabled in the qla1280.

Ok, you're right we don't want to break that.  I just didn't think it was that 
important for the driver to think that interrupts were enabled even when they 
weren't since it would probably do another read sometime soon anyway, but 
that's sloppy, so I'll just drop this bit.

Thanks,
Jesse

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-09-23 19:07   ` Jesse Barnes
  2004-09-23 19:27     ` James Bottomley
@ 2004-09-23 19:55     ` Jeremy Higdon
  2004-09-23 20:09       ` Jesse Barnes
  1 sibling, 1 reply; 13+ messages in thread
From: Jeremy Higdon @ 2004-09-23 19:55 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: James Bottomley, linux-arch

On Thu, Sep 23, 2004 at 03:07:26PM -0400, Jesse Barnes wrote:
> On Thursday, September 23, 2004 3:03 pm, James Bottomley wrote:
> > On Thu, 2004-09-23 at 14:48, Jesse Barnes wrote:
> > > - RD_REG_WORD(&reg->ictrl); /* PCI Posted Write flush */
> > > + mmiowb(); /* make sure this write arrives before any others */
> >
> > What's going on here?  In your docs you say this mmiowb() is explicitly
> > to prevent write ordering problems and doesn't affect posting.  However
> > in this patch you're using it to replace a posting flush read.
> 
> Jeremy was the one that added these, I think the comment is misleading and 
> only ordering was intended.  If that's not the case I'll drop this bit.

No, I didn't add those.  In fact, I thought them to be unnecessary.  Jes
added them, I believe.  I added the reads after the writes of req_ring_index 
to mailbox4.  Since at the time we did not have a write ordering function,
I had to use the heavy weight flush.

Take a look at the 1.56 diffs in the scsi_misc-2.6 tree.

jeremy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-09-23 19:27     ` James Bottomley
  2004-09-23 19:41       ` Jesse Barnes
@ 2004-09-23 19:57       ` Jeremy Higdon
  2004-09-23 22:22       ` Jeremy Higdon
  2 siblings, 0 replies; 13+ messages in thread
From: Jeremy Higdon @ 2004-09-23 19:57 UTC (permalink / raw)
  To: James Bottomley; +Cc: Jesse Barnes, linux-arch

On Thu, Sep 23, 2004 at 03:27:24PM -0400, James Bottomley wrote:
> On Thu, 2004-09-23 at 15:07, Jesse Barnes wrote:
> > If we're waiting for an interrupt here, I don't think it matters if we flush 
> > or order, we'll wait the same amount of time regardless.
> 
> I don't think so.  Your ordering barrier doesn't cause a posted write
> flush.  Posted writes have theoretically no upper limit defined in the
> spec for the time they may remain posted, so in the former case, you are
> guaranteed that by the time you set the flag and exit the function that
> interrupts are enabled in the qla1280.  If you apply the patch you sent
> in, this guarantee is broken and you don't really know how much longer
> after exiting the function it will be before interrupts become enabled
> (although in practice it's probably only of the order of ms).
> 
> James

I would say microseconds.  I think that knowing that the hardware has
interrupts enabled by the end of the function is unnecessary, because
they will be enabled before you can do anything else to the chip.
But it shouldn't hurt anything either, and defensive programming in
old SCSI drivers is often a good idea  :-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-09-23 19:55     ` Jeremy Higdon
@ 2004-09-23 20:09       ` Jesse Barnes
  0 siblings, 0 replies; 13+ messages in thread
From: Jesse Barnes @ 2004-09-23 20:09 UTC (permalink / raw)
  To: Jeremy Higdon; +Cc: James Bottomley, linux-arch

On Thursday, September 23, 2004 3:55 pm, Jeremy Higdon wrote:
> On Thu, Sep 23, 2004 at 03:07:26PM -0400, Jesse Barnes wrote:
> > On Thursday, September 23, 2004 3:03 pm, James Bottomley wrote:
> > > On Thu, 2004-09-23 at 14:48, Jesse Barnes wrote:
> > > > - RD_REG_WORD(&reg->ictrl); /* PCI Posted Write flush */
> > > > + mmiowb(); /* make sure this write arrives before any others */
> > >
> > > What's going on here?  In your docs you say this mmiowb() is explicitly
> > > to prevent write ordering problems and doesn't affect posting.  However
> > > in this patch you're using it to replace a posting flush read.
> >
> > Jeremy was the one that added these, I think the comment is misleading
> > and only ordering was intended.  If that's not the case I'll drop this
> > bit.
>
> No, I didn't add those.  In fact, I thought them to be unnecessary.  Jes
> added them, I believe.  I added the reads after the writes of
> req_ring_index to mailbox4.  Since at the time we did not have a write
> ordering function, I had to use the heavy weight flush.

Ok, sorry for the misattribution :).  At any rate, I've already dropped that 
bit.

Jesse

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-09-23 19:27     ` James Bottomley
  2004-09-23 19:41       ` Jesse Barnes
  2004-09-23 19:57       ` Jeremy Higdon
@ 2004-09-23 22:22       ` Jeremy Higdon
  2004-09-23 23:36         ` James Bottomley
  2 siblings, 1 reply; 13+ messages in thread
From: Jeremy Higdon @ 2004-09-23 22:22 UTC (permalink / raw)
  To: James Bottomley; +Cc: Jesse Barnes, linux-arch

On Thu, Sep 23, 2004 at 03:27:24PM -0400, James Bottomley wrote:
> On Thu, 2004-09-23 at 15:07, Jesse Barnes wrote:
> > If we're waiting for an interrupt here, I don't think it matters if we flush 
> > or order, we'll wait the same amount of time regardless.
> 
> I don't think so.  Your ordering barrier doesn't cause a posted write
> flush.  Posted writes have theoretically no upper limit defined in the
> spec for the time they may remain posted, so in the former case, you are
> guaranteed that by the time you set the flag and exit the function that
> interrupts are enabled in the qla1280.  If you apply the patch you sent
> in, this guarantee is broken and you don't really know how much longer
> after exiting the function it will be before interrupts become enabled
> (although in practice it's probably only of the order of ms).
> 
> James

James, is this what you want?
I think these are the only writes that need ordering, as opposed to flushing.
Being not certain what was intended in the other cases, it's safer to leave
them be.


===== drivers/scsi/qla1280.c 1.65 vs edited =====
--- 1.65/drivers/scsi/qla1280.c	2004-07-28 20:59:10 -07:00
+++ edited/drivers/scsi/qla1280.c	2004-09-23 15:19:56 -07:00
@@ -3398,7 +3398,7 @@
 	sp->flags |= SRB_SENT;
 	ha->actthreads++;
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
-	(void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
+	mmiowb(); /* posted write ordering */
 
  out:
 	if (status)
@@ -3666,7 +3666,7 @@
 	sp->flags |= SRB_SENT;
 	ha->actthreads++;
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
-	(void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
+	mmiowb(); /* posted write ordering */
 
 out:
 	if (status)
@@ -3778,7 +3778,7 @@
 
 	/* Set chip new ring index. */
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
-	(void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
+	mmiowb(); /* posted write ordering */
 
 	LEAVE("qla1280_isp_cmd");
 }

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-09-23 22:22       ` Jeremy Higdon
@ 2004-09-23 23:36         ` James Bottomley
  2004-09-24  5:03           ` Jeremy Higdon
  0 siblings, 1 reply; 13+ messages in thread
From: James Bottomley @ 2004-09-23 23:36 UTC (permalink / raw)
  To: Jeremy Higdon; +Cc: Jesse Barnes, linux-arch

On Thu, 2004-09-23 at 18:22, Jeremy Higdon wrote:
> James, is this what you want?
> I think these are the only writes that need ordering, as opposed to flushing.
> Being not certain what was intended in the other cases, it's safer to leave
> them be.

Well, these two are just the kicks to tell the card that we have filled
a consistent memory slot with a SCSI command.  The ordering, I assume,
is because we want SCSI commands issued in the order they went down to
queuecommand in, so the mmiowb() preserves that order?  Could we have a
comment to that effect.

My objection isn't so much that what the original patch did would result
in errors, merely that we have enough trouble explaining these issues to
device driver writers, so whatever you do in your patches demonstrating
use has to be really exemplary.

James

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-09-23 23:36         ` James Bottomley
@ 2004-09-24  5:03           ` Jeremy Higdon
  0 siblings, 0 replies; 13+ messages in thread
From: Jeremy Higdon @ 2004-09-24  5:03 UTC (permalink / raw)
  To: James Bottomley; +Cc: Jesse Barnes, linux-arch

On Thu, Sep 23, 2004 at 07:36:52PM -0400, James Bottomley wrote:
> On Thu, 2004-09-23 at 18:22, Jeremy Higdon wrote:
> > James, is this what you want?
> > I think these are the only writes that need ordering, as opposed to flushing.
> > Being not certain what was intended in the other cases, it's safer to leave
> > them be.
> 
> Well, these two are just the kicks to tell the card that we have filled
> a consistent memory slot with a SCSI command.  The ordering, I assume,
> is because we want SCSI commands issued in the order they went down to
> queuecommand in, so the mmiowb() preserves that order?  Could we have a
> comment to that effect.

Okay, I'll add a comment there.  On the qla1280, it's worse than just
out of order.  If the write order is switched, the chip will issue an
entire queue's worth of stale commands.

> My objection isn't so much that what the original patch did would result
> in errors, merely that we have enough trouble explaining these issues to
> device driver writers, so whatever you do in your patches demonstrating
> use has to be really exemplary.

I think there was also the issue of whether we wanted a flush or just
ordering for the interrupt enable/disable.  Since we don't really know,
and since it's not a performance path, it's safer to leave it a flush
after those.

Is the following diff what you're thinking of, James?
I'm not entirely happy with it, and could spend more time on it.

===== drivers/scsi/qla1280.c 1.65 vs edited =====
--- 1.65/drivers/scsi/qla1280.c	2004-07-28 20:59:10 -07:00
+++ edited/drivers/scsi/qla1280.c	2004-09-23 21:58:40 -07:00
@@ -3398,7 +3398,18 @@
 	sp->flags |= SRB_SENT;
 	ha->actthreads++;
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
-	(void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
+
+	/*
+	 * A Memory Mapped I/O Write Barrier is needed to ensure that this write
+	 * of the request queue in register is ordered ahead of writes issued
+	 * after this one by other CPUs.  Access to the register is protected
+	 * by the host_lock.  Without the mmiowb, however, it is possible for
+	 * this CPU to release the host lock, another CPU acquire the host lock,
+	 * and write to the request queue in, and have the second write make it
+	 * to the chip first.  See Documentation/DocBook/deviceiobook.tmpl for
+	 * more information.
+	 */
+	mmiowb(); /* posted write ordering */
 
  out:
 	if (status)
@@ -3666,7 +3677,18 @@
 	sp->flags |= SRB_SENT;
 	ha->actthreads++;
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
-	(void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
+
+	/*
+	 * A Memory Mapped I/O Write Barrier is needed to ensure that this write
+	 * of the request queue in register is ordered ahead of writes issued
+	 * after this one by other CPUs.  Access to the register is protected
+	 * by the host_lock.  Without the mmiowb, however, it is possible for
+	 * this CPU to release the host lock, another CPU acquire the host lock,
+	 * and write to the request queue in, and have the second write make it
+	 * to the chip first.  See Documentation/DocBook/deviceiobook.tmpl for
+	 * more information.
+	 */
+	mmiowb(); /* posted write ordering */
 
 out:
 	if (status)
@@ -3778,7 +3800,18 @@
 
 	/* Set chip new ring index. */
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
-	(void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
+
+	/*
+	 * A Memory Mapped I/O Write Barrier is needed to ensure that this write
+	 * of the request queue in register is ordered ahead of writes issued
+	 * after this one by other CPUs.  Access to the register is protected
+	 * by the host_lock.  Without the mmiowb, however, it is possible for
+	 * this CPU to release the host lock, another CPU acquire the host lock,
+	 * and write to the request queue in, and have the second write make it
+	 * to the chip first.  See Documentation/DocBook/deviceiobook.tmpl for
+	 * more information.
+	 */
+	mmiowb(); /* posted write ordering */
 
 	LEAVE("qla1280_isp_cmd");
 }

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-09-23 18:48 [PATCH] I/O space write barrier Jesse Barnes
  2004-09-23 19:03 ` James Bottomley
@ 2004-09-27  0:45 ` Benjamin Herrenschmidt
  2004-09-27 15:41   ` Jesse Barnes
  1 sibling, 1 reply; 13+ messages in thread
From: Benjamin Herrenschmidt @ 2004-09-27  0:45 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Linux Arch list

On Fri, 2004-09-24 at 04:48, Jesse Barnes wrote:

>@@ -1727,7 +1727,7 @@
>        reg = ha->iobase;
>        /* disable risc and host interrupts */
>        WRT_REG_WORD(&reg->ictrl, 0);
>-       RD_REG_WORD(&reg->ictrl);       /* PCI Posted Write flush */
>+       mmiowb(); /* make sure this write arrives before any others */
>        ha->flags.ints_enabled = 0;
> }

This is a typical example of non working sync. You platform might have
some kind of guarantee vs. ordering of interrupts, but in the general
case, the above may "leak", that is you may still take an interrupt
after ha->flags.ints_enabled = 0.

First, on some CPUs like evil PPCs, that write to memory may end up
beeing on a different write queue as I explaines, thouh the mmiowb()
will help if I implement it as a strong barrier (and the previous
MMIO read wouldn't have helped). But that isn't my point.

The problem is that interrupts are asynchronous by nature. When the
above code is reached, the device may already have asserted the
interrupt line, which may still be in the process of propagating to
the CPU through the various APICs or within the CPU core. There is
no way you can guarantee you won't take it (though since it's a PCI
interrupt, it's, I suppose, a level interrupt, so it will probably
go away shortly after beeing asserted). It may even be on it's way
to beeing handled on another CPU in fact. Even synchronize_irq()
won't help.

I haven't looked at the rest of the driver, but it certainly must be
prepared to take spurrious interrupts since that can (and so will)
happen.

Ben.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-09-27  0:45 ` Benjamin Herrenschmidt
@ 2004-09-27 15:41   ` Jesse Barnes
  0 siblings, 0 replies; 13+ messages in thread
From: Jesse Barnes @ 2004-09-27 15:41 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Linux Arch list

On Sunday, September 26, 2004 5:45 pm, Benjamin Herrenschmidt wrote:
> On Fri, 2004-09-24 at 04:48, Jesse Barnes wrote:
> >@@ -1727,7 +1727,7 @@
> >        reg = ha->iobase;
> >        /* disable risc and host interrupts */
> >        WRT_REG_WORD(&reg->ictrl, 0);
> >-       RD_REG_WORD(&reg->ictrl);       /* PCI Posted Write flush */
> >+       mmiowb(); /* make sure this write arrives before any others */
> >        ha->flags.ints_enabled = 0;
> > }
>
> This is a typical example of non working sync. You platform might have
> some kind of guarantee vs. ordering of interrupts, but in the general
> case, the above may "leak", that is you may still take an interrupt
> after ha->flags.ints_enabled = 0.

Right, that's what James pointed out.  Jeremy will submit the qla mmiowb bits, 
and has dropped these, since the routine probably shouldn't return until 
interrupts are actually enabled.  (But I see on your platform the sore to 
ins_enabled will probably be issued in parallel to the MMIO store to the 
interrupt register, and so might happen first.)

> First, on some CPUs like evil PPCs, that write to memory may end up
> beeing on a different write queue as I explaines, thouh the mmiowb()
> will help if I implement it as a strong barrier (and the previous
> MMIO read wouldn't have helped). But that isn't my point.
>
> The problem is that interrupts are asynchronous by nature. When the
> above code is reached, the device may already have asserted the
> interrupt line, which may still be in the process of propagating to
> the CPU through the various APICs or within the CPU core. There is
> no way you can guarantee you won't take it (though since it's a PCI
> interrupt, it's, I suppose, a level interrupt, so it will probably
> go away shortly after beeing asserted). It may even be on it's way
> to beeing handled on another CPU in fact. Even synchronize_irq()
> won't help.
>
> I haven't looked at the rest of the driver, but it certainly must be
> prepared to take spurrious interrupts since that can (and so will)
> happen.

Right, afaik all drivers (or at least all PCI drivers) should be able to deal 
with this.  Whether or not they do is a different story (I think qla does).

Jesse

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2004-09-27 15:41 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-09-23 18:48 [PATCH] I/O space write barrier Jesse Barnes
2004-09-23 19:03 ` James Bottomley
2004-09-23 19:07   ` Jesse Barnes
2004-09-23 19:27     ` James Bottomley
2004-09-23 19:41       ` Jesse Barnes
2004-09-23 19:57       ` Jeremy Higdon
2004-09-23 22:22       ` Jeremy Higdon
2004-09-23 23:36         ` James Bottomley
2004-09-24  5:03           ` Jeremy Higdon
2004-09-23 19:55     ` Jeremy Higdon
2004-09-23 20:09       ` Jesse Barnes
2004-09-27  0:45 ` Benjamin Herrenschmidt
2004-09-27 15:41   ` Jesse Barnes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox