* [PATCH] I/O space write barrier
@ 2004-09-23 18:48 Jesse Barnes
2004-09-23 19:03 ` James Bottomley
2004-09-27 0:45 ` Benjamin Herrenschmidt
0 siblings, 2 replies; 13+ messages in thread
From: Jesse Barnes @ 2004-09-23 18:48 UTC (permalink / raw)
To: linux-arch
[-- Attachment #1: Type: text/plain, Size: 2483 bytes --]
[Posting again to linux-arch at Ralf's suggestion.]
On some platforms (e.g. SGI Challenge, Origin, and Altix machines), writes to
I/O space aren't ordered coming from different CPUs. For the most part, this
isn't a problem since drivers generally spinlock around code that does writeX
calls, but if the last operation a driver does before it releases a lock is a
write and some other CPU takes the lock and immediately does a write, it's
possible the second CPU's write could arrive before the first's.
This patch adds a mmiowb() call to deal with this sort of situation, and
modifies the qla1280 driver to use it. The idea is to mirror the regular,
cacheable memory barrier operation, wmb (arg! I misnamed it in my patch,
I'll fix that up in the next rev assuming it's ok). Example:
CPU A: spin_lock_irqsave(&dev_lock, flags)
CPU A: ...
CPU A: writel(newval, ring_ptr);
CPU A: spin_unlock_irqrestore(&dev_lock, flags)
...
CPU B: spin_lock_irqsave(&dev_lock, flags)
CPU B: writel(newval2, ring_ptr);
CPU B: ...
CPU B: spin_unlock_irqrestore(&dev_lock, flags)
In this case, newval2 could be written to ring_ptr before newval. Fixing it
is easy though:
CPU A: spin_lock_irqsave(&dev_lock, flags)
CPU A: ...
CPU A: writel(newval, ring_ptr);
CPU A: mmiowb(); /* ensure no other writes beat us to the device */
CPU A: spin_unlock_irqrestore(&dev_lock, flags)
...
CPU B: spin_lock_irqsave(&dev_lock, flags)
CPU B: writel(newval2, ring_ptr);
CPU B: ...
CPU B: mmiowb();
CPU B: spin_unlock_irqrestore(&dev_lock, flags)
Note that this doesn't address a related case where the driver may want to
actually make a given write get to the device before proceeding. This should
be dealt with by immediately reading a register from the card that has no
side effects. According to the PCI spec, that will guarantee that all writes
have arrived before being sent to the target bus. If no such register is
available (in the case of card resets perhaps), reading from config space is
sufficient (though it may return all ones if the card isn't responding to
read cycles).
I'm also not quite sure if I got the ppc64 case right (and obviously I need to
add the macro for other platforms). The assumption is that *most* platforms
don't have this issue, and those that do have a chipset register or CPU
instruction to ensure ordering. If that turns out not to be the case, we can
add an address argument to the function.
Thanks,
Jesse
[-- Attachment #2: mmiowb.patch --]
[-- Type: text/x-diff, Size: 10164 bytes --]
===== Documentation/DocBook/deviceiobook.tmpl 1.4 vs edited =====
--- 1.4/Documentation/DocBook/deviceiobook.tmpl 2004-02-03 21:31:10 -08:00
+++ edited/Documentation/DocBook/deviceiobook.tmpl 2004-09-22 07:59:56 -07:00
@@ -147,8 +147,7 @@
compiler is not permitted to reorder the I/O sequence. When the
ordering can be compiler optimised, you can use <function>
__readb</function> and friends to indicate the relaxed ordering. Use
- this with care. The <function>rmb</function> provides a read memory
- barrier. The <function>wmb</function> provides a write memory barrier.
+ this with care.
</para>
<para>
@@ -163,6 +162,15 @@
</para>
<para>
+ In addition to write posting, some large SMP systems (e.g. SGI
+ Challenge, Origin and Altix machines) won't strongly order writes
+ coming from different CPUs. Thus it's important to properly
+ protect parts of your driver that do memory-mapped writes with
+ locks and use the <function>mmiob</function> to make sure they
+ arrive in the order intended.
+ </para>
+
+ <para>
PCI ordering rules also guarantee that PIO read responses arrive
after any outstanding DMA writes on that bus, since for some devices
the result of a <function>readb</function> call may signal to the
@@ -171,7 +179,9 @@
<function>readb</function> call has no relation to any previous DMA
writes performed by the device. The driver can use
<function>readb_relaxed</function> for these cases, although only
- some platforms will honor the relaxed semantics.
+ some platforms will honor the relaxed semantics. Using the relaxed
+ read functions will provide significant performance benefits on
+ platforms that support it.
</para>
</sect1>
===== arch/ia64/sn/io/machvec/iomv.c 1.9 vs edited =====
--- 1.9/arch/ia64/sn/io/machvec/iomv.c 2004-05-26 06:49:19 -07:00
+++ edited/arch/ia64/sn/io/machvec/iomv.c 2004-09-22 08:33:35 -07:00
@@ -54,23 +54,19 @@
EXPORT_SYMBOL(sn_io_addr);
/**
- * sn_mmiob - I/O space memory barrier
+ * __sn_mmiowb - I/O space write barrier
*
- * Acts as a memory mapped I/O barrier for platforms that queue writes to
- * I/O space. This ensures that subsequent writes to I/O space arrive after
- * all previous writes. For most ia64 platforms, this is a simple
- * 'mf.a' instruction. For other platforms, mmiob() may have to read
- * a chipset register to ensure ordering.
+ * See include/asm-ia64/io.h and Documentation/DocBook/deviceiobook.tmpl
+ * for details.
*
* On SN2, we wait for the PIO_WRITE_STATUS SHub register to clear.
* See PV 871084 for details about the WAR about zero value.
*
*/
-void
-sn_mmiob (void)
+void __sn_mmiowb(void)
{
while ((((volatile unsigned long) (*pda->pio_write_status_addr)) & SH_PIO_WRITE_STATUS_0_PENDING_WRITE_COUNT_MASK) !=
SH_PIO_WRITE_STATUS_0_PENDING_WRITE_COUNT_MASK)
cpu_relax();
}
-EXPORT_SYMBOL(sn_mmiob);
+EXPORT_SYMBOL(__sn_mmiowb);
===== drivers/scsi/qla1280.c 1.65 vs edited =====
--- 1.65/drivers/scsi/qla1280.c 2004-07-28 20:59:10 -07:00
+++ edited/drivers/scsi/qla1280.c 2004-09-22 08:31:46 -07:00
@@ -1715,7 +1715,7 @@
reg = ha->iobase;
/* enable risc and host interrupts */
WRT_REG_WORD(®->ictrl, (ISP_EN_INT | ISP_EN_RISC));
- RD_REG_WORD(®->ictrl); /* PCI Posted Write flush */
+ mmiowb(); /* make sure this write arrives before any others */
ha->flags.ints_enabled = 1;
}
@@ -1727,7 +1727,7 @@
reg = ha->iobase;
/* disable risc and host interrupts */
WRT_REG_WORD(®->ictrl, 0);
- RD_REG_WORD(®->ictrl); /* PCI Posted Write flush */
+ mmiowb(); /* make sure this write arrives before any others */
ha->flags.ints_enabled = 0;
}
@@ -3398,7 +3398,7 @@
sp->flags |= SRB_SENT;
ha->actthreads++;
WRT_REG_WORD(®->mailbox4, ha->req_ring_index);
- (void) RD_REG_WORD(®->mailbox4); /* PCI posted write flush */
+ mmiowb(); /* make sure this write arrives before any others */
out:
if (status)
===== include/asm-ia64/io.h 1.19 vs edited =====
--- 1.19/include/asm-ia64/io.h 2004-02-03 21:31:10 -08:00
+++ edited/include/asm-ia64/io.h 2004-09-22 08:11:50 -07:00
@@ -91,6 +91,20 @@
*/
#define __ia64_mf_a() ia64_mfa()
+/**
+ * __ia64_mmiowb - I/O write barrier
+ *
+ * Ensure ordering of I/O space writes. This will make sure that writes
+ * following the barrier will arrive after all previous writes. For most
+ * ia64 platforms, this is a simple 'mf.a' instruction.
+ *
+ * See Documentation/DocBook/deviceiobook.tmpl for more information.
+ */
+static inline void __ia64_mmiowb(void)
+{
+ ia64_mfa();
+}
+
static inline const unsigned long
__ia64_get_io_port_base (void)
{
@@ -267,6 +281,7 @@
#define __outb platform_outb
#define __outw platform_outw
#define __outl platform_outl
+#define __mmiowb platform_mmiowb
#define inb(p) __inb(p)
#define inw(p) __inw(p)
@@ -280,6 +295,7 @@
#define outsb(p,s,c) __outsb(p,s,c)
#define outsw(p,s,c) __outsw(p,s,c)
#define outsl(p,s,c) __outsl(p,s,c)
+#define mmiowb() __mmiowb()
/*
* The address passed to these functions are ioremap()ped already.
===== include/asm-ia64/machvec.h 1.26 vs edited =====
--- 1.26/include/asm-ia64/machvec.h 2004-08-03 16:05:22 -07:00
+++ edited/include/asm-ia64/machvec.h 2004-09-22 08:12:33 -07:00
@@ -62,6 +62,7 @@
typedef void ia64_mv_outb_t (unsigned char, unsigned long);
typedef void ia64_mv_outw_t (unsigned short, unsigned long);
typedef void ia64_mv_outl_t (unsigned int, unsigned long);
+typedef void ia64_mv_mmiowb_t (void);
typedef unsigned char ia64_mv_readb_t (void *);
typedef unsigned short ia64_mv_readw_t (void *);
typedef unsigned int ia64_mv_readl_t (void *);
@@ -130,6 +131,7 @@
# define platform_outb ia64_mv.outb
# define platform_outw ia64_mv.outw
# define platform_outl ia64_mv.outl
+# define platform_mmiowb ia64_mv.mmiowb
# define platform_readb ia64_mv.readb
# define platform_readw ia64_mv.readw
# define platform_readl ia64_mv.readl
@@ -176,6 +178,7 @@
ia64_mv_outb_t *outb;
ia64_mv_outw_t *outw;
ia64_mv_outl_t *outl;
+ ia64_mv_mmiowb_t *mmiowb;
ia64_mv_readb_t *readb;
ia64_mv_readw_t *readw;
ia64_mv_readl_t *readl;
@@ -218,6 +221,7 @@
platform_outb, \
platform_outw, \
platform_outl, \
+ platform_mmiowb, \
platform_readb, \
platform_readw, \
platform_readl, \
@@ -343,6 +347,9 @@
#endif
#ifndef platform_outl
# define platform_outl __ia64_outl
+#endif
+#ifndef platform_mmiowb
+# define platform_mmiowb __ia64_mmiowb
#endif
#ifndef platform_readb
# define platform_readb __ia64_readb
===== include/asm-ia64/machvec_init.h 1.7 vs edited =====
--- 1.7/include/asm-ia64/machvec_init.h 2004-02-06 00:30:24 -08:00
+++ edited/include/asm-ia64/machvec_init.h 2004-09-22 08:13:45 -07:00
@@ -12,6 +12,7 @@
extern ia64_mv_outb_t __ia64_outb;
extern ia64_mv_outw_t __ia64_outw;
extern ia64_mv_outl_t __ia64_outl;
+extern ia64_mv_mmiowb_t __ia64_mmiowb;
extern ia64_mv_readb_t __ia64_readb;
extern ia64_mv_readw_t __ia64_readw;
extern ia64_mv_readl_t __ia64_readl;
===== include/asm-ia64/machvec_sn2.h 1.14 vs edited =====
--- 1.14/include/asm-ia64/machvec_sn2.h 2004-07-10 17:14:00 -07:00
+++ edited/include/asm-ia64/machvec_sn2.h 2004-09-22 08:14:03 -07:00
@@ -49,6 +49,7 @@
extern ia64_mv_outb_t __sn_outb;
extern ia64_mv_outw_t __sn_outw;
extern ia64_mv_outl_t __sn_outl;
+extern ia64_mv_mmiowb_t __sn_mmiowb;
extern ia64_mv_readb_t __sn_readb;
extern ia64_mv_readw_t __sn_readw;
extern ia64_mv_readl_t __sn_readl;
@@ -92,6 +93,7 @@
#define platform_outb __sn_outb
#define platform_outw __sn_outw
#define platform_outl __sn_outl
+#define platform_mmiowb __sn_mmiowb
#define platform_readb __sn_readb
#define platform_readw __sn_readw
#define platform_readl __sn_readl
===== include/asm-ia64/sn/io.h 1.7 vs edited =====
--- 1.7/include/asm-ia64/sn/io.h 2004-02-13 07:00:22 -08:00
+++ edited/include/asm-ia64/sn/io.h 2004-09-22 08:32:52 -07:00
@@ -58,8 +58,8 @@
#include <asm/sn/sn2/shubio.h>
/*
- * Used to ensure write ordering (like mb(), but for I/O space)
+ * Used to ensure write ordering
*/
-extern void sn_mmiob(void);
+extern void __sn_mmiowb(void);
#endif /* _ASM_IA64_SN_IO_H */
===== include/asm-ia64/sn/sn2/io.h 1.6 vs edited =====
--- 1.6/include/asm-ia64/sn/sn2/io.h 2004-07-22 17:00:00 -07:00
+++ edited/include/asm-ia64/sn/sn2/io.h 2004-09-22 08:15:12 -07:00
@@ -11,8 +11,10 @@
#include <linux/compiler.h>
#include <asm/intrinsics.h>
-extern void * sn_io_addr(unsigned long port) __attribute_const__; /* Forward definition */
-extern void sn_mmiob(void); /* Forward definition */
+/* Forward declarations */
+struct device;
+extern void *sn_io_addr(unsigned long port) __attribute_const__;
+extern void __sn_mmiowb(void);
#define __sn_mf_a() ia64_mfa()
@@ -91,7 +93,7 @@
if ((addr = sn_io_addr(port))) {
*addr = val;
- sn_mmiob();
+ __sn_mmiowb();
}
}
@@ -102,7 +104,7 @@
if ((addr = sn_io_addr(port))) {
*addr = val;
- sn_mmiob();
+ __sn_mmiowb();
}
}
@@ -113,7 +115,7 @@
if ((addr = sn_io_addr(port))) {
*addr = val;
- sn_mmiob();
+ __sn_mmiowb();
}
}
===== include/asm-mips/io.h 1.7 vs edited =====
--- 1.7/include/asm-mips/io.h 2004-02-19 12:53:02 -08:00
+++ edited/include/asm-mips/io.h 2004-09-22 08:24:53 -07:00
@@ -290,6 +290,9 @@
#define __raw_writeb(b,addr) ((*(volatile unsigned char *)(addr)) = (b))
#define __raw_writew(w,addr) ((*(volatile unsigned short *)(addr)) = (w))
#define __raw_writel(l,addr) ((*(volatile unsigned int *)(addr)) = (l))
+
+#define mmiowb() asm volatile ("sync" ::: "memory")
+
#ifdef CONFIG_MIPS32
#define ____raw_writeq(val,addr) \
({ \
===== include/asm-ppc64/io.h 1.21 vs edited =====
--- 1.21/include/asm-ppc64/io.h 2004-09-13 11:31:52 -07:00
+++ edited/include/asm-ppc64/io.h 2004-09-22 08:24:12 -07:00
@@ -152,6 +152,8 @@
extern void _insl_ns(volatile u32 *port, void *buf, int nl);
extern void _outsl_ns(volatile u32 *port, const void *buf, int nl);
+#define mmiowb() asm volatile ("eieio" ::: "memory")
+
/*
* output pause versions need a delay at least for the
* w83c105 ide controller in a p610.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] I/O space write barrier
2004-09-23 18:48 [PATCH] I/O space write barrier Jesse Barnes
@ 2004-09-23 19:03 ` James Bottomley
2004-09-23 19:07 ` Jesse Barnes
2004-09-27 0:45 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 13+ messages in thread
From: James Bottomley @ 2004-09-23 19:03 UTC (permalink / raw)
To: Jesse Barnes; +Cc: linux-arch
On Thu, 2004-09-23 at 14:48, Jesse Barnes wrote:
> - RD_REG_WORD(®->ictrl); /* PCI Posted Write flush */
> + mmiowb(); /* make sure this write arrives before any others */
What's going on here? In your docs you say this mmiowb() is explicitly
to prevent write ordering problems and doesn't affect posting. However
in this patch you're using it to replace a posting flush read.
> ha->flags.ints_enabled = 1;
The result will be that we can get to here, but the write that enables
the interrupt may still be posted in a bridge somewhere.
James
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] I/O space write barrier
2004-09-23 19:03 ` James Bottomley
@ 2004-09-23 19:07 ` Jesse Barnes
2004-09-23 19:27 ` James Bottomley
2004-09-23 19:55 ` Jeremy Higdon
0 siblings, 2 replies; 13+ messages in thread
From: Jesse Barnes @ 2004-09-23 19:07 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-arch, jeremy
On Thursday, September 23, 2004 3:03 pm, James Bottomley wrote:
> On Thu, 2004-09-23 at 14:48, Jesse Barnes wrote:
> > - RD_REG_WORD(®->ictrl); /* PCI Posted Write flush */
> > + mmiowb(); /* make sure this write arrives before any others */
>
> What's going on here? In your docs you say this mmiowb() is explicitly
> to prevent write ordering problems and doesn't affect posting. However
> in this patch you're using it to replace a posting flush read.
Jeremy was the one that added these, I think the comment is misleading and
only ordering was intended. If that's not the case I'll drop this bit.
>
> > ha->flags.ints_enabled = 1;
>
> The result will be that we can get to here, but the write that enables
> the interrupt may still be posted in a bridge somewhere.
If we're waiting for an interrupt here, I don't think it matters if we flush
or order, we'll wait the same amount of time regardless.
Thanks,
Jesse
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] I/O space write barrier
2004-09-23 19:07 ` Jesse Barnes
@ 2004-09-23 19:27 ` James Bottomley
2004-09-23 19:41 ` Jesse Barnes
` (2 more replies)
2004-09-23 19:55 ` Jeremy Higdon
1 sibling, 3 replies; 13+ messages in thread
From: James Bottomley @ 2004-09-23 19:27 UTC (permalink / raw)
To: Jesse Barnes; +Cc: linux-arch, jeremy
On Thu, 2004-09-23 at 15:07, Jesse Barnes wrote:
> If we're waiting for an interrupt here, I don't think it matters if we flush
> or order, we'll wait the same amount of time regardless.
I don't think so. Your ordering barrier doesn't cause a posted write
flush. Posted writes have theoretically no upper limit defined in the
spec for the time they may remain posted, so in the former case, you are
guaranteed that by the time you set the flag and exit the function that
interrupts are enabled in the qla1280. If you apply the patch you sent
in, this guarantee is broken and you don't really know how much longer
after exiting the function it will be before interrupts become enabled
(although in practice it's probably only of the order of ms).
James
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] I/O space write barrier
2004-09-23 19:27 ` James Bottomley
@ 2004-09-23 19:41 ` Jesse Barnes
2004-09-23 19:57 ` Jeremy Higdon
2004-09-23 22:22 ` Jeremy Higdon
2 siblings, 0 replies; 13+ messages in thread
From: Jesse Barnes @ 2004-09-23 19:41 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-arch, jeremy
On Thursday, September 23, 2004 3:27 pm, James Bottomley wrote:
> On Thu, 2004-09-23 at 15:07, Jesse Barnes wrote:
> > If we're waiting for an interrupt here, I don't think it matters if we
> > flush or order, we'll wait the same amount of time regardless.
>
> I don't think so. Your ordering barrier doesn't cause a posted write
> flush. Posted writes have theoretically no upper limit defined in the
> spec for the time they may remain posted, so in the former case, you are
> guaranteed that by the time you set the flag and exit the function that
> interrupts are enabled in the qla1280.
Ok, you're right we don't want to break that. I just didn't think it was that
important for the driver to think that interrupts were enabled even when they
weren't since it would probably do another read sometime soon anyway, but
that's sloppy, so I'll just drop this bit.
Thanks,
Jesse
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] I/O space write barrier
2004-09-23 19:07 ` Jesse Barnes
2004-09-23 19:27 ` James Bottomley
@ 2004-09-23 19:55 ` Jeremy Higdon
2004-09-23 20:09 ` Jesse Barnes
1 sibling, 1 reply; 13+ messages in thread
From: Jeremy Higdon @ 2004-09-23 19:55 UTC (permalink / raw)
To: Jesse Barnes; +Cc: James Bottomley, linux-arch
On Thu, Sep 23, 2004 at 03:07:26PM -0400, Jesse Barnes wrote:
> On Thursday, September 23, 2004 3:03 pm, James Bottomley wrote:
> > On Thu, 2004-09-23 at 14:48, Jesse Barnes wrote:
> > > - RD_REG_WORD(®->ictrl); /* PCI Posted Write flush */
> > > + mmiowb(); /* make sure this write arrives before any others */
> >
> > What's going on here? In your docs you say this mmiowb() is explicitly
> > to prevent write ordering problems and doesn't affect posting. However
> > in this patch you're using it to replace a posting flush read.
>
> Jeremy was the one that added these, I think the comment is misleading and
> only ordering was intended. If that's not the case I'll drop this bit.
No, I didn't add those. In fact, I thought them to be unnecessary. Jes
added them, I believe. I added the reads after the writes of req_ring_index
to mailbox4. Since at the time we did not have a write ordering function,
I had to use the heavy weight flush.
Take a look at the 1.56 diffs in the scsi_misc-2.6 tree.
jeremy
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] I/O space write barrier
2004-09-23 19:27 ` James Bottomley
2004-09-23 19:41 ` Jesse Barnes
@ 2004-09-23 19:57 ` Jeremy Higdon
2004-09-23 22:22 ` Jeremy Higdon
2 siblings, 0 replies; 13+ messages in thread
From: Jeremy Higdon @ 2004-09-23 19:57 UTC (permalink / raw)
To: James Bottomley; +Cc: Jesse Barnes, linux-arch
On Thu, Sep 23, 2004 at 03:27:24PM -0400, James Bottomley wrote:
> On Thu, 2004-09-23 at 15:07, Jesse Barnes wrote:
> > If we're waiting for an interrupt here, I don't think it matters if we flush
> > or order, we'll wait the same amount of time regardless.
>
> I don't think so. Your ordering barrier doesn't cause a posted write
> flush. Posted writes have theoretically no upper limit defined in the
> spec for the time they may remain posted, so in the former case, you are
> guaranteed that by the time you set the flag and exit the function that
> interrupts are enabled in the qla1280. If you apply the patch you sent
> in, this guarantee is broken and you don't really know how much longer
> after exiting the function it will be before interrupts become enabled
> (although in practice it's probably only of the order of ms).
>
> James
I would say microseconds. I think that knowing that the hardware has
interrupts enabled by the end of the function is unnecessary, because
they will be enabled before you can do anything else to the chip.
But it shouldn't hurt anything either, and defensive programming in
old SCSI drivers is often a good idea :-)
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] I/O space write barrier
2004-09-23 19:55 ` Jeremy Higdon
@ 2004-09-23 20:09 ` Jesse Barnes
0 siblings, 0 replies; 13+ messages in thread
From: Jesse Barnes @ 2004-09-23 20:09 UTC (permalink / raw)
To: Jeremy Higdon; +Cc: James Bottomley, linux-arch
On Thursday, September 23, 2004 3:55 pm, Jeremy Higdon wrote:
> On Thu, Sep 23, 2004 at 03:07:26PM -0400, Jesse Barnes wrote:
> > On Thursday, September 23, 2004 3:03 pm, James Bottomley wrote:
> > > On Thu, 2004-09-23 at 14:48, Jesse Barnes wrote:
> > > > - RD_REG_WORD(®->ictrl); /* PCI Posted Write flush */
> > > > + mmiowb(); /* make sure this write arrives before any others */
> > >
> > > What's going on here? In your docs you say this mmiowb() is explicitly
> > > to prevent write ordering problems and doesn't affect posting. However
> > > in this patch you're using it to replace a posting flush read.
> >
> > Jeremy was the one that added these, I think the comment is misleading
> > and only ordering was intended. If that's not the case I'll drop this
> > bit.
>
> No, I didn't add those. In fact, I thought them to be unnecessary. Jes
> added them, I believe. I added the reads after the writes of
> req_ring_index to mailbox4. Since at the time we did not have a write
> ordering function, I had to use the heavy weight flush.
Ok, sorry for the misattribution :). At any rate, I've already dropped that
bit.
Jesse
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] I/O space write barrier
2004-09-23 19:27 ` James Bottomley
2004-09-23 19:41 ` Jesse Barnes
2004-09-23 19:57 ` Jeremy Higdon
@ 2004-09-23 22:22 ` Jeremy Higdon
2004-09-23 23:36 ` James Bottomley
2 siblings, 1 reply; 13+ messages in thread
From: Jeremy Higdon @ 2004-09-23 22:22 UTC (permalink / raw)
To: James Bottomley; +Cc: Jesse Barnes, linux-arch
On Thu, Sep 23, 2004 at 03:27:24PM -0400, James Bottomley wrote:
> On Thu, 2004-09-23 at 15:07, Jesse Barnes wrote:
> > If we're waiting for an interrupt here, I don't think it matters if we flush
> > or order, we'll wait the same amount of time regardless.
>
> I don't think so. Your ordering barrier doesn't cause a posted write
> flush. Posted writes have theoretically no upper limit defined in the
> spec for the time they may remain posted, so in the former case, you are
> guaranteed that by the time you set the flag and exit the function that
> interrupts are enabled in the qla1280. If you apply the patch you sent
> in, this guarantee is broken and you don't really know how much longer
> after exiting the function it will be before interrupts become enabled
> (although in practice it's probably only of the order of ms).
>
> James
James, is this what you want?
I think these are the only writes that need ordering, as opposed to flushing.
Being not certain what was intended in the other cases, it's safer to leave
them be.
===== drivers/scsi/qla1280.c 1.65 vs edited =====
--- 1.65/drivers/scsi/qla1280.c 2004-07-28 20:59:10 -07:00
+++ edited/drivers/scsi/qla1280.c 2004-09-23 15:19:56 -07:00
@@ -3398,7 +3398,7 @@
sp->flags |= SRB_SENT;
ha->actthreads++;
WRT_REG_WORD(®->mailbox4, ha->req_ring_index);
- (void) RD_REG_WORD(®->mailbox4); /* PCI posted write flush */
+ mmiowb(); /* posted write ordering */
out:
if (status)
@@ -3666,7 +3666,7 @@
sp->flags |= SRB_SENT;
ha->actthreads++;
WRT_REG_WORD(®->mailbox4, ha->req_ring_index);
- (void) RD_REG_WORD(®->mailbox4); /* PCI posted write flush */
+ mmiowb(); /* posted write ordering */
out:
if (status)
@@ -3778,7 +3778,7 @@
/* Set chip new ring index. */
WRT_REG_WORD(®->mailbox4, ha->req_ring_index);
- (void) RD_REG_WORD(®->mailbox4); /* PCI posted write flush */
+ mmiowb(); /* posted write ordering */
LEAVE("qla1280_isp_cmd");
}
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] I/O space write barrier
2004-09-23 22:22 ` Jeremy Higdon
@ 2004-09-23 23:36 ` James Bottomley
2004-09-24 5:03 ` Jeremy Higdon
0 siblings, 1 reply; 13+ messages in thread
From: James Bottomley @ 2004-09-23 23:36 UTC (permalink / raw)
To: Jeremy Higdon; +Cc: Jesse Barnes, linux-arch
On Thu, 2004-09-23 at 18:22, Jeremy Higdon wrote:
> James, is this what you want?
> I think these are the only writes that need ordering, as opposed to flushing.
> Being not certain what was intended in the other cases, it's safer to leave
> them be.
Well, these two are just the kicks to tell the card that we have filled
a consistent memory slot with a SCSI command. The ordering, I assume,
is because we want SCSI commands issued in the order they went down to
queuecommand in, so the mmiowb() preserves that order? Could we have a
comment to that effect.
My objection isn't so much that what the original patch did would result
in errors, merely that we have enough trouble explaining these issues to
device driver writers, so whatever you do in your patches demonstrating
use has to be really exemplary.
James
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] I/O space write barrier
2004-09-23 23:36 ` James Bottomley
@ 2004-09-24 5:03 ` Jeremy Higdon
0 siblings, 0 replies; 13+ messages in thread
From: Jeremy Higdon @ 2004-09-24 5:03 UTC (permalink / raw)
To: James Bottomley; +Cc: Jesse Barnes, linux-arch
On Thu, Sep 23, 2004 at 07:36:52PM -0400, James Bottomley wrote:
> On Thu, 2004-09-23 at 18:22, Jeremy Higdon wrote:
> > James, is this what you want?
> > I think these are the only writes that need ordering, as opposed to flushing.
> > Being not certain what was intended in the other cases, it's safer to leave
> > them be.
>
> Well, these two are just the kicks to tell the card that we have filled
> a consistent memory slot with a SCSI command. The ordering, I assume,
> is because we want SCSI commands issued in the order they went down to
> queuecommand in, so the mmiowb() preserves that order? Could we have a
> comment to that effect.
Okay, I'll add a comment there. On the qla1280, it's worse than just
out of order. If the write order is switched, the chip will issue an
entire queue's worth of stale commands.
> My objection isn't so much that what the original patch did would result
> in errors, merely that we have enough trouble explaining these issues to
> device driver writers, so whatever you do in your patches demonstrating
> use has to be really exemplary.
I think there was also the issue of whether we wanted a flush or just
ordering for the interrupt enable/disable. Since we don't really know,
and since it's not a performance path, it's safer to leave it a flush
after those.
Is the following diff what you're thinking of, James?
I'm not entirely happy with it, and could spend more time on it.
===== drivers/scsi/qla1280.c 1.65 vs edited =====
--- 1.65/drivers/scsi/qla1280.c 2004-07-28 20:59:10 -07:00
+++ edited/drivers/scsi/qla1280.c 2004-09-23 21:58:40 -07:00
@@ -3398,7 +3398,18 @@
sp->flags |= SRB_SENT;
ha->actthreads++;
WRT_REG_WORD(®->mailbox4, ha->req_ring_index);
- (void) RD_REG_WORD(®->mailbox4); /* PCI posted write flush */
+
+ /*
+ * A Memory Mapped I/O Write Barrier is needed to ensure that this write
+ * of the request queue in register is ordered ahead of writes issued
+ * after this one by other CPUs. Access to the register is protected
+ * by the host_lock. Without the mmiowb, however, it is possible for
+ * this CPU to release the host lock, another CPU acquire the host lock,
+ * and write to the request queue in, and have the second write make it
+ * to the chip first. See Documentation/DocBook/deviceiobook.tmpl for
+ * more information.
+ */
+ mmiowb(); /* posted write ordering */
out:
if (status)
@@ -3666,7 +3677,18 @@
sp->flags |= SRB_SENT;
ha->actthreads++;
WRT_REG_WORD(®->mailbox4, ha->req_ring_index);
- (void) RD_REG_WORD(®->mailbox4); /* PCI posted write flush */
+
+ /*
+ * A Memory Mapped I/O Write Barrier is needed to ensure that this write
+ * of the request queue in register is ordered ahead of writes issued
+ * after this one by other CPUs. Access to the register is protected
+ * by the host_lock. Without the mmiowb, however, it is possible for
+ * this CPU to release the host lock, another CPU acquire the host lock,
+ * and write to the request queue in, and have the second write make it
+ * to the chip first. See Documentation/DocBook/deviceiobook.tmpl for
+ * more information.
+ */
+ mmiowb(); /* posted write ordering */
out:
if (status)
@@ -3778,7 +3800,18 @@
/* Set chip new ring index. */
WRT_REG_WORD(®->mailbox4, ha->req_ring_index);
- (void) RD_REG_WORD(®->mailbox4); /* PCI posted write flush */
+
+ /*
+ * A Memory Mapped I/O Write Barrier is needed to ensure that this write
+ * of the request queue in register is ordered ahead of writes issued
+ * after this one by other CPUs. Access to the register is protected
+ * by the host_lock. Without the mmiowb, however, it is possible for
+ * this CPU to release the host lock, another CPU acquire the host lock,
+ * and write to the request queue in, and have the second write make it
+ * to the chip first. See Documentation/DocBook/deviceiobook.tmpl for
+ * more information.
+ */
+ mmiowb(); /* posted write ordering */
LEAVE("qla1280_isp_cmd");
}
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] I/O space write barrier
2004-09-23 18:48 [PATCH] I/O space write barrier Jesse Barnes
2004-09-23 19:03 ` James Bottomley
@ 2004-09-27 0:45 ` Benjamin Herrenschmidt
2004-09-27 15:41 ` Jesse Barnes
1 sibling, 1 reply; 13+ messages in thread
From: Benjamin Herrenschmidt @ 2004-09-27 0:45 UTC (permalink / raw)
To: Jesse Barnes; +Cc: Linux Arch list
On Fri, 2004-09-24 at 04:48, Jesse Barnes wrote:
>@@ -1727,7 +1727,7 @@
> reg = ha->iobase;
> /* disable risc and host interrupts */
> WRT_REG_WORD(®->ictrl, 0);
>- RD_REG_WORD(®->ictrl); /* PCI Posted Write flush */
>+ mmiowb(); /* make sure this write arrives before any others */
> ha->flags.ints_enabled = 0;
> }
This is a typical example of non working sync. You platform might have
some kind of guarantee vs. ordering of interrupts, but in the general
case, the above may "leak", that is you may still take an interrupt
after ha->flags.ints_enabled = 0.
First, on some CPUs like evil PPCs, that write to memory may end up
beeing on a different write queue as I explaines, thouh the mmiowb()
will help if I implement it as a strong barrier (and the previous
MMIO read wouldn't have helped). But that isn't my point.
The problem is that interrupts are asynchronous by nature. When the
above code is reached, the device may already have asserted the
interrupt line, which may still be in the process of propagating to
the CPU through the various APICs or within the CPU core. There is
no way you can guarantee you won't take it (though since it's a PCI
interrupt, it's, I suppose, a level interrupt, so it will probably
go away shortly after beeing asserted). It may even be on it's way
to beeing handled on another CPU in fact. Even synchronize_irq()
won't help.
I haven't looked at the rest of the driver, but it certainly must be
prepared to take spurrious interrupts since that can (and so will)
happen.
Ben.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] I/O space write barrier
2004-09-27 0:45 ` Benjamin Herrenschmidt
@ 2004-09-27 15:41 ` Jesse Barnes
0 siblings, 0 replies; 13+ messages in thread
From: Jesse Barnes @ 2004-09-27 15:41 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Linux Arch list
On Sunday, September 26, 2004 5:45 pm, Benjamin Herrenschmidt wrote:
> On Fri, 2004-09-24 at 04:48, Jesse Barnes wrote:
> >@@ -1727,7 +1727,7 @@
> > reg = ha->iobase;
> > /* disable risc and host interrupts */
> > WRT_REG_WORD(®->ictrl, 0);
> >- RD_REG_WORD(®->ictrl); /* PCI Posted Write flush */
> >+ mmiowb(); /* make sure this write arrives before any others */
> > ha->flags.ints_enabled = 0;
> > }
>
> This is a typical example of non working sync. You platform might have
> some kind of guarantee vs. ordering of interrupts, but in the general
> case, the above may "leak", that is you may still take an interrupt
> after ha->flags.ints_enabled = 0.
Right, that's what James pointed out. Jeremy will submit the qla mmiowb bits,
and has dropped these, since the routine probably shouldn't return until
interrupts are actually enabled. (But I see on your platform the sore to
ins_enabled will probably be issued in parallel to the MMIO store to the
interrupt register, and so might happen first.)
> First, on some CPUs like evil PPCs, that write to memory may end up
> beeing on a different write queue as I explaines, thouh the mmiowb()
> will help if I implement it as a strong barrier (and the previous
> MMIO read wouldn't have helped). But that isn't my point.
>
> The problem is that interrupts are asynchronous by nature. When the
> above code is reached, the device may already have asserted the
> interrupt line, which may still be in the process of propagating to
> the CPU through the various APICs or within the CPU core. There is
> no way you can guarantee you won't take it (though since it's a PCI
> interrupt, it's, I suppose, a level interrupt, so it will probably
> go away shortly after beeing asserted). It may even be on it's way
> to beeing handled on another CPU in fact. Even synchronize_irq()
> won't help.
>
> I haven't looked at the rest of the driver, but it certainly must be
> prepared to take spurrious interrupts since that can (and so will)
> happen.
Right, afaik all drivers (or at least all PCI drivers) should be able to deal
with this. Whether or not they do is a different story (I think qla does).
Jesse
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2004-09-27 15:41 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-09-23 18:48 [PATCH] I/O space write barrier Jesse Barnes
2004-09-23 19:03 ` James Bottomley
2004-09-23 19:07 ` Jesse Barnes
2004-09-23 19:27 ` James Bottomley
2004-09-23 19:41 ` Jesse Barnes
2004-09-23 19:57 ` Jeremy Higdon
2004-09-23 22:22 ` Jeremy Higdon
2004-09-23 23:36 ` James Bottomley
2004-09-24 5:03 ` Jeremy Higdon
2004-09-23 19:55 ` Jeremy Higdon
2004-09-23 20:09 ` Jesse Barnes
2004-09-27 0:45 ` Benjamin Herrenschmidt
2004-09-27 15:41 ` Jesse Barnes
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox