RE: SCSI QLA not working on latest *-mm SN2

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* RE: SCSI QLA not working on latest *-mm SN2
@ 2004-09-17 22:55 Andrew Vasquez
  2004-09-17 23:10 ` Jesse Barnes
  2004-09-17 23:55 ` James Bottomley
  0 siblings, 2 replies; 78+ messages in thread
From: Andrew Vasquez @ 2004-09-17 22:55 UTC (permalink / raw)
  To: Jeremy Higdon
  Cc: Jesse Barnes, Paul Jackson, linux-scsi, mdr, jeremy, djh, jbarnes,
	Andrew Morton


On Thursday, September 16, 2004 4:14 PM, Jeremy Higdon wrote:
> On Thu, Sep 16, 2004 at 01:56:50PM -0700, Andrew Vasquez wrote:
> > 
> > Interesting, the only changes in reset_chip() are for PCI posting
> > issues.  Relevant diff attached.
> > 
> 
> Are all of those reads really necessary?  Generally the only reason
> for doing a read to flush a posted write is for timing issues (in
> which the read may not be good enough, according to a thread I saw
> from Grant Grundler), or to enforce ordering before releasing a lock
> (sleeping or spinning). 
> 

The reads are there to ensure ordering of the writes at each stage of
the reset (in qla_init.c) and fw dumps (qla_dbg.c).  After speaking
with the hardware and firmware folks, the readw() after the
soft-reset in qla_init.c was probably what triggered the MCA.  Seems
we will have to settle for some sort of udelay() as what was done in
reset_chip().

> Have you run into platforms in which two I/O writes from one CPU are
> retired out of order? 
> 

Again, only for completeness.

Regards,
Andrew Vasquez

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-17 22:55 SCSI QLA not working on latest *-mm SN2 Andrew Vasquez
@ 2004-09-17 23:10 ` Jesse Barnes
  2004-09-17 23:55 ` James Bottomley
  1 sibling, 0 replies; 78+ messages in thread
From: Jesse Barnes @ 2004-09-17 23:10 UTC (permalink / raw)
  To: Andrew Vasquez
  Cc: Jeremy Higdon, Paul Jackson, linux-scsi, mdr, jeremy, djh,
	jbarnes, Andrew Morton

On Friday, September 17, 2004 3:55 pm, Andrew Vasquez wrote:
> On Thursday, September 16, 2004 4:14 PM, Jeremy Higdon wrote:
> > On Thu, Sep 16, 2004 at 01:56:50PM -0700, Andrew Vasquez wrote:
> > > Interesting, the only changes in reset_chip() are for PCI posting
> > > issues.  Relevant diff attached.
> >
> > Are all of those reads really necessary?  Generally the only reason
> > for doing a read to flush a posted write is for timing issues (in
> > which the read may not be good enough, according to a thread I saw
> > from Grant Grundler), or to enforce ordering before releasing a lock
> > (sleeping or spinning).
>
> The reads are there to ensure ordering of the writes at each stage of
> the reset (in qla_init.c) and fw dumps (qla_dbg.c).  After speaking
> with the hardware and firmware folks, the readw() after the
> soft-reset in qla_init.c was probably what triggered the MCA.  Seems
> we will have to settle for some sort of udelay() as what was done in
> reset_chip().

Typically you'll only need the reads right before your code leaves a critical 
section that's done writes.  If hardware reorders the writes in a single 
threaded section, it's likely buggy, and I don't think we should code for 
that.  IOW, I think a majority of the reads in the patch are superfluous (not 
really harmful aside from the one, just slows things down).

> > Have you run into platforms in which two I/O writes from one CPU are
> > retired out of order?
>
> Again, only for completeness.

Please check out the doc I pointed you at to see if it makes sense.

Thanks,
Jesse

^ permalink raw reply	[flat|nested] 78+ messages in thread

* RE: SCSI QLA not working on latest *-mm SN2
  2004-09-17 22:55 SCSI QLA not working on latest *-mm SN2 Andrew Vasquez
  2004-09-17 23:10 ` Jesse Barnes
@ 2004-09-17 23:55 ` James Bottomley
  2004-09-18  1:15   ` Andrew Vasquez
  1 sibling, 1 reply; 78+ messages in thread
From: James Bottomley @ 2004-09-17 23:55 UTC (permalink / raw)
  To: Andrew Vasquez
  Cc: Jeremy Higdon, Jesse Barnes, Paul Jackson, SCSI Mailing List, mdr,
	jeremy, djh, jbarnes, Andrew Morton

On Fri, 2004-09-17 at 18:55, Andrew Vasquez wrote:
> The reads are there to ensure ordering of the writes at each stage of
> the reset (in qla_init.c) and fw dumps (qla_dbg.c).  After speaking
> with the hardware and firmware folks, the readw() after the
> soft-reset in qla_init.c was probably what triggered the MCA.  Seems
> we will have to settle for some sort of udelay() as what was done in
> reset_chip().

Just to confirm if we absolutely have to do this...the offending reads
to issue the posting flush were to the register you just wrote to to get
the chip to reset.  However, any MMIO read to any region of that card
would also trigger a posted write flush.  Does the chip drop entirely
off the PCI bus during the execution of reset, or could we perhaps issue
an innocuous read to somewhere in PCI configuration space for the card?

James

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-17 23:55 ` James Bottomley
@ 2004-09-18  1:15   ` Andrew Vasquez
  2004-09-18  1:25     ` Matthew Wilcox
  0 siblings, 1 reply; 78+ messages in thread
From: Andrew Vasquez @ 2004-09-18  1:15 UTC (permalink / raw)
  To: James Bottomley
  Cc: Andrew Vasquez, Jeremy Higdon, Jesse Barnes, Paul Jackson,
	SCSI Mailing List, mdr, jeremy, djh, jbarnes, Andrew Morton

On Fri, 17 Sep 2004, James Bottomley wrote:

> On Fri, 2004-09-17 at 18:55, Andrew Vasquez wrote:
> > The reads are there to ensure ordering of the writes at each stage of
> > the reset (in qla_init.c) and fw dumps (qla_dbg.c).  After speaking
> > with the hardware and firmware folks, the readw() after the
> > soft-reset in qla_init.c was probably what triggered the MCA.  Seems
> > we will have to settle for some sort of udelay() as what was done in
> > reset_chip().
> 
> Just to confirm if we absolutely have to do this...the offending reads
> to issue the posting flush were to the register you just wrote to to get
> the chip to reset.  However, any MMIO read to any region of that card
> would also trigger a posted write flush.  Does the chip drop entirely
> off the PCI bus during the execution of reset, or could we perhaps issue
> an innocuous read to somewhere in PCI configuration space for the card?
> 

I had asked the hardware guy a similar question -- for the soft-reset
operation, we'll _not_ be able to issue additional readw()s until '16
PCI clocks elapse.'  So, it seems we'll have to settle with the
udelay() in this particular instance.

--
Andrew

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-18  1:15   ` Andrew Vasquez
@ 2004-09-18  1:25     ` Matthew Wilcox
  2004-09-18  1:24       ` Andrew Vasquez
                         ` (2 more replies)
  0 siblings, 3 replies; 78+ messages in thread
From: Matthew Wilcox @ 2004-09-18  1:25 UTC (permalink / raw)
  To: Andrew Vasquez, James Bottomley, Jeremy Higdon, Jesse Barnes,
	Paul Jackson, SCSI Mailing List, mdr, jeremy, djh, jbarnes,
	Andrew Morton

On Fri, Sep 17, 2004 at 06:15:10PM -0700, Andrew Vasquez wrote:
> On Fri, 17 Sep 2004, James Bottomley wrote:
> > Just to confirm if we absolutely have to do this...the offending reads
> > to issue the posting flush were to the register you just wrote to to get
> > the chip to reset.  However, any MMIO read to any region of that card
> > would also trigger a posted write flush.  Does the chip drop entirely
> > off the PCI bus during the execution of reset, or could we perhaps issue
> > an innocuous read to somewhere in PCI configuration space for the card?
> 
> I had asked the hardware guy a similar question -- for the soft-reset
> operation, we'll _not_ be able to issue additional readw()s until '16
> PCI clocks elapse.'  So, it seems we'll have to settle with the
> udelay() in this particular instance.

But the write that starts the reset can be delayed arbitrarily, so we need
to do *some* kind of read from the device to be know that it got there.
Can we access config space instead of mmio space?

-- 
"Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception." -- Mark Twain

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-18  1:25     ` Matthew Wilcox
@ 2004-09-18  1:24       ` Andrew Vasquez
  2004-09-18  2:36       ` Jeremy Higdon
  2004-09-18 19:12       ` James Bottomley
  2 siblings, 0 replies; 78+ messages in thread
From: Andrew Vasquez @ 2004-09-18  1:24 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: James Bottomley, Jeremy Higdon, Jesse Barnes, Paul Jackson,
	SCSI Mailing List, mdr, jeremy, djh, jbarnes, Andrew Morton

On Fri, 2004-09-17 at 18:25, Matthew Wilcox wrote:
> On Fri, Sep 17, 2004 at 06:15:10PM -0700, Andrew Vasquez wrote:
> > On Fri, 17 Sep 2004, James Bottomley wrote:
> > > Just to confirm if we absolutely have to do this...the offending reads
> > > to issue the posting flush were to the register you just wrote to to get
> > > the chip to reset.  However, any MMIO read to any region of that card
> > > would also trigger a posted write flush.  Does the chip drop entirely
> > > off the PCI bus during the execution of reset, or could we perhaps issue
> > > an innocuous read to somewhere in PCI configuration space for the card?
> > 
> > I had asked the hardware guy a similar question -- for the soft-reset
> > operation, we'll _not_ be able to issue additional readw()s until '16
> > PCI clocks elapse.'  So, it seems we'll have to settle with the
> > udelay() in this particular instance.
> 
> But the write that starts the reset can be delayed arbitrarily, so we need
> to do *some* kind of read from the device to be know that it got there.
> Can we access config space instead of mmio space?

I'll have an answer to that question on Monday -- most, if not everyone
here has left for the weekend.

--
andrew


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-18  1:25     ` Matthew Wilcox
  2004-09-18  1:24       ` Andrew Vasquez
@ 2004-09-18  2:36       ` Jeremy Higdon
  2004-09-18 19:12       ` James Bottomley
  2 siblings, 0 replies; 78+ messages in thread
From: Jeremy Higdon @ 2004-09-18  2:36 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Vasquez, James Bottomley, Jesse Barnes, Paul Jackson,
	SCSI Mailing List, mdr, jeremy, djh, jbarnes, Andrew Morton

On Sat, Sep 18, 2004 at 02:25:17AM +0100, Matthew Wilcox wrote:
> On Fri, Sep 17, 2004 at 06:15:10PM -0700, Andrew Vasquez wrote:
> > On Fri, 17 Sep 2004, James Bottomley wrote:
> > > Just to confirm if we absolutely have to do this...the offending reads
> > > to issue the posting flush were to the register you just wrote to to get
> > > the chip to reset.  However, any MMIO read to any region of that card
> > > would also trigger a posted write flush.  Does the chip drop entirely
> > > off the PCI bus during the execution of reset, or could we perhaps issue
> > > an innocuous read to somewhere in PCI configuration space for the card?
> > 
> > I had asked the hardware guy a similar question -- for the soft-reset
> > operation, we'll _not_ be able to issue additional readw()s until '16
> > PCI clocks elapse.'  So, it seems we'll have to settle with the
> > udelay() in this particular instance.
> 
> But the write that starts the reset can be delayed arbitrarily, so we need
> to do *some* kind of read from the device to be know that it got there.
> Can we access config space instead of mmio space?


This is where some sort of generic primitive for flushing posted writes
would be handy.  For any PCI implementation, there will be registers
unrelated to the chip in question that one should be able to read and
which flush the posted write.

On Altix, we have the sn_mmiob()  :-)

A pci_config_read_word() is not quite the same as a readw(), I don't
think, so maybe that would be okay?

jeremy

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-18  1:25     ` Matthew Wilcox
  2004-09-18  1:24       ` Andrew Vasquez
  2004-09-18  2:36       ` Jeremy Higdon
@ 2004-09-18 19:12       ` James Bottomley
  2 siblings, 0 replies; 78+ messages in thread
From: James Bottomley @ 2004-09-18 19:12 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Vasquez, Jeremy Higdon, Jesse Barnes, Paul Jackson,
	SCSI Mailing List, mdr, jeremy, djh, jbarnes, Andrew Morton

On Fri, 2004-09-17 at 21:25, Matthew Wilcox wrote:
> But the write that starts the reset can be delayed arbitrarily, so we need
> to do *some* kind of read from the device to be know that it got there.
> Can we access config space instead of mmio space?

This was my worry as well.  I think if there's no way to flush the
posted write then you need to activate this reset by PIO which doesn't
suffer from posting.

James



^ permalink raw reply	[flat|nested] 78+ messages in thread

* RE: SCSI QLA not working on latest *-mm SN2
@ 2004-09-21 21:22 Andrew Vasquez
  2004-09-21 21:44 ` Jeremy Higdon
  0 siblings, 1 reply; 78+ messages in thread
From: Andrew Vasquez @ 2004-09-21 21:22 UTC (permalink / raw)
  To: Jeremy Higdon
  Cc: Matthew Wilcox, James Bottomley, Jesse Barnes, Grant Grundler, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tuesday, September 21, 2004 2:06 PM, Jeremy Higdon wrote:
> On Tue, Sep 21, 2004 at 01:50:02PM -0700, Andrew Vasquez wrote:
> > The only requirement after reception of a soft-reset request (by PIO
> > or MMIO) by the RISC is for the driver to wait 16 PCI clocks before
> > issuing another request.  The problem of course is determining when
> > to start timing within the driver.
> 
> So I think that we just wait for some reasonable worst case time for
> the write to complete.  We can't really do anything else.
>

That seems to be the case.
 
> Are these resets done as part of error recovery?  I.e., do we have
> to be concerned about long the write will take on a busy system?
> 

Just during initialization (qla2x00_chip_diag() and qla2x00_setup_chip())
and after a firmware dump (qla_dbg.c -- which requires a full 
re-initialization to recover).  If the driver is in the process of
performing a firmware-dump, the RISC has already been paused and nothing
is executing within the firmware.

--
Andrew

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 21:22 Andrew Vasquez
@ 2004-09-21 21:44 ` Jeremy Higdon
  2004-09-21 22:37   ` Jesse Barnes
  0 siblings, 1 reply; 78+ messages in thread
From: Jeremy Higdon @ 2004-09-21 21:44 UTC (permalink / raw)
  To: Andrew Vasquez
  Cc: Matthew Wilcox, James Bottomley, Jesse Barnes, Grant Grundler, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tue, Sep 21, 2004 at 02:22:48PM -0700, Andrew Vasquez wrote:
> On Tuesday, September 21, 2004 2:06 PM, Jeremy Higdon wrote:
> > On Tue, Sep 21, 2004 at 01:50:02PM -0700, Andrew Vasquez wrote:
> > > The only requirement after reception of a soft-reset request (by PIO
> > > or MMIO) by the RISC is for the driver to wait 16 PCI clocks before
> > > issuing another request.  The problem of course is determining when
> > > to start timing within the driver.
> > 
> > So I think that we just wait for some reasonable worst case time for
> > the write to complete.  We can't really do anything else.
> >
> 
> That seems to be the case.
>  
> > Are these resets done as part of error recovery?  I.e., do we have
> > to be concerned about long the write will take on a busy system?
> > 
> 
> Just during initialization (qla2x00_chip_diag() and qla2x00_setup_chip())
> and after a firmware dump (qla_dbg.c -- which requires a full 
> re-initialization to recover).  If the driver is in the process of
> performing a firmware-dump, the RISC has already been paused and nothing
> is executing within the firmware.
> 
> --
> Andrew

Well then 10us should be plenty of time.  Maybe 20 just to be extra sure.

jeremy

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 21:44 ` Jeremy Higdon
@ 2004-09-21 22:37   ` Jesse Barnes
  2004-09-21 22:49     ` Jeremy Higdon
  0 siblings, 1 reply; 78+ messages in thread
From: Jesse Barnes @ 2004-09-21 22:37 UTC (permalink / raw)
  To: Jeremy Higdon
  Cc: Andrew Vasquez, Matthew Wilcox, James Bottomley, Grant Grundler,
	pj, SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tuesday, September 21, 2004 5:44 pm, Jeremy Higdon wrote:
> > Just during initialization (qla2x00_chip_diag() and qla2x00_setup_chip())
> > and after a firmware dump (qla_dbg.c -- which requires a full
> > re-initialization to recover).  If the driver is in the process of
> > performing a firmware-dump, the RISC has already been paused and nothing
> > is executing within the firmware.
>
> Well then 10us should be plenty of time.  Maybe 20 just to be extra sure.

Don't forget the case of a driver that's insmod'd into a running, busy system.

Jesse

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 22:37   ` Jesse Barnes
@ 2004-09-21 22:49     ` Jeremy Higdon
  0 siblings, 0 replies; 78+ messages in thread
From: Jeremy Higdon @ 2004-09-21 22:49 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Andrew Vasquez, Matthew Wilcox, James Bottomley, Grant Grundler,
	pj, SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tue, Sep 21, 2004 at 06:37:46PM -0400, Jesse Barnes wrote:
> On Tuesday, September 21, 2004 5:44 pm, Jeremy Higdon wrote:
> > > Just during initialization (qla2x00_chip_diag() and qla2x00_setup_chip())
> > > and after a firmware dump (qla_dbg.c -- which requires a full
> > > re-initialization to recover).  If the driver is in the process of
> > > performing a firmware-dump, the RISC has already been paused and nothing
> > > is executing within the firmware.
> >
> > Well then 10us should be plenty of time.  Maybe 20 just to be extra sure.
> 
> Don't forget the case of a driver that's insmod'd into a running, busy system.

Hopefully we don't have anyone insmodding qla2xxx on a 512p system
while it's running a numa stress test.  If so, they'll learn not
to.  :-)

It would be nice the have a solution for that case, but I don't
see it.  The only way to ensure the write is complete is to read
afterward, but the read afterward can bring the system down, so
we have to do without it and add some padding.  I think that's
the conclusion we're narrowing down to.

We can narrow the window with an sn_mmiob() on Altix.  I'm don't
think we want to clutter the driver with a platform-specific call
that is really intended just to imply ordering and not completion.
It just so happens that in order to guarantee ordering, the mmio
write will be partly completed.

jeremy

^ permalink raw reply	[flat|nested] 78+ messages in thread

* RE: SCSI QLA not working on latest *-mm SN2
@ 2004-09-21 20:50 Andrew Vasquez
  2004-09-21 21:06 ` Jeremy Higdon
  0 siblings, 1 reply; 78+ messages in thread
From: Andrew Vasquez @ 2004-09-21 20:50 UTC (permalink / raw)
  To: Jeremy Higdon, Matthew Wilcox
  Cc: James Bottomley, Jesse Barnes, Grant Grundler, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton


On Tuesday, September 21, 2004 1:44 PM, Jeremy Higdon wrote:
> On Tue, Sep 21, 2004 at 05:25:35PM +0100, Matthew Wilcox wrote:
> > On Tue, Sep 21, 2004 at 08:58:23AM -0700, Andrew Vasquez wrote:
> > > From what I can gather from the hw engineers, the config-read
> > > will not guarantee a flush of posted writes.
> > 
> > I believe your hardware engineers to be mistaken.  See PCI 2.2,
> > Appendix E, section E.2: 
> > 
> > 2. Memory writes can be posted in both directions in a bridge. I/O
> > and Configuration writes are not posted. (I/O writes can be posted
> > in the Host Bridge, but some restrictions apply.) Read
> > transactions (Memory, I/O, or Configuration) are not posted.
> > 
> > 5. A read transaction must push ahead of it through the bridge any
> > posted writes originating on the same side of the bridge and
> > posted before the read. Before the read transaction can complete
> > on its originating bus, it must pull out of the bridge any posted
> > writes that originated on the opposite side and were posted before
> > the read command completes on the read-destination bus. 
> 
> 
> I would agree.  A config read should retire any posted I/O writes,
> whether Port or Memory Mapped.
> 
> So, unless the qla2xxx chips also do not respond to config reads
> after reset, the config read should be the answer . . .
> 

Yes, please see my earlier reply to Matthew:

	Hmm...adding more confusion to the mix.  I apologize -- my
	reply was not written correctly, yes, the config-read will
	flush any pending writes.  But, the same problem persists 
	- the RISC will still stop responding to requests (config
	or MMIO) during the soft-reset -- potentially resulting in
	a MAC (as seen by SGI).

> . . . unless there is some sort of random delay within the chip
> itself between a completion of the IO write on the PCI bus and the
> chip resetting itself.  That's not a problem, is it, Andrew?
>

The only requirement after reception of a soft-reset request (by PIO
or MMIO) by the RISC is for the driver to wait 16 PCI clocks before
issuing another request.  The problem of course is determining when to
start timing within the driver.

Regards,
Andrew Vasquez

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 20:50 Andrew Vasquez
@ 2004-09-21 21:06 ` Jeremy Higdon
  2004-09-21 22:36   ` Jesse Barnes
  0 siblings, 1 reply; 78+ messages in thread
From: Jeremy Higdon @ 2004-09-21 21:06 UTC (permalink / raw)
  To: Andrew Vasquez
  Cc: Matthew Wilcox, James Bottomley, Jesse Barnes, Grant Grundler, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tue, Sep 21, 2004 at 01:50:02PM -0700, Andrew Vasquez wrote:
> 
> Yes, please see my earlier reply to Matthew:
> 
> 	Hmm...adding more confusion to the mix.  I apologize -- my
> 	reply was not written correctly, yes, the config-read will
> 	flush any pending writes.  But, the same problem persists 
> 	- the RISC will still stop responding to requests (config
> 	or MMIO) during the soft-reset -- potentially resulting in
> 	a MAC (as seen by SGI).

Sorry, I should have read all 30 or so messages before replying to
any  :-)

> > . . . unless there is some sort of random delay within the chip
> > itself between a completion of the IO write on the PCI bus and the
> > chip resetting itself.  That's not a problem, is it, Andrew?
> >
> 
> The only requirement after reception of a soft-reset request (by PIO
> or MMIO) by the RISC is for the driver to wait 16 PCI clocks before
> issuing another request.  The problem of course is determining when to
> start timing within the driver.

So I think that we just wait for some reasonable worst case time for
the write to complete.  We can't really do anything else.

Are these resets done as part of error recovery?  I.e., do we have
to be concerned about long the write will take on a busy system?

jeremy

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 21:06 ` Jeremy Higdon
@ 2004-09-21 22:36   ` Jesse Barnes
  2004-09-21 22:39     ` Jeremy Higdon
  0 siblings, 1 reply; 78+ messages in thread
From: Jesse Barnes @ 2004-09-21 22:36 UTC (permalink / raw)
  To: Jeremy Higdon
  Cc: Andrew Vasquez, Matthew Wilcox, James Bottomley, Grant Grundler,
	pj, SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tuesday, September 21, 2004 5:06 pm, Jeremy Higdon wrote:
> > The only requirement after reception of a soft-reset request (by PIO
> > or MMIO) by the RISC is for the driver to wait 16 PCI clocks before
> > issuing another request.  The problem of course is determining when to
> > start timing within the driver.
>
> So I think that we just wait for some reasonable worst case time for
> the write to complete.  We can't really do anything else.

Shouldn't we do a config space read and *then* start the delay, which will 
only delay for as long as it takes to reset the card on a 33 MHz bus?

> Are these resets done as part of error recovery?  I.e., do we have
> to be concerned about long the write will take on a busy system?

If reading from config space isn't sufficient, then we'd be in trouble in that 
case.

Jesse

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 22:36   ` Jesse Barnes
@ 2004-09-21 22:39     ` Jeremy Higdon
  2004-09-21 22:43       ` Jesse Barnes
  0 siblings, 1 reply; 78+ messages in thread
From: Jeremy Higdon @ 2004-09-21 22:39 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Andrew Vasquez, Matthew Wilcox, James Bottomley, Grant Grundler,
	pj, SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tue, Sep 21, 2004 at 06:36:36PM -0400, Jesse Barnes wrote:

> > > The only requirement after reception of a soft-reset request (by PIO
> > > or MMIO) by the RISC is for the driver to wait 16 PCI clocks before
> > > issuing another request.  The problem of course is determining when to
> > > start timing within the driver.
> >
> > So I think that we just wait for some reasonable worst case time for
> > the write to complete.  We can't really do anything else.
> 
> Shouldn't we do a config space read and *then* start the delay, which will 
> only delay for as long as it takes to reset the card on a 33 MHz bus?

The config space read has the same problem that the mmio space read does.
The chip does not respond.

jeremy

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 22:39     ` Jeremy Higdon
@ 2004-09-21 22:43       ` Jesse Barnes
  2004-09-21 22:54         ` Jeremy Higdon
  0 siblings, 1 reply; 78+ messages in thread
From: Jesse Barnes @ 2004-09-21 22:43 UTC (permalink / raw)
  To: Jeremy Higdon
  Cc: Andrew Vasquez, Matthew Wilcox, James Bottomley, Grant Grundler,
	pj, SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tuesday, September 21, 2004 6:39 pm, Jeremy Higdon wrote:
> On Tue, Sep 21, 2004 at 06:36:36PM -0400, Jesse Barnes wrote:
> > > > The only requirement after reception of a soft-reset request (by PIO
> > > > or MMIO) by the RISC is for the driver to wait 16 PCI clocks before
> > > > issuing another request.  The problem of course is determining when
> > > > to start timing within the driver.
> > >
> > > So I think that we just wait for some reasonable worst case time for
> > > the write to complete.  We can't really do anything else.
> >
> > Shouldn't we do a config space read and *then* start the delay, which
> > will only delay for as long as it takes to reset the card on a 33 MHz
> > bus?
>
> The config space read has the same problem that the mmio space read does.
> The chip does not respond.

Right, but config space reads are supposed to soft fail no matter what.  So 
we'll get back all ones, but we'll also know that the write has been received 
by the device.  We can start the delay regardless of the value the read 
returns.

Jesse

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 22:43       ` Jesse Barnes
@ 2004-09-21 22:54         ` Jeremy Higdon
  2004-09-21 23:17           ` Jesse Barnes
  0 siblings, 1 reply; 78+ messages in thread
From: Jeremy Higdon @ 2004-09-21 22:54 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Andrew Vasquez, Matthew Wilcox, James Bottomley, Grant Grundler,
	pj, SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tue, Sep 21, 2004 at 06:43:49PM -0400, Jesse Barnes wrote:
> On Tuesday, September 21, 2004 6:39 pm, Jeremy Higdon wrote:
> > On Tue, Sep 21, 2004 at 06:36:36PM -0400, Jesse Barnes wrote:
> > >
> > > Shouldn't we do a config space read and *then* start the delay, which
> > > will only delay for as long as it takes to reset the card on a 33 MHz
> > > bus?
> >
> > The config space read has the same problem that the mmio space read does.
> > The chip does not respond.
> 
> Right, but config space reads are supposed to soft fail no matter what.  So 
> we'll get back all ones, but we'll also know that the write has been received 
> by the device.  We can start the delay regardless of the value the read 
> returns.


I thought you said that config space read failures generate an MCA.
Or are you going to fix that?

jeremy

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 22:54         ` Jeremy Higdon
@ 2004-09-21 23:17           ` Jesse Barnes
  2004-09-22 21:33             ` Jesse Barnes
  0 siblings, 1 reply; 78+ messages in thread
From: Jesse Barnes @ 2004-09-21 23:17 UTC (permalink / raw)
  To: Jeremy Higdon
  Cc: Andrew Vasquez, Matthew Wilcox, James Bottomley, Grant Grundler,
	pj, SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tuesday, September 21, 2004 6:54 pm, Jeremy Higdon wrote:
> On Tue, Sep 21, 2004 at 06:43:49PM -0400, Jesse Barnes wrote:
> > On Tuesday, September 21, 2004 6:39 pm, Jeremy Higdon wrote:
> > > On Tue, Sep 21, 2004 at 06:36:36PM -0400, Jesse Barnes wrote:
> > > > Shouldn't we do a config space read and *then* start the delay, which
> > > > will only delay for as long as it takes to reset the card on a 33 MHz
> > > > bus?
> > >
> > > The config space read has the same problem that the mmio space read
> > > does. The chip does not respond.
> >
> > Right, but config space reads are supposed to soft fail no matter what. 
> > So we'll get back all ones, but we'll also know that the write has been
> > received by the device.  We can start the delay regardless of the value
> > the read returns.
>
> I thought you said that config space read failures generate an MCA.
> Or are you going to fix that?

That's the plan.  I'll come up with a patch.

Jesse

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 23:17           ` Jesse Barnes
@ 2004-09-22 21:33             ` Jesse Barnes
  0 siblings, 0 replies; 78+ messages in thread
From: Jesse Barnes @ 2004-09-22 21:33 UTC (permalink / raw)
  To: Jeremy Higdon
  Cc: Andrew Vasquez, Matthew Wilcox, James Bottomley, Grant Grundler,
	pj, SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tuesday, September 21, 2004 7:17 pm, Jesse Barnes wrote:
> On Tuesday, September 21, 2004 6:54 pm, Jeremy Higdon wrote:
> > On Tue, Sep 21, 2004 at 06:43:49PM -0400, Jesse Barnes wrote:
> > > On Tuesday, September 21, 2004 6:39 pm, Jeremy Higdon wrote:
> > > > On Tue, Sep 21, 2004 at 06:36:36PM -0400, Jesse Barnes wrote:
> > > > > Shouldn't we do a config space read and *then* start the delay,
> > > > > which will only delay for as long as it takes to reset the card on
> > > > > a 33 MHz bus?
> > > >
> > > > The config space read has the same problem that the mmio space read
> > > > does. The chip does not respond.
> > >
> > > Right, but config space reads are supposed to soft fail no matter what.
> > > So we'll get back all ones, but we'll also know that the write has been
> > > received by the device.  We can start the delay regardless of the value
> > > the read returns.
> >
> > I thought you said that config space read failures generate an MCA.
> > Or are you going to fix that?
>
> That's the plan.  I'll come up with a patch.

On second thought Andrew (Vasquez), can not add config space reads quite yet 
(just stick with generous udelay() calls)?  It'll probably take me a couple 
of days to put together a patch that fixes config space reads on sn2, and I'd 
rather not have qla2xxx cause MCAs in the meantime.

Thanks,
Jesse

^ permalink raw reply	[flat|nested] 78+ messages in thread

* RE: SCSI QLA not working on latest *-mm SN2
@ 2004-09-21 17:33 Andrew Vasquez
  2004-09-21 17:52 ` Jesse Barnes
                   ` (2 more replies)
  0 siblings, 3 replies; 78+ messages in thread
From: Andrew Vasquez @ 2004-09-21 17:33 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: James Bottomley, Jesse Barnes, Grant Grundler, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton


On , willy@www.linux.org.uk wrote:
> 
> On Tue, Sep 21, 2004 at 08:58:23AM -0700, Andrew Vasquez wrote:
> > From what I can gather from the hw engineers, the config-read will
> > not guarantee a flush of posted writes.
> 
> I believe your hardware engineers to be mistaken.  See PCI 2.2,
> Appendix E, section E.2:
> 
> 2. Memory writes can be posted in both directions in a bridge. I/O
> and Configuration writes are not posted. (I/O writes can be posted
> in the Host Bridge, but some restrictions apply.) Read transactions
> (Memory, I/O, or Configuration) are not posted.
> 
> 5. A read transaction must push ahead of it through the bridge any
> posted writes originating on the same side of the bridge and posted
> before the read. Before the read transaction can complete on its
> originating bus, it must pull out of the bridge any posted writes
> that originated on the opposite side and were posted before the read
> command completes on the read-destination bus.
> 

Hmm...adding more confusion to the mix.  I apologize -- my reply was
not written correctly, yes, the config-read will flush any pending
writes.  But, the same problem persists -- the RISC will still stop
responding to requests (config or MMIO) during the soft-reset -- 
potentially resulting in a MAC (as seen by SGI).

The 'safe' solution (as suggested by the hw people) was to use PIO to
issue the soft-reset, then udelay().  

--
Andrew 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 17:33 Andrew Vasquez
@ 2004-09-21 17:52 ` Jesse Barnes
  2004-09-21 18:04 ` Matthew Wilcox
  2004-09-21 18:59 ` Matthew Wilcox
  2 siblings, 0 replies; 78+ messages in thread
From: Jesse Barnes @ 2004-09-21 17:52 UTC (permalink / raw)
  To: Andrew Vasquez
  Cc: Matthew Wilcox, James Bottomley, Grant Grundler, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tuesday, September 21, 2004 1:33 pm, Andrew Vasquez wrote:
> Hmm...adding more confusion to the mix.  I apologize -- my reply was
> not written correctly, yes, the config-read will flush any pending
> writes.  But, the same problem persists -- the RISC will still stop
> responding to requests (config or MMIO) during the soft-reset --
> potentially resulting in a MAC (as seen by SGI).

Thanks for clarifying, I suspected that was the case.

> The 'safe' solution (as suggested by the hw people) was to use PIO to
> issue the soft-reset, then udelay().

But unfortunately, that doesn't get us 100% there, since even PIO space writes 
can be posted in some circumstances.  Any other suggestions?

Jesse

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 17:33 Andrew Vasquez
  2004-09-21 17:52 ` Jesse Barnes
@ 2004-09-21 18:04 ` Matthew Wilcox
  2004-09-21 18:59 ` Matthew Wilcox
  2 siblings, 0 replies; 78+ messages in thread
From: Matthew Wilcox @ 2004-09-21 18:04 UTC (permalink / raw)
  To: Andrew Vasquez
  Cc: Matthew Wilcox, James Bottomley, Jesse Barnes, Grant Grundler, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tue, Sep 21, 2004 at 10:33:36AM -0700, Andrew Vasquez wrote:
> Hmm...adding more confusion to the mix.  I apologize -- my reply was
> not written correctly, yes, the config-read will flush any pending
> writes.  But, the same problem persists -- the RISC will still stop
> responding to requests (config or MMIO) during the soft-reset -- 
> potentially resulting in a MAC (as seen by SGI).
> 
> The 'safe' solution (as suggested by the hw people) was to use PIO to
> issue the soft-reset, then udelay().  

Even that's not safe ;-(
This snippet is from pci 2.3 (section 3.2.5.2) but there's substantially
similar wording in pci 2.2:

  Host bus bridges are permitted to post I/O write transactions that
  originate on the host bus and complete on a PCI bus segment when they
  follow the ordering rules described in this specification and do not
  cause a deadlock. This means that when a host bus bridge posts an I/O
  write transaction that originated on the host bus, it must provide a
  deadlock free environment when the transaction completes on PCI. The
  transaction will complete on the destination PCI bus before completing
  on the originating PCI bus.

  Since memory write transactions may be posted in bridges anywhere
  in the system, and I/O writes may be posted in the host bus bridge,
  a master cannot automatically tell when its write transaction completes
  at the final destination. For a device driver to guarantee that a write
  has completed at the actual target (and not at an intermediate bridge),
  it must complete a read to the same device that the write targeted. The
  read (memory or I/O) forces all bridges between the originating master
  and the actual target to flush all posted data before allowing the
  read to complete. For additional details on device drivers, refer to
  Section 6.5. Refer to Section 3.10., item 6, for other cases where a
  read is necessary.

-- 
"Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception." -- Mark Twain

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 17:33 Andrew Vasquez
  2004-09-21 17:52 ` Jesse Barnes
  2004-09-21 18:04 ` Matthew Wilcox
@ 2004-09-21 18:59 ` Matthew Wilcox
  2004-09-21 19:10   ` Jesse Barnes
  2 siblings, 1 reply; 78+ messages in thread
From: Matthew Wilcox @ 2004-09-21 18:59 UTC (permalink / raw)
  To: Andrew Vasquez
  Cc: Matthew Wilcox, James Bottomley, Jesse Barnes, Grant Grundler, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tue, Sep 21, 2004 at 10:33:36AM -0700, Andrew Vasquez wrote:
> Hmm...adding more confusion to the mix.  I apologize -- my reply was
> not written correctly, yes, the config-read will flush any pending
> writes.  But, the same problem persists -- the RISC will still stop
> responding to requests (config or MMIO) during the soft-reset -- 
> potentially resulting in a MAC (as seen by SGI).

Aha!  SGI's wacky PCI controllers must be able to cope with a PCI
config space fail, otherwise they wouldn't be able to do a PCI bus walk.

-- 
"Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception." -- Mark Twain

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 18:59 ` Matthew Wilcox
@ 2004-09-21 19:10   ` Jesse Barnes
  0 siblings, 0 replies; 78+ messages in thread
From: Jesse Barnes @ 2004-09-21 19:10 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Vasquez, James Bottomley, Grant Grundler, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tuesday, September 21, 2004 2:59 pm, Matthew Wilcox wrote:
> On Tue, Sep 21, 2004 at 10:33:36AM -0700, Andrew Vasquez wrote:
> > Hmm...adding more confusion to the mix.  I apologize -- my reply was
> > not written correctly, yes, the config-read will flush any pending
> > writes.  But, the same problem persists -- the RISC will still stop
> > responding to requests (config or MMIO) during the soft-reset --
> > potentially resulting in a MAC (as seen by SGI).
>
> Aha!  SGI's wacky PCI controllers must be able to cope with a PCI
> config space fail, otherwise they wouldn't be able to do a PCI bus walk.

Ok, you got me!  Yes, we need to fail PCI config space read failures, but we 
don't at the moment.  Our bridge chips generate MCAs when this occurs, so we 
have to recover from them somehow.  Current PCI discovery code does this by 
making a PROM call for PCI config space access.  I expect our I/O code 
rewrite will use the standard SAL config space access routines, which should 
make this easy.

Jesse

^ permalink raw reply	[flat|nested] 78+ messages in thread

* RE: SCSI QLA not working on latest *-mm SN2
@ 2004-09-21 15:58 Andrew Vasquez
  2004-09-21 16:07 ` Jesse Barnes
  2004-09-21 16:25 ` Matthew Wilcox
  0 siblings, 2 replies; 78+ messages in thread
From: Andrew Vasquez @ 2004-09-21 15:58 UTC (permalink / raw)
  To: James Bottomley, Jesse Barnes
  Cc: Grant Grundler, pj, SCSI Mailing List, mdr, jeremy, djh,
	Andrew Morton

On Tuesday, September 21, 2004 8:42 AM, James Bottomley wrote:
> On Tue, 2004-09-21 at 11:13, Jesse Barnes wrote:
> > On Tuesday, September 21, 2004 1:46 am, Grant Grundler wrote:
> > > No it doesn't. Only if it depends on *when* the write hits the
> > >  device. The classic example is: writel(x, CMD_RESET);
> > >  udelay(10);
> > >  readl(x+STATUS); /* parisc will crash if not ready */
> > 
> > Ok, hopefully I've covered this in this release (patch to
> > deviceiobook will come later). 
> > 
> > The short of it is that we really need pioflush.  I'll resurrect my
> > mmiob patches, change the name and prototype, and resubmit.
> 
> Just to get back to the actual qla2xxx problem.
> 
> Do we agree that there are two possible solutions:
> 
> 1) Find a safe mmio read to trigger the flush of the posted
> reset write.
> 
> 2) Use a pio write to trigger the reset because we know the reset is
> active as soon as the write returns.
> 
> and if so, which one are we going to implement?
> 

>From what I can gather from the hw engineers, the config-read will not
guarantee a flush of posted writes.  And since there is 'no safe register'
accessible during the soft-reset, it seems that the pio-write issuance of 
the request to the RISC will be needed.  The issue of post-soft-reset delay
still persists -- so it would seem the udelay() are still needed.

BTW: Side comment, the 'failover-capable' driver which the 'vanilla-kernel'
driver is derived from has some special RD/WRT_REG_WORD macros defined which
perform explicit PIO operations:

	#define RD_REG_WORD_PIO(addr)		(inw((unsigned long)addr))
	#define WRT_REG_WORD_PIO(addr, data)	(outw(data,(unsigned long)addr))

that are used to access flash/gpio registers of an ISP2312v2 chip due to
some hardware problems.  I can post a patch for the soft-reset issue using 
these macros?  Which tree shall I use as a base, several patches have been
floating around which 'fix' the issue on SN2 paltforms, but also remove
what has been called 'excessive', or 'unnecessary' readw()s?

--
AV

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 15:58 Andrew Vasquez
@ 2004-09-21 16:07 ` Jesse Barnes
  2004-09-21 16:25 ` Matthew Wilcox
  1 sibling, 0 replies; 78+ messages in thread
From: Jesse Barnes @ 2004-09-21 16:07 UTC (permalink / raw)
  To: Andrew Vasquez
  Cc: James Bottomley, Grant Grundler, pj, SCSI Mailing List, mdr,
	jeremy, djh, Andrew Morton

On Tuesday, September 21, 2004 11:58 am, Andrew Vasquez wrote:
>  #define WRT_REG_WORD_PIO(addr, data) (outw(data,(unsigned long)addr))
>
> that are used to access flash/gpio registers of an ISP2312v2 chip due to
> some hardware problems.  I can post a patch for the soft-reset issue using
> these macros?  Which tree shall I use as a base, several patches have been
> floating around which 'fix' the issue on SN2 paltforms, but also remove
> what has been called 'excessive', or 'unnecessary' readw()s?

I think changing the reset writes into PIO writes using the above macro, along 
with a udelay to wait for the card, is as close as we're going to get to 
solving this problem.

Jesse

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 15:58 Andrew Vasquez
  2004-09-21 16:07 ` Jesse Barnes
@ 2004-09-21 16:25 ` Matthew Wilcox
  2004-09-21 16:33   ` James Bottomley
  2004-09-21 20:43   ` Jeremy Higdon
  1 sibling, 2 replies; 78+ messages in thread
From: Matthew Wilcox @ 2004-09-21 16:25 UTC (permalink / raw)
  To: Andrew Vasquez
  Cc: James Bottomley, Jesse Barnes, Grant Grundler, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tue, Sep 21, 2004 at 08:58:23AM -0700, Andrew Vasquez wrote:
> From what I can gather from the hw engineers, the config-read will not
> guarantee a flush of posted writes.

I believe your hardware engineers to be mistaken.  See PCI 2.2, Appendix
E, section E.2:

2. Memory writes can be posted in both directions in a bridge. I/O and
   Configuration writes are not posted. (I/O writes can be posted in the
   Host Bridge, but some restrictions apply.) Read transactions (Memory,
   I/O, or Configuration) are not posted.

5. A read transaction must push ahead of it through the bridge any posted
   writes originating on the same side of the bridge and posted before the
   read. Before the read transaction can complete on its originating bus,
   it must pull out of the bridge any posted writes that originated on
   the opposite side and were posted before the read command completes
   on the read-destination bus.

-- 
"Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception." -- Mark Twain

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 16:25 ` Matthew Wilcox
@ 2004-09-21 16:33   ` James Bottomley
  2004-09-21 20:39     ` Jeremy Higdon
  2004-09-21 20:43   ` Jeremy Higdon
  1 sibling, 1 reply; 78+ messages in thread
From: James Bottomley @ 2004-09-21 16:33 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Vasquez, Jesse Barnes, Grant Grundler, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tue, 2004-09-21 at 12:25, Matthew Wilcox wrote:
>    Configuration writes are not posted. (I/O writes can be posted in the
>    Host Bridge, but some restrictions apply.) Read transactions (Memory,
>    I/O, or Configuration) are not posted.

Erk, depending on what "some restrictions apply" means, that blows a
hole in the idea of using PIO writes to guarantee no posting.

James



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 16:33   ` James Bottomley
@ 2004-09-21 20:39     ` Jeremy Higdon
  0 siblings, 0 replies; 78+ messages in thread
From: Jeremy Higdon @ 2004-09-21 20:39 UTC (permalink / raw)
  To: James Bottomley
  Cc: Matthew Wilcox, Andrew Vasquez, Jesse Barnes, Grant Grundler, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tue, Sep 21, 2004 at 12:33:08PM -0400, James Bottomley wrote:
> On Tue, 2004-09-21 at 12:25, Matthew Wilcox wrote:
> >    Configuration writes are not posted. (I/O writes can be posted in the
> >    Host Bridge, but some restrictions apply.) Read transactions (Memory,
> >    I/O, or Configuration) are not posted.
> 
> Erk, depending on what "some restrictions apply" means, that blows a
> hole in the idea of using PIO writes to guarantee no posting.
> 
> James

I don't think that P(ort)IO writes are the answer, either.

jeremy

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 16:25 ` Matthew Wilcox
  2004-09-21 16:33   ` James Bottomley
@ 2004-09-21 20:43   ` Jeremy Higdon
  1 sibling, 0 replies; 78+ messages in thread
From: Jeremy Higdon @ 2004-09-21 20:43 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Vasquez, James Bottomley, Jesse Barnes, Grant Grundler, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tue, Sep 21, 2004 at 05:25:35PM +0100, Matthew Wilcox wrote:
> On Tue, Sep 21, 2004 at 08:58:23AM -0700, Andrew Vasquez wrote:
> > From what I can gather from the hw engineers, the config-read will not
> > guarantee a flush of posted writes.
> 
> I believe your hardware engineers to be mistaken.  See PCI 2.2, Appendix
> E, section E.2:
> 
> 2. Memory writes can be posted in both directions in a bridge. I/O and
>    Configuration writes are not posted. (I/O writes can be posted in the
>    Host Bridge, but some restrictions apply.) Read transactions (Memory,
>    I/O, or Configuration) are not posted.
> 
> 5. A read transaction must push ahead of it through the bridge any posted
>    writes originating on the same side of the bridge and posted before the
>    read. Before the read transaction can complete on its originating bus,
>    it must pull out of the bridge any posted writes that originated on
>    the opposite side and were posted before the read command completes
>    on the read-destination bus.


I would agree.  A config read should retire any posted I/O writes, whether
Port or Memory Mapped.

So, unless the qla2xxx chips also do not respond to config reads after
reset, the config read should be the answer . . .

. . . unless there is some sort of random delay within the chip itself
between a completion of the IO write on the PCI bus and the chip resetting
itself.  That's not a problem, is it, Andrew?

jeremy

^ permalink raw reply	[flat|nested] 78+ messages in thread

[parent not found: <B179AE41C1147041AA1121F44614F0B060EF48@AVEXCH02.qlogic.org>]

[parent not found: <20040916121235.5e4f9c32.pj@sgi.com>]

[parent not found: <1095362263.16326.12.camel@praka>]

* Re: SCSI QLA not working on latest *-mm SN2
       [not found]   ` <1095362263.16326.12.camel@praka>
@ 2004-09-16 19:56     ` Paul Jackson
  2004-09-16 20:05       ` Jesse Barnes
  2004-09-16 20:11       ` Andrew Morton
  0 siblings, 2 replies; 78+ messages in thread
From: Paul Jackson @ 2004-09-16 19:56 UTC (permalink / raw)
  To: Andrew Vasquez, linux-scsi; +Cc: mdr, jeremy, djh, jbarnes, Andrew Morton

Andrew Vasquez has been looking at this, via private email with just
me (no progress yet).  Figured I update the larger list with this much ...

I am still seeing the SN2 SCSI QLA failure as I reported yesterday, but
now against 2.6.9-rc2-mm1.

For the benefit of those with limited memories, such as myself, I will
repeat the symptoms from the top.

Those with good memories can probably stop reading here - nothing much
new to report other than retesting on 2.6.9-rc2-mm1.

===

This is now testing against 2.6.9-rc2-mm1, plus a one line workaround
for some ACPI problem, from Jesse Barnes.  The symptoms resemble what I
first saw of this failure on 2.6.9-rc1-mm5.  I did not see any such
problem on 2.6.9-rc1-mm4.

I have to make the following change to sn2_defconfig in order to boot:

< CONFIG_SCSI_QLA22XX=y
< CONFIG_SCSI_QLA2300=y
< CONFIG_SCSI_QLA2322=y
---
> # CONFIG_SCSI_QLA22XX is not set
> # CONFIG_SCSI_QLA2300 is not set
> # CONFIG_SCSI_QLA2322 is not set

Once I comment out these QLA lines, I can boot.

Otherwise the boot fails, ending with the following lines of output. 
There are a couple of 5 or 10 second pauses somewhere in the last dozen
lines of this output.

=======================================================

VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 2048 (order 0, 16384 bytes)
SGI XFS with ACLs, realtime, large block/inode numbers, no debug enabled
SGI XFS Quota Management subsystem
Initializing Cryptographic API
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
IA-PC Multimedia Timer: v1.0, 25 MHz
EFI Time Services Driver v0.4
sn_console: Console driver init
ttySG0 at I/O 0x0 (irq = 0) is a SGI SN L1
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: loaded (max 8 devices)
tg3.c:v3.9 (August 30, 2004)
eth0: Tigon3 [partno(030-1771-000) rev 0105 PHY(5701)] (PCI:66MHz:64-bit) 10/100/1000BaseT Ethernet 08:00:69:13:dc:f6
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[0] 
netconsole: not configured, aborting
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
SGIIOC4: IDE controller at PCI slot 0000:01:01.0, revision 79
ide0: BM-DMA at 0xc00002080c200140-0xc00002080c200163
hda: MATSHITADVD-ROM SR-8588, ATAPI CD/DVD-ROM drive
Using anticipatory io scheduler
ide0 at 0xc00002080c200100-0xc00002080c200107,0xc00002080c200120 on irq 55
ide1: I/O resource 0x376-0x376 not free.
ide1: ports already in use, skipping probe
ide2: I/O resource 0x3EE-0x3EE not free.
ide2: ports already in use, skipping probe
ide3: I/O resource 0x36E-0x36E not free.
ide3: ports already in use, skipping probe
ide4: I/O resource 0x3E6-0x3E6 not free.
ide4: ports already in use, skipping probe
ide5: I/O resource 0x366-0x366 not free.
ide5: ports already in use, skipping probe
hda: ATAPI 48X DVD-ROM drive, 256kB Cache, UDMA(16)
Uniform CD-ROM driver Revision: 3.20
qla1280: QLA12160 found on PCI bus 1, dev 3
scsi(0): Enabling SN2 PCI DMA dual channel lockup workaround
scsi(0): Enabling SN2 PCI DMA workaround
scsi(0:0): Resetting SCSI BUS
scsi(0:1): Resetting SCSI BUS
scsi0 : QLogic QLA12160 PCI to SCSI Host Adapter
       Firmware version: 10.04.32, Driver version 3.24.4
  Vendor: SGI       Model: ST336753LC        Rev: 2741
  Type:   Direct-Access                      ANSI SCSI revision: 03
scsi(0:0:1:0): Sync: period 9, offset 14, Wide, DT, Tagged queuing: depth 255
  Vendor: SGI       Model: ST336753LC        Rev: 2741
  Type:   Direct-Access                      ANSI SCSI revision: 03
scsi(0:0:2:0): Sync: period 9, offset 14, Wide, DT, Tagged queuing: depth 255
QLogic Fibre Channel HBA Driver (a0000001007b29c0)
qla2200 0000:03:01.0: Found an ISP2200, irq 58, iobase 0xc00002080f400000
qla2200 0000:03:01.0: Configuring PCI space...
PCI: slot 0000:03:01.0 has incorrect PCI cache line size of 0 bytes, correcting to 128
POD entered via OS requested halt, using Cac mode
2 008: POD SysCt Cac> 

=======================================================

By way of comparison, the lines just before and after the above
point of death, on a successful boot w/o the QLA2 config, looks like:

=======================================================

scsi0 : QLogic QLA12160 PCI to SCSI Host Adapter
       Firmware version: 10.04.32, Driver version 3.24.4
  Vendor: SGI       Model: ST336753LC        Rev: 2741
  Type:   Direct-Access                      ANSI SCSI revision: 03
scsi(0:0:1:0): Sync: period 9, offset 14, Wide, DT, Tagged queuing: depth 255
  Vendor: SGI       Model: ST336753LC        Rev: 2741
  Type:   Direct-Access                      ANSI SCSI revision: 03
scsi(0:0:2:0): Sync: period 9, offset 14, Wide, DT, Tagged queuing: depth 255
SCSI device sda: 71687372 512-byte hdwr sectors (36704 MB)
SCSI device sda: drive cache: write through
 sda: sda1 sda2 sda3
Attached scsi disk sda at scsi0, channel 0, id 1, lun 0
SCSI device sdb: 71687372 512-byte hdwr sectors (36704 MB)
SCSI device sdb: drive cache: write through
 sdb: sdb1 sdb2 sdb3 sdb4 sdb5 sdb6 sdb7 sdb8
Attached scsi disk sdb at scsi0, channel 0, id 2, lun 0
Fusion MPT base driver 3.01.16
Copyright (c) 1999-2004 LSI Logic Corporation
Fusion MPT SCSI Host driver 3.01.16
mice: PS/2 mouse device common for all mice

=======================================================

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-16 19:56     ` Paul Jackson
@ 2004-09-16 20:05       ` Jesse Barnes
  2004-09-16 20:56         ` Andrew Vasquez
  2004-09-16 20:11       ` Andrew Morton
  1 sibling, 1 reply; 78+ messages in thread
From: Jesse Barnes @ 2004-09-16 20:05 UTC (permalink / raw)
  To: Paul Jackson
  Cc: Andrew Vasquez, linux-scsi, mdr, jeremy, djh, jbarnes,
	Andrew Morton

On Thursday, September 16, 2004 12:56 pm, Paul Jackson wrote:
> Andrew Vasquez has been looking at this, via private email with just
> me (no progress yet).  Figured I update the larger list with this much ...

It seems to be failing on one of the accesses to PCI_COMMAND in config space 
in qla2x00_reset_chip().  I'm checking now to see if we're accessing the card 
right after a reset but before the card has finished.  That would cause a 
master abort, the symptom I'm seeing at least.

> I am still seeing the SN2 SCSI QLA failure as I reported yesterday, but
> now against 2.6.9-rc2-mm1.

Me too.

Jesse

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-16 20:05       ` Jesse Barnes
@ 2004-09-16 20:56         ` Andrew Vasquez
  2004-09-16 21:09           ` Jesse Barnes
  2004-09-16 23:14           ` Jeremy Higdon
  0 siblings, 2 replies; 78+ messages in thread
From: Andrew Vasquez @ 2004-09-16 20:56 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Paul Jackson, linux-scsi, mdr, jeremy, djh, jbarnes,
	Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 669 bytes --]

On Thu, 2004-09-16 at 13:05, Jesse Barnes wrote:
> On Thursday, September 16, 2004 12:56 pm, Paul Jackson wrote:
> > Andrew Vasquez has been looking at this, via private email with just
> > me (no progress yet).  Figured I update the larger list with this much ...
> 
> It seems to be failing on one of the accesses to PCI_COMMAND in config space 
> in qla2x00_reset_chip().  I'm checking now to see if we're accessing the card 
> right after a reset but before the card has finished.  That would cause a 
> master abort, the symptom I'm seeing at least.
> 

Interesting, the only changes in reset_chip() are for PCI posting
issues.  Relevant diff attached.

--
Andrew

[-- Attachment #2: posting.diff --]
[-- Type: text/x-patch, Size: 4177 bytes --]

diff -Nurdp -X dontdiff 80000b14/qla_init.c 80000b21/qla_init.c
--- 80000b14/qla_init.c	2004-06-23 17:12:33.000000000 -0700
+++ 80000b21/qla_init.c	2004-09-02 13:11:35.000000000 -0700
@@ -315,6 +317,7 @@ qla2x00_pci_config(scsi_qla_host_t *ha)
 
 			/* Select FPM registers. */
 			WRT_REG_WORD(&ha->iobase->ctrl_status, 0x20);
+			RD_REG_WORD(&ha->iobase->ctrl_status);
 
 			/* Get the fb rev level */
 			ha->fb_rev = RD_FB_CMD_REG(ha, ha->iobase);
@@ -324,6 +327,7 @@ qla2x00_pci_config(scsi_qla_host_t *ha)
 
 			/* Deselect FPM registers. */
 			WRT_REG_WORD(&ha->iobase->ctrl_status, 0x0);
+			RD_REG_WORD(&ha->iobase->ctrl_status);
 
 			/* Release RISC module. */
 			WRT_REG_WORD(&ha->iobase->hccr, HCCR_RELEASE_RISC);
@@ -417,25 +421,32 @@ qla2x00_reset_chip(scsi_qla_host_t *ha) 
 				udelay(100);
 			}
 		} else {
+			RD_REG_WORD(&reg->hccr);	/* PCI Posting. */
 			udelay(10);
 		}
 
 		/* Select FPM registers. */
 		WRT_REG_WORD(&reg->ctrl_status, 0x20);
+		RD_REG_WORD(&reg->ctrl_status);		/* PCI Posting. */
 
 		/* FPM Soft Reset. */
 		WRT_REG_WORD(&reg->fpm_diag_config, 0x100);
+		RD_REG_WORD(&reg->fpm_diag_config);	/* PCI Posting. */
 
 		/* Toggle Fpm Reset. */
-		if (!IS_QLA2200(ha))
+		if (!IS_QLA2200(ha)) {
 			WRT_REG_WORD(&reg->fpm_diag_config, 0x0);
+			RD_REG_WORD(&reg->fpm_diag_config); /* PCI Posting. */
+		}
 
 		/* Select frame buffer registers. */
 		WRT_REG_WORD(&reg->ctrl_status, 0x10);
+		RD_REG_WORD(&reg->ctrl_status);		/* PCI Posting. */
 
 		/* Reset frame buffer FIFOs. */
 		if (IS_QLA2200(ha)) {
 			WRT_FB_CMD_REG(ha, reg, 0xa000);
+			RD_FB_CMD_REG(ha, reg);		/* PCI Posting. */
 		} else {
 			WRT_FB_CMD_REG(ha, reg, 0x00fc);
 
@@ -449,19 +460,25 @@ qla2x00_reset_chip(scsi_qla_host_t *ha) 
 
 		/* Select RISC module registers. */
 		WRT_REG_WORD(&reg->ctrl_status, 0);
+		RD_REG_WORD(&reg->ctrl_status);		/* PCI Posting. */
 
 		/* Reset RISC processor. */
 		WRT_REG_WORD(&reg->hccr, HCCR_RESET_RISC);
+		RD_REG_WORD(&reg->hccr);		/* PCI Posting. */
 
 		/* Release RISC processor. */
 		WRT_REG_WORD(&reg->hccr, HCCR_RELEASE_RISC);
+		RD_REG_WORD(&reg->hccr);		/* PCI Posting. */
 	}
 
 	WRT_REG_WORD(&reg->hccr, HCCR_CLR_RISC_INT);
+	RD_REG_WORD(&reg->hccr);			/* PCI Posting. */
 	WRT_REG_WORD(&reg->hccr, HCCR_CLR_HOST_INT);
+	RD_REG_WORD(&reg->hccr);			/* PCI Posting. */
 
 	/* Reset ISP chip. */
 	WRT_REG_WORD(&reg->ctrl_status, CSR_ISP_SOFT_RESET);
+	RD_REG_WORD(&reg->ctrl_status);			/* PCI Posting. */
 
 	/* Wait for RISC to recover from reset. */
 	if (IS_QLA2100(ha) || IS_QLA2200(ha) || IS_QLA2300(ha)) {
@@ -482,12 +499,13 @@ qla2x00_reset_chip(scsi_qla_host_t *ha) 
 
 	/* Reset RISC processor. */
 	WRT_REG_WORD(&reg->hccr, HCCR_RESET_RISC);
+	RD_REG_WORD(&reg->hccr);			/* PCI Posting. */
 
 	WRT_REG_WORD(&reg->semaphore, 0);
 
 	/* Release RISC processor. */
 	WRT_REG_WORD(&reg->hccr, HCCR_RELEASE_RISC);
-	RD_REG_WORD(&reg->hccr);		/* PCI Posting. */
+	RD_REG_WORD(&reg->hccr);			/* PCI Posting. */
 
 	if (IS_QLA2100(ha) || IS_QLA2200(ha) || IS_QLA2300(ha)) {
 		for (cnt = 0; cnt < 30000; cnt++) {
@@ -516,8 +534,10 @@ qla2x00_reset_chip(scsi_qla_host_t *ha) 
 	pci_write_config_word(ha->pdev, PCI_COMMAND, cmd);
 
 	/* Disable RISC pause on FPM parity error. */
-	if (!IS_QLA2100(ha))
+	if (!IS_QLA2100(ha)) {
 		WRT_REG_WORD(&reg->hccr, HCCR_DISABLE_PARITY_PAUSE);
+		RD_REG_WORD(&reg->hccr);		/* PCI Posting. */
+	}
 
 	spin_unlock_irqrestore(&ha->hardware_lock, flags);
 }
@@ -548,6 +568,8 @@ qla2x00_chip_diag(scsi_qla_host_t *ha)
 
 	/* Reset ISP chip. */
 	WRT_REG_WORD(&reg->ctrl_status, CSR_ISP_SOFT_RESET);
+	RD_REG_WORD(&reg->ctrl_status);			/* PCI Posting. */
+
 	/*
 	 * We need to have a delay here since the card will not respond while
 	 * in reset causing an MCA on some architectures.
@@ -568,7 +590,9 @@ qla2x00_chip_diag(scsi_qla_host_t *ha)
 
 	/* Reset RISC processor. */
 	WRT_REG_WORD(&reg->hccr, HCCR_RESET_RISC);
+	RD_REG_WORD(&reg->hccr);			/* PCI Posting. */
 	WRT_REG_WORD(&reg->hccr, HCCR_RELEASE_RISC);
+	RD_REG_WORD(&reg->hccr);			/* PCI Posting. */
 
 	/* Workaround for QLA2312 PCI parity error */
 	if (IS_QLA2100(ha) || IS_QLA2200(ha) || IS_QLA2300(ha)) {

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-16 20:56         ` Andrew Vasquez
@ 2004-09-16 21:09           ` Jesse Barnes
  2004-09-16 21:40             ` Andrew Vasquez
  2004-09-16 23:14           ` Jeremy Higdon
  1 sibling, 1 reply; 78+ messages in thread
From: Jesse Barnes @ 2004-09-16 21:09 UTC (permalink / raw)
  To: Andrew Vasquez
  Cc: Paul Jackson, linux-scsi, mdr, jeremy, djh, jbarnes,
	Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1691 bytes --]

On Thursday, September 16, 2004 1:56 pm, Andrew Vasquez wrote:
> On Thu, 2004-09-16 at 13:05, Jesse Barnes wrote:
> > On Thursday, September 16, 2004 12:56 pm, Paul Jackson wrote:
> > > Andrew Vasquez has been looking at this, via private email with just
> > > me (no progress yet).  Figured I update the larger list with this much
> > > ...
> >
> > It seems to be failing on one of the accesses to PCI_COMMAND in config
> > space in qla2x00_reset_chip().  I'm checking now to see if we're
> > accessing the card right after a reset but before the card has finished. 
> > That would cause a master abort, the symptom I'm seeing at least.
>
> Interesting, the only changes in reset_chip() are for PCI posting
> issues.  Relevant diff attached.

Yeah, I think one of these is the culprit.  Before I got your message, I fixed 
some of them in my tree already (see attached) and things seem to work.

        WRT_REG_WORD(&reg->hccr, HCCR_CLR_RISC_INT);
+       RD_REG_WORD(&reg->hccr);                        /* PCI Posting. */
        WRT_REG_WORD(&reg->hccr, HCCR_CLR_HOST_INT);
+       RD_REG_WORD(&reg->hccr);                        /* PCI Posting. */
 
        /* Reset ISP chip. */
        WRT_REG_WORD(&reg->ctrl_status, CSR_ISP_SOFT_RESET);
+       RD_REG_WORD(&reg->ctrl_status);                 /* PCI Posting. */
 
In particular, are the above ok?  If the chip is resetting, won't doing a read 
cause a machine check (or at the very least, a device select timeout, which 
will return all ones on friendlier platforms).

        WRT_REG_WORD(&reg->ctrl_status, CSR_ISP_SOFT_RESET);
+       RD_REG_WORD(&reg->ctrl_status);                 /* PCI Posting. */

Same here?

Thanks,
Jesse

[-- Attachment #2: qla2xxx-less-posting.patch --]
[-- Type: text/plain, Size: 1727 bytes --]

diff -Napur -X /home/jbarnes/dontdiff linux-2.6.9-rc2-mm1.orig/drivers/scsi/qla2xxx/qla_init.c linux-2.6.9-rc2-mm1/drivers/scsi/qla2xxx/qla_init.c
--- linux-2.6.9-rc2-mm1.orig/drivers/scsi/qla2xxx/qla_init.c	2004-09-16 09:35:50.000000000 -0700
+++ linux-2.6.9-rc2-mm1/drivers/scsi/qla2xxx/qla_init.c	2004-09-16 14:00:32.000000000 -0700
@@ -463,13 +463,10 @@ qla2x00_reset_chip(scsi_qla_host_t *ha) 
 	}
 
 	WRT_REG_WORD(&reg->hccr, HCCR_CLR_RISC_INT);
-	RD_REG_WORD(&reg->hccr);			/* PCI Posting. */
 	WRT_REG_WORD(&reg->hccr, HCCR_CLR_HOST_INT);
-	RD_REG_WORD(&reg->hccr);			/* PCI Posting. */
 
 	/* Reset ISP chip. */
 	WRT_REG_WORD(&reg->ctrl_status, CSR_ISP_SOFT_RESET);
-	RD_REG_WORD(&reg->ctrl_status);			/* PCI Posting. */
 
 	/* Wait for RISC to recover from reset. */
 	if (IS_QLA2100(ha) || IS_QLA2200(ha) || IS_QLA2300(ha)) {
@@ -490,7 +487,6 @@ qla2x00_reset_chip(scsi_qla_host_t *ha) 
 
 	/* Reset RISC processor. */
 	WRT_REG_WORD(&reg->hccr, HCCR_RESET_RISC);
-	RD_REG_WORD(&reg->hccr);			/* PCI Posting. */
 
 	WRT_REG_WORD(&reg->semaphore, 0);
 
@@ -559,7 +555,6 @@ qla2x00_chip_diag(scsi_qla_host_t *ha)
 
 	/* Reset ISP chip. */
 	WRT_REG_WORD(&reg->ctrl_status, CSR_ISP_SOFT_RESET);
-	RD_REG_WORD(&reg->ctrl_status);			/* PCI Posting. */
 
 	/*
 	 * We need to have a delay here since the card will not respond while
@@ -581,9 +576,7 @@ qla2x00_chip_diag(scsi_qla_host_t *ha)
 
 	/* Reset RISC processor. */
 	WRT_REG_WORD(&reg->hccr, HCCR_RESET_RISC);
-	RD_REG_WORD(&reg->hccr);			/* PCI Posting. */
 	WRT_REG_WORD(&reg->hccr, HCCR_RELEASE_RISC);
-	RD_REG_WORD(&reg->hccr);			/* PCI Posting. */
 
 	/* Workaround for QLA2312 PCI parity error */
 	if (IS_QLA2100(ha) || IS_QLA2200(ha) || IS_QLA2300(ha)) {

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-16 21:09           ` Jesse Barnes
@ 2004-09-16 21:40             ` Andrew Vasquez
  2004-09-16 22:25               ` Andrew Morton
  0 siblings, 1 reply; 78+ messages in thread
From: Andrew Vasquez @ 2004-09-16 21:40 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Paul Jackson, linux-scsi, mdr, jeremy, djh, jbarnes,
	Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 2160 bytes --]

On Thu, 2004-09-16 at 14:09, Jesse Barnes wrote:
> On Thursday, September 16, 2004 1:56 pm, Andrew Vasquez wrote:
> > On Thu, 2004-09-16 at 13:05, Jesse Barnes wrote:
> > > On Thursday, September 16, 2004 12:56 pm, Paul Jackson wrote:
> > > > Andrew Vasquez has been looking at this, via private email with just
> > > > me (no progress yet).  Figured I update the larger list with this much
> > > > ...
> > >
> > > It seems to be failing on one of the accesses to PCI_COMMAND in config
> > > space in qla2x00_reset_chip().  I'm checking now to see if we're
> > > accessing the card right after a reset but before the card has finished. 
> > > That would cause a master abort, the symptom I'm seeing at least.
> >
> > Interesting, the only changes in reset_chip() are for PCI posting
> > issues.  Relevant diff attached.
> 
> Yeah, I think one of these is the culprit.  Before I got your message, I fixed 
> some of them in my tree already (see attached) and things seem to work.
> 

Hmm, seems we were a bit too over-aggressive in placement of the
readw()s :(

>         WRT_REG_WORD(&reg->hccr, HCCR_CLR_RISC_INT);
> +       RD_REG_WORD(&reg->hccr);                        /* PCI Posting. */
>         WRT_REG_WORD(&reg->hccr, HCCR_CLR_HOST_INT);
> +       RD_REG_WORD(&reg->hccr);                        /* PCI Posting. */
>  
>         /* Reset ISP chip. */
>         WRT_REG_WORD(&reg->ctrl_status, CSR_ISP_SOFT_RESET);
> +       RD_REG_WORD(&reg->ctrl_status);                 /* PCI Posting. */
>  
> In particular, are the above ok?  If the chip is resetting, won't doing a read 
> cause a machine check (or at the very least, a device select timeout, which 
> will return all ones on friendlier platforms).
> 

There are several more which deltas in qla_dbg.c which are suspect
also.  

>         WRT_REG_WORD(&reg->ctrl_status, CSR_ISP_SOFT_RESET);
> +       RD_REG_WORD(&reg->ctrl_status);                 /* PCI Posting. */
> 
> Same here?
> 

Andrew, please add Jesse's patch along with the patch I'm attaching to
your tree.  I'll be sure to add the ia64 machine back into our test
ring.


Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>

[-- Attachment #2: qla_dbg_less_posting.diff --]
[-- Type: text/x-patch, Size: 727 bytes --]

diff -Nurd -X /home/praka/Work/QLogic/Drivers/8.x/dontdiff linux-2.6.9-rc2-mm1/drivers/scsi/qla2xxx/qla_dbg.c linux-2.6.9-rc2-mm1_praka/drivers/scsi/qla2xxx/qla_dbg.c
--- linux-2.6.9-rc2-mm1/drivers/scsi/qla2xxx/qla_dbg.c	2004-09-16 14:30:38.000000000 -0700
+++ linux-2.6.9-rc2-mm1_praka/drivers/scsi/qla2xxx/qla_dbg.c	2004-09-16 14:36:18.907767776 -0700
@@ -712,7 +712,6 @@
 
 		/* Reset the ISP. */
 		WRT_REG_WORD(&reg->ctrl_status, CSR_ISP_SOFT_RESET);
-		RD_REG_WORD(&reg->ctrl_status);		/* PCI Posting. */
 	}
 
 	for (cnt = 30000; RD_MAILBOX_REG(ha, reg, 0) != 0 &&
@@ -746,7 +745,6 @@
 
 			/* Release RISC. */
 			WRT_REG_WORD(&reg->hccr, HCCR_RELEASE_RISC);
-			RD_REG_WORD(&reg->hccr);	/* PCI Posting. */
 		}
 	}
 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-16 21:40             ` Andrew Vasquez
@ 2004-09-16 22:25               ` Andrew Morton
  2004-09-16 22:29                 ` Jesse Barnes
  0 siblings, 1 reply; 78+ messages in thread
From: Andrew Morton @ 2004-09-16 22:25 UTC (permalink / raw)
  To: Andrew Vasquez; +Cc: jbarnes, pj, linux-scsi, mdr, jeremy, djh, jbarnes

Andrew Vasquez <andrew.vasquez@qlogic.com> wrote:
>
> Andrew, please add Jesse's patch along with the patch I'm attaching to
> your tree.  I'll be sure to add the ia64 machine back into our test
> ring.

Could someone send me Jesse's patch?

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-16 22:25               ` Andrew Morton
@ 2004-09-16 22:29                 ` Jesse Barnes
  2004-09-17 17:21                   ` Jesse Barnes
  0 siblings, 1 reply; 78+ messages in thread
From: Jesse Barnes @ 2004-09-16 22:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andrew Vasquez, pj, linux-scsi, mdr, jeremy, djh, jbarnes

[-- Attachment #1: Type: text/plain, Size: 548 bytes --]

On Thursday, September 16, 2004 3:25 pm, Andrew Morton wrote:
> Andrew Vasquez <andrew.vasquez@qlogic.com> wrote:
> > Andrew, please add Jesse's patch along with the patch I'm attaching to
> > your tree.  I'll be sure to add the ia64 machine back into our test
> > ring.
>
> Could someone send me Jesse's patch?

Here it is.

Reduce some overaggressive compensation for PCI write posting.  In some cases 
we don't actually want to read a value back, since the card could be 
resetting.

Signed-off-by: Jesse Barnes <jbarnes@sgi.com>

Thanks,
Jesse

[-- Attachment #2: qla2xxx-less-posting.patch --]
[-- Type: text/plain, Size: 1727 bytes --]

diff -Napur -X /home/jbarnes/dontdiff linux-2.6.9-rc2-mm1.orig/drivers/scsi/qla2xxx/qla_init.c linux-2.6.9-rc2-mm1/drivers/scsi/qla2xxx/qla_init.c
--- linux-2.6.9-rc2-mm1.orig/drivers/scsi/qla2xxx/qla_init.c	2004-09-16 09:35:50.000000000 -0700
+++ linux-2.6.9-rc2-mm1/drivers/scsi/qla2xxx/qla_init.c	2004-09-16 14:00:32.000000000 -0700
@@ -463,13 +463,10 @@ qla2x00_reset_chip(scsi_qla_host_t *ha) 
 	}
 
 	WRT_REG_WORD(&reg->hccr, HCCR_CLR_RISC_INT);
-	RD_REG_WORD(&reg->hccr);			/* PCI Posting. */
 	WRT_REG_WORD(&reg->hccr, HCCR_CLR_HOST_INT);
-	RD_REG_WORD(&reg->hccr);			/* PCI Posting. */
 
 	/* Reset ISP chip. */
 	WRT_REG_WORD(&reg->ctrl_status, CSR_ISP_SOFT_RESET);
-	RD_REG_WORD(&reg->ctrl_status);			/* PCI Posting. */
 
 	/* Wait for RISC to recover from reset. */
 	if (IS_QLA2100(ha) || IS_QLA2200(ha) || IS_QLA2300(ha)) {
@@ -490,7 +487,6 @@ qla2x00_reset_chip(scsi_qla_host_t *ha) 
 
 	/* Reset RISC processor. */
 	WRT_REG_WORD(&reg->hccr, HCCR_RESET_RISC);
-	RD_REG_WORD(&reg->hccr);			/* PCI Posting. */
 
 	WRT_REG_WORD(&reg->semaphore, 0);
 
@@ -559,7 +555,6 @@ qla2x00_chip_diag(scsi_qla_host_t *ha)
 
 	/* Reset ISP chip. */
 	WRT_REG_WORD(&reg->ctrl_status, CSR_ISP_SOFT_RESET);
-	RD_REG_WORD(&reg->ctrl_status);			/* PCI Posting. */
 
 	/*
 	 * We need to have a delay here since the card will not respond while
@@ -581,9 +576,7 @@ qla2x00_chip_diag(scsi_qla_host_t *ha)
 
 	/* Reset RISC processor. */
 	WRT_REG_WORD(&reg->hccr, HCCR_RESET_RISC);
-	RD_REG_WORD(&reg->hccr);			/* PCI Posting. */
 	WRT_REG_WORD(&reg->hccr, HCCR_RELEASE_RISC);
-	RD_REG_WORD(&reg->hccr);			/* PCI Posting. */
 
 	/* Workaround for QLA2312 PCI parity error */
 	if (IS_QLA2100(ha) || IS_QLA2200(ha) || IS_QLA2300(ha)) {

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-16 22:29                 ` Jesse Barnes
@ 2004-09-17 17:21                   ` Jesse Barnes
  2004-09-18  6:10                     ` Grant Grundler
  0 siblings, 1 reply; 78+ messages in thread
From: Jesse Barnes @ 2004-09-17 17:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andrew Vasquez, pj, linux-scsi, mdr, jeremy, djh, jbarnes

On Thursday, September 16, 2004 3:29 pm, Jesse Barnes wrote:
> On Thursday, September 16, 2004 3:25 pm, Andrew Morton wrote:
> > Andrew Vasquez <andrew.vasquez@qlogic.com> wrote:
> > > Andrew, please add Jesse's patch along with the patch I'm attaching to
> > > your tree.  I'll be sure to add the ia64 machine back into our test
> > > ring.
> >
> > Could someone send me Jesse's patch?
>
> Here it is.
>
> Reduce some overaggressive compensation for PCI write posting.  In some
> cases we don't actually want to read a value back, since the card could be
> resetting.
>
> Signed-off-by: Jesse Barnes <jbarnes@sgi.com>

Btw Andrew (Vasquez), there's a small doc I put together that should describe 
when you have to worry about PCI posting.  It's in the tree:  
Documentation/io_ordering.txt.  If it's incomplete or confusing, just let me 
know and I'll update it.

Thanks,
Jesse

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-17 17:21                   ` Jesse Barnes
@ 2004-09-18  6:10                     ` Grant Grundler
  2004-09-20 22:40                       ` Jesse Barnes
  0 siblings, 1 reply; 78+ messages in thread
From: Grant Grundler @ 2004-09-18  6:10 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Andrew Vasquez, pj, linux-scsi, mdr, jeremy, djh, Andrew Morton

Jesse Barnes wrote:
...
> Btw Andrew (Vasquez), there's a small doc I put together that should describe 
> when you have to worry about PCI posting.  It's in the tree:  
> Documentation/io_ordering.txt.  If it's incomplete or confusing, just let me 
> know and I'll update it.

Jesse,
Both. incomplete and confusing.
"concrete example of a hypothetical driver" wasn't my first warning
this document needed work. :^)

I've hacked up the 2.6.9 version and even what I did still needs more work.
Have time to correct my mistakes and answer the questions I ask?

I'd be happy to review it again after you've done another round on it.

[]'s should all go away - used those to mark editorial notes.

hth,
grant

--------------------- cut here ------------------

Weakly Ordered Memory Mapped IO
-------------------------------

SGI Altix chipset implements weakly ordered Memory-Mapped I/O writes.
On this platform, driver writers are responsible for ensuring I/O writes
to memory-mapped addresses arrive in the order intended.

Like for PCI write posting problems, this is done by reading
a 'safe' device or bridge register, causing the I/O chipset to
flush pending writes to the device before any reads are issued.
A driver would issue the "safe" read immediately prior to the exit
of a critical section of code protected by spinlocks.  This would
ensure subsequent writes to I/O space arrived only after all prior
writes (much like a memory barrier op, mb(), only with respect to I/O).

Note: MMIO reads are expensive! Don't add MMIO reads after *every* MMIO
      write unless the device programming model absolutely requires it.

An example from a hypothetical device driver might help:

        ...
CPU A:  spin_lock_irqsave(&dev_lock, flags)
CPU A:  val = readl(my_status);
CPU A:  ...
CPU A:  writel(newval, ring_ptr);
CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
        ...
CPU B:  spin_lock_irqsave(&dev_lock, flags)
CPU B:  val = readl(my_status);
CPU B:  ...
CPU B:  writel(newval2, ring_ptr);
CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
        ...

In the case above, the device may receive newval2 before it receives newval,
which could cause problems.

[ Is this example broken or am I just staying up too late?
  The example is doing a readl() in the second critical section.
  Shouldn't that enforce the write ordering?
]

Fixing it is easy enough though:

        ...
CPU A:  spin_lock_irqsave(&dev_lock, flags)
CPU A:  val = readl(my_status);
CPU A:  ...
CPU A:  writel(newval, ring_ptr);
CPU A:  (void)readl(safe_register); /* maybe a config register? */
CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
        ...
CPU B:  spin_lock_irqsave(&dev_lock, flags)
CPU B:  val = readl(my_status);
CPU B:  ...
CPU B:  writel(newval2, ring_ptr);
CPU B:  (void)readl(safe_register); /* maybe a config register? */
CPU B:  spin_unlock_irqrestore(&dev_lock, flags)

The reads from safe_register will cause the I/O chipset to flush any
pending writes before actually posting the read to the chipset, preventing
possible data corruption.

[ How about interactions with:
  o read_relaxed()?
  o DMA?
  o IO Port space reads?
  o IO Port space writes?
]

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-18  6:10                     ` Grant Grundler
@ 2004-09-20 22:40                       ` Jesse Barnes
  2004-09-20 23:27                         ` Grant Grundler
  0 siblings, 1 reply; 78+ messages in thread
From: Jesse Barnes @ 2004-09-20 22:40 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Andrew Vasquez, pj, linux-scsi, mdr, jeremy, djh, Andrew Morton

On Friday, September 17, 2004 11:10 pm, Grant Grundler wrote:
> Jesse Barnes wrote:
> ...
>
> > Btw Andrew (Vasquez), there's a small doc I put together that should
> > describe when you have to worry about PCI posting.  It's in the tree:
> > Documentation/io_ordering.txt.  If it's incomplete or confusing, just let
> > me know and I'll update it.
>
> Jesse,
> Both. incomplete and confusing.
> "concrete example of a hypothetical driver" wasn't my first warning
> this document needed work. :^)

Heh, yeah I noticed that on re-reading too :)

> I've hacked up the 2.6.9 version and even what I did still needs more work.
> Have time to correct my mistakes and answer the questions I ask?
>
> I'd be happy to review it again after you've done another round on it.
>
> []'s should all go away - used those to mark editorial notes.

Sure, thanks for reading it.

> Weakly Ordered Memory Mapped IO
> -------------------------------
>
> SGI Altix chipset implements weakly ordered Memory-Mapped I/O writes.
> On this platform, driver writers are responsible for ensuring I/O writes
> to memory-mapped addresses arrive in the order intended.

This is incorrect.  I wrote this before I understood that 'pci write posting' 
was the common term for describing the fact that writes from different CPUs 
could arrive out of order.  A s/weakly ordered/write posting would make the 
existing document much more accurate.

> Like for PCI write posting problems, this is done by reading
> a 'safe' device or bridge register, causing the I/O chipset to
> flush pending writes to the device before any reads are issued.
> A driver would issue the "safe" read immediately prior to the exit
> of a critical section of code protected by spinlocks.  This would
> ensure subsequent writes to I/O space arrived only after all prior
> writes (much like a memory barrier op, mb(), only with respect to I/O).
>
> Note: MMIO reads are expensive! Don't add MMIO reads after *every* MMIO
>       write unless the device programming model absolutely requires it.

Yes!

>
>
> An example from a hypothetical device driver might help:
>
>         ...
> CPU A:  spin_lock_irqsave(&dev_lock, flags)
> CPU A:  val = readl(my_status);
> CPU A:  ...
> CPU A:  writel(newval, ring_ptr);
> CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
>         ...
> CPU B:  spin_lock_irqsave(&dev_lock, flags)
> CPU B:  val = readl(my_status);
> CPU B:  ...
> CPU B:  writel(newval2, ring_ptr);
> CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
>         ...
>
> In the case above, the device may receive newval2 before it receives
> newval, which could cause problems.
>
> [ Is this example broken or am I just staying up too late?
>   The example is doing a readl() in the second critical section.
>   Shouldn't that enforce the write ordering?
> ]

Yep, that's a bug.  It should just be writes.

> The reads from safe_register will cause the I/O chipset to flush any
> pending writes before actually posting the read to the chipset, preventing
> possible data corruption.
>
> [ How about interactions with:
>   o read_relaxed()?
>   o DMA?
>   o IO Port space reads?
>   o IO Port space writes?
> ]

None that I know of.

Jesse

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-20 22:40                       ` Jesse Barnes
@ 2004-09-20 23:27                         ` Grant Grundler
  2004-09-21  0:09                           ` Jesse Barnes
  0 siblings, 1 reply; 78+ messages in thread
From: Grant Grundler @ 2004-09-20 23:27 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Grant Grundler, Andrew Vasquez, pj, linux-scsi, mdr, jeremy, djh,
	Andrew Morton

On Mon, Sep 20, 2004 at 03:40:02PM -0700, Jesse Barnes wrote:
> > Weakly Ordered Memory Mapped IO
> > -------------------------------
> >
> > SGI Altix chipset implements weakly ordered Memory-Mapped I/O writes.
> > On this platform, driver writers are responsible for ensuring I/O writes
> > to memory-mapped addresses arrive in the order intended.
> 
> This is incorrect.  I wrote this before I understood that 'pci write posting' 
> was the common term for describing the fact that writes from different CPUs 
> could arrive out of order.  A s/weakly ordered/write posting would make the 
> existing document much more accurate.

"write posting" is orthogonal to PCI ordering rules.
AFAIK, Write posting is not specific to PCI - but any memory mapped IO.

I understand "write posting" as when the CPU posts the write
to the chipset and the chipset says the write is done even though
it hasn't reached the PCI device. It just means the write has reached
the PCI "domain" (which is supposed to be strongly ordered).

Page 304 of "ia-64 linux kernel" book decsribes how writel ordering
(within a CPU) is enforced with "release semantics" (ie .rel in asm ).
IA64 spinlocks also use .rel and thus I'm pretty sure from the CPU
PoV, writel()s will be posted by the releasing the lock.
I think willy pointed at the docs describing this in general terms.

Secondly, I don't recall hearing about problems like this
on Intel or HP ia64 machines. I've only run into PCI posted write
and DMA syncronization problems where the drivers aren't following
all the rules quite right (missing mb() and readl()'s mostly).

So far, I still think this document is misnamed and should
be called something like "SGI Altix porting issues" and moved
under the Documentation/ia64 directory.

> > [ Is this example broken or am I just staying up too late?
> >   The example is doing a readl() in the second critical section.
> >   Shouldn't that enforce the write ordering?
> > ]
> 
> Yep, that's a bug.  It should just be writes.

ok. Can you fix that up and post a new version where I can see it?

> > The reads from safe_register will cause the I/O chipset to flush any
> > pending writes before actually posting the read to the chipset, preventing
> > possible data corruption.
> >
> > [ How about interactions with:
> >   o read_relaxed()?
> >   o DMA?
> >   o IO Port space reads?
> >   o IO Port space writes?
> > ]
> 
> None that I know of.

You mean none that are surprising to you?
ie writes can pass read_relaxed() transactions or vice versa?
DMA read returns can bypass MMIO writes? (parisc chipsets allow this)

IIRC, IO port space writes are NOT posted.
So the rules for ordering must be impacted or different somehow.
ie Are IO Port space writes strongly ordered WRT MMIO space writes?

thanks,
grant

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-20 23:27                         ` Grant Grundler
@ 2004-09-21  0:09                           ` Jesse Barnes
  2004-09-21  5:46                             ` Grant Grundler
  2004-09-21 23:03                             ` Guennadi Liakhovetski
  0 siblings, 2 replies; 78+ messages in thread
From: Jesse Barnes @ 2004-09-21  0:09 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Andrew Vasquez, pj, linux-scsi, mdr, jeremy, djh, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 2641 bytes --]

On Monday, September 20, 2004 4:27 pm, Grant Grundler wrote:
> "write posting" is orthogonal to PCI ordering rules.
> AFAIK, Write posting is not specific to PCI - but any memory mapped IO.

Right.

> I understand "write posting" as when the CPU posts the write
> to the chipset and the chipset says the write is done even though
> it hasn't reached the PCI device. It just means the write has reached
> the PCI "domain" (which is supposed to be strongly ordered).

That's my understanding too.

> Secondly, I don't recall hearing about problems like this
> on Intel or HP ia64 machines. I've only run into PCI posted write
> and DMA syncronization problems where the drivers aren't following
> all the rules quite right (missing mb() and readl()'s mostly).

Problems like what?  If mmio writes are posted, then the driver has to deal 
with it with reads like you said.  If the example code was fixed to lose the 
read() in the second spinlock protected region, I think it would describe 
mmio write posting accurately, no?

> So far, I still think this document is misnamed and should
> be called something like "SGI Altix porting issues" and moved
> under the Documentation/ia64 directory.

But it has nothing to do with Altix at all...

>
> > > [ Is this example broken or am I just staying up too late?
> > >   The example is doing a readl() in the second critical section.
> > >   Shouldn't that enforce the write ordering?
> > > ]
> >
> > Yep, that's a bug.  It should just be writes.
>
> ok. Can you fix that up and post a new version where I can see it?
>
> > > The reads from safe_register will cause the I/O chipset to flush any
> > > pending writes before actually posting the read to the chipset,
> > > preventing possible data corruption.
> > >
> > > [ How about interactions with:
> > >   o read_relaxed()?
> > >   o DMA?
> > >   o IO Port space reads?
> > >   o IO Port space writes?
> > > ]
> >
> > None that I know of.
>
> You mean none that are surprising to you?
> ie writes can pass read_relaxed() transactions or vice versa?
> DMA read returns can bypass MMIO writes? (parisc chipsets allow this)

No, as far as mmio ordering goes, read_relaxed is exactly the same as read, so 
in the example code, a read_relaxed would be sufficient for write ordering.

> IIRC, IO port space writes are NOT posted.
> So the rules for ordering must be impacted or different somehow.
> ie Are IO Port space writes strongly ordered WRT MMIO space writes?

Right, they're supposed to be strongly ordered, I think arches are supposed to 
guarantee that in their in/out routines.

Here's a new version that should be clearer.

Thanks,
Jesse

[-- Attachment #2: io_ordering.txt --]
[-- Type: text/plain, Size: 2098 bytes --]

Dealing with posted writes
--------------------------

On some platforms platforms, driver writers are responsible for
ensuring that I/O writes to memory-mapped addresses on their device
arrive in the order intended.  This is typically done by reading a
'safe' device or bridge register, causing the I/O chipset to flush
pending writes to the device before any reads are posted.  A driver
would usually use this technique immediately prior to the exit of a
critical section of code protected by spinlocks.  This would ensure
that subsequent writes to I/O space arrived only after all prior
writes (much like a memory barrier op, mb(), only with respect to
I/O).

Some pseudocode to illustrate the problem:

        ...
CPU A:  spin_lock_irqsave(&dev_lock, flags)
CPU A:  ...
CPU A:  writel(newval, ring_ptr);
CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
        ...
CPU B:  spin_lock_irqsave(&dev_lock, flags)
CPU B:  ...
CPU B:  writel(newval2, ring_ptr);
CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
        ...

In the case above, the device may receive newval2 before it receives newval,
which could cause problems.  Fixing it is easy enough though:

        ...
CPU A:  spin_lock_irqsave(&dev_lock, flags)
CPU A:  ...
CPU A:  writel(newval, ring_ptr);
CPU A:  (void)readl(safe_register); /* maybe a config register? */
CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
        ...
CPU B:  spin_lock_irqsave(&dev_lock, flags)
CPU B:  ...
CPU B:  writel(newval2, ring_ptr);
CPU B:  (void)readl(safe_register); /* or read_relaxed() */
CPU B:  spin_unlock_irqrestore(&dev_lock, flags)

Here, the reads from safe_register will cause the I/O chipset to flush any
pending writes before actually posting the read to the chipset, preventing
possible data corruption.

This sort of synchronization is only necessary for read/write calls,
not in/out calls, since they're by definition strongly ordered.

We should probably add a writeflush call or something to deal with the
above in an easier to read way.  Some platforms could even implement
such a routine more efficiently than a regular read.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21  0:09                           ` Jesse Barnes
@ 2004-09-21  5:46                             ` Grant Grundler
  2004-09-21  6:45                               ` Jeremy Higdon
                                                 ` (2 more replies)
  2004-09-21 23:03                             ` Guennadi Liakhovetski
  1 sibling, 3 replies; 78+ messages in thread
From: Grant Grundler @ 2004-09-21  5:46 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Grant Grundler, Andrew Vasquez, pj, linux-scsi, mdr, jeremy, djh,
	Andrew Morton

On Mon, Sep 20, 2004 at 05:09:44PM -0700, Jesse Barnes wrote:
> > Secondly, I don't recall hearing about problems like this
> > on Intel or HP ia64 machines. I've only run into PCI posted write
> > and DMA syncronization problems where the drivers aren't following
> > all the rules quite right (missing mb() and readl()'s mostly).
> 
> Problems like what?

I've never heard of multiple writes from different CPUs going out of order
to the PCI device.

> If mmio writes are posted, then the driver has to deal 
> with it with reads like you said.

No it doesn't. Only if it depends on *when* the write hits the device.
The classic example is:
	writel(x, CMD_RESET);
	udelay(10);
	readl(x+STATUS);	/* parisc will crash if not ready */

>   If the example code was fixed to lose the 
> read() in the second spinlock protected region, I think it would describe 
> mmio write posting accurately, no?

No.  Can you add something to the example that shows they expected
the writes to hit the device at a certain time?

The CPU would continue doing other work before the writes reach
the device but they would reach the device in order.
I'm pretty sure of that on most IA32, parisc, and IA64 platforms.

The only exceptions I'm aware of are some broken ia32 chipsets which
have issues with write ordering - see TG3_FLAG_MBOX_WRITE_REORDER usage
in drivers/net/tg3.*.
Comment says:
        /* If we have an AMD 762 or Intel ICH/ICH0/ICH2 chipset, write
         * reordering to the mailbox registers done by the host
         * controller can cause major troubles.  We read back from
         * every mailbox register write to force the writes to be
         * posted to the chip in order.
         */

I haven't seen any evidence of this happening yet on ia64.
If it is, then I'd really like to know about it and we should fix
tg3 since both HP and SGI ship product that depends on tg3 driver.

> > So far, I still think this document is misnamed and should
> > be called something like "SGI Altix porting issues" and moved
> > under the Documentation/ia64 directory.
> 
> But it has nothing to do with Altix at all...

Ok.
Can you be explicit on which platforms and which drivers
anyone at SGI has seen the ordering problem?
Why did you write this document in the first place?

The first sentence on the new version (below) still introduces
this as an ordering problem and not a write posting problem.

> > You mean none that are surprising to you?
> > ie writes can pass read_relaxed() transactions or vice versa?
> > DMA read returns can bypass MMIO writes? (parisc chipsets allow this)
> 
> No, as far as mmio ordering goes, read_relaxed is exactly the same as read,
> so in the example code, a read_relaxed would be sufficient for write ordering.

Ok - that's surprising to me and should be clearly stated.
I do not expect read_relaxed() to enforce ordering in either direction
of the data path - not for MMIO writes nor DMA writes.

> > IIRC, IO port space writes are NOT posted.
> > So the rules for ordering must be impacted or different somehow.
> > ie Are IO Port space writes strongly ordered WRT MMIO space writes?
> 
> Right, they're supposed to be strongly ordered, I think arches are
> supposed to guarantee that in their in/out routines.

Yes. Again, stating it in this document makes it clear what you
expect from the platform support code.

> Here's a new version that should be clearer.
> 
> Thanks,
> Jesse

> Dealing with posted writes
> --------------------------
> 
> On some platforms platforms, driver writers are responsible for
> ensuring that I/O writes to memory-mapped addresses on their device
> arrive in the order intended.

The writes will arrive in order according to PCI ordering rules.
Wasn't this supposed to be about write posting?

Documentation/DocBook/deviceiobook.tmpl has a paragraph on write posting.
I think a patch to deviceiobook.tmpl would be better than having
write posting discussed in a seperate file if you think it needs
an example.

> This is typically done by reading a
> 'safe' device or bridge register, causing the I/O chipset to flush
> pending writes to the device before any reads are posted.  A driver
> would usually use this technique immediately prior to the exit of a
> critical section of code protected by spinlocks.  This would ensure
> that subsequent writes to I/O space arrived only after all prior
> writes (much like a memory barrier op, mb(), only with respect to
> I/O).
> 
> Some pseudocode to illustrate the problem:
> 
>         ...
> CPU A:  spin_lock_irqsave(&dev_lock, flags)
> CPU A:  ...
> CPU A:  writel(newval, ring_ptr);
> CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
>         ...
> CPU B:  spin_lock_irqsave(&dev_lock, flags)
> CPU B:  ...
> CPU B:  writel(newval2, ring_ptr);
> CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
>         ...

I'm pretty sure spinlocks are supposed to provide memory barriers.
Maybe that's only to gcc so it doesn't re-order loads/stores around
a spinlock. But I thought the ia64 implemention used the same "release"
and "acquire" semantics as readX() and writeX() do.

The alph implementation explicitly enforces it:
(arch/alpha/lib/io.c)

	void _writel(u32 b, unsigned long addr)
	{
		__writel(b, addr);
		mb();
	}


> In the case above, the device may receive newval2 before it receives newval,
> which could cause problems.  Fixing it is easy enough though:
> 
>         ...
> CPU A:  spin_lock_irqsave(&dev_lock, flags)
> CPU A:  ...
> CPU A:  writel(newval, ring_ptr);
> CPU A:  (void)readl(safe_register); /* maybe a config register? */
> CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
>         ...
> CPU B:  spin_lock_irqsave(&dev_lock, flags)
> CPU B:  ...
> CPU B:  writel(newval2, ring_ptr);
> CPU B:  (void)readl(safe_register); /* or read_relaxed() */
> CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
> 
> Here, the reads from safe_register will cause the I/O chipset to flush any
> pending writes before actually posting the read to the chipset, preventing
> possible data corruption.

That should probably be
	 "...flush any posted writes before posting the read return...".


> This sort of synchronization is only necessary for read/write calls,
> not in/out calls, since they're by definition strongly ordered.

Similarly:
	inX and outX calls are strongly ordered and non-postable.
	They do not need special handling. But this is something to watch out
	for when converting drivers to use MMIO space from IO Port space.

> We should probably add a writeflush call or something to deal with the
> above in an easier to read way.  Some platforms could even implement
> such a routine more efficiently than a regular read.

thanks,
grant

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21  5:46                             ` Grant Grundler
@ 2004-09-21  6:45                               ` Jeremy Higdon
  2004-09-21 13:29                                 ` Jesse Barnes
  2004-09-21 13:25                               ` Jesse Barnes
  2004-09-21 15:13                               ` Jesse Barnes
  2 siblings, 1 reply; 78+ messages in thread
From: Jeremy Higdon @ 2004-09-21  6:45 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Jesse Barnes, Andrew Vasquez, pj, linux-scsi, mdr, jeremy, djh,
	Andrew Morton

Lots of issues covered.

I'd like to cover one of them first, since it is an underlying
principle in the discussion.

On Mon, Sep 20, 2004 at 11:46:26PM -0600, Grant Grundler wrote:
> On Mon, Sep 20, 2004 at 05:09:44PM -0700, Jesse Barnes wrote:
> > > Secondly, I don't recall hearing about problems like this
> > > on Intel or HP ia64 machines. I've only run into PCI posted write
> > > and DMA syncronization problems where the drivers aren't following
> > > all the rules quite right (missing mb() and readl()'s mostly).
> > 
> > Problems like what?
> 
> I've never heard of multiple writes from different CPUs going out of order
> to the PCI device.

It was my understanding that this could be a problem on any
MP machine where the CPUs use a write buffer (which is just
about everything today).

The question seems to be whether release semantics (or equivalent
on other chips) in the IA64 apply to MMIO writes.  I believe that
they do not.  It seems that you think that it does.

On Altix, we ran into a problem with the qla1280 driver (see
version 1.56 in the scsi-misc-2.6 bk tree) because the spinunlock
(apparently) did not imply a retirement of a previous mmio write.
In that rev, I added an mmio read to so that the mmio write would
be completed before releasing the spinlock (I believe the host
lock held during the call to queuecommand).

Before making that change, the problem was the two different CPUs
would mmio write to the Request In register, and the ordering
would flip, causing the qla1280 to think that it suddenly had
an entire request queue.  We didn't see this problem on puny
64p machines (at least not under ordinary stress testing); we
needed a 512p machine to see it, though odds are that it would
have occurred very occasionally on smaller machines.

jeremy

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21  6:45                               ` Jeremy Higdon
@ 2004-09-21 13:29                                 ` Jesse Barnes
  0 siblings, 0 replies; 78+ messages in thread
From: Jesse Barnes @ 2004-09-21 13:29 UTC (permalink / raw)
  To: Jeremy Higdon
  Cc: Grant Grundler, Andrew Vasquez, pj, linux-scsi, mdr, jeremy, djh,
	Andrew Morton

On Tuesday, September 21, 2004 2:45 am, Jeremy Higdon wrote:
> It was my understanding that this could be a problem on any
> MP machine where the CPUs use a write buffer (which is just
> about everything today).
>
> The question seems to be whether release semantics (or equivalent
> on other chips) in the IA64 apply to MMIO writes.  I believe that
> they do not.  It seems that you think that it does.

I think it's even more subtle than that.  Stores with release semantics will 
be visible on other CPUs if they're written to cacheable space, but in the 
case of MMIO writes, it's the local receiving Hub that can cause the problem.  
It'll try to send the PIO along to its destination Hub as soon as possible, 
but due to congestion, credit counts, etc., the write may be delayed and 
occur *after* a write from another CPU whose Hub is closer.  But then again, 
I could be confused.

> On Altix, we ran into a problem with the qla1280 driver (see
> version 1.56 in the scsi-misc-2.6 bk tree) because the spinunlock
> (apparently) did not imply a retirement of a previous mmio write.
> In that rev, I added an mmio read to so that the mmio write would
> be completed before releasing the spinlock (I believe the host
> lock held during the call to queuecommand).
>
> Before making that change, the problem was the two different CPUs
> would mmio write to the Request In register, and the ordering
> would flip, causing the qla1280 to think that it suddenly had
> an entire request queue.  We didn't see this problem on puny
> 64p machines (at least not under ordinary stress testing); we
> needed a 512p machine to see it, though odds are that it would
> have occurred very occasionally on smaller machines.

Good example, thanks for reminding me.

Jesse

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21  5:46                             ` Grant Grundler
  2004-09-21  6:45                               ` Jeremy Higdon
@ 2004-09-21 13:25                               ` Jesse Barnes
  2004-09-21 15:13                               ` Jesse Barnes
  2 siblings, 0 replies; 78+ messages in thread
From: Jesse Barnes @ 2004-09-21 13:25 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Andrew Vasquez, pj, linux-scsi, mdr, jeremy, djh, Andrew Morton

On Tuesday, September 21, 2004 1:46 am, Grant Grundler wrote:
> I've never heard of multiple writes from different CPUs going out of order
> to the PCI device.

Are you sure?  Isn't that the definition of write posting?  At any rate, sn2 
has the write posting issue (which seems to call for a pioflush routine), but 
also a *potential* for out of order writes, which is what's described in this 
document.  There was a big discussion on lkml about this awhile back when I 
tried to push in an mmiob() (pioflush) API.  It was lacking arguments, 
however, and so was not really usable on many platforms.

> > If mmio writes are posted, then the driver has to deal
> > with it with reads like you said.
>
> No it doesn't. Only if it depends on *when* the write hits the device.
> The classic example is:
>  writel(x, CMD_RESET);
>  udelay(10);
>  readl(x+STATUS); /* parisc will crash if not ready */
>
> >   If the example code was fixed to lose the
> > read() in the second spinlock protected region, I think it would describe
> > mmio write posting accurately, no?
>
> No.  Can you add something to the example that shows they expected
> the writes to hit the device at a certain time?

Yeah, the qla2xxx driver is a perfect example of this.  But we don't have an 
alternative to udelay at this point, maybe we need pioflush(device *, 
unsigned long addr)?  That would allow arch code to read from the device's 
bridge config space, or at the very least do a delay (a maximal value, 
guaranteed to not be exceeded) and read addr?

> The CPU would continue doing other work before the writes reach
> the device but they would reach the device in order.
> I'm pretty sure of that on most IA32, parisc, and IA64 platforms.

Not on sn2.  The local chipset could receive the write, the process could be 
rescheduled to another cpu closer to the target device, and its write (the 
second one) could get there first.  So on sn2 we need a read following writes 
prior to spinlock release.

> The only exceptions I'm aware of are some broken ia32 chipsets which
> have issues with write ordering - see TG3_FLAG_MBOX_WRITE_REORDER usage
> in drivers/net/tg3.*.
> Comment says:
>         /* If we have an AMD 762 or Intel ICH/ICH0/ICH2 chipset, write
>          * reordering to the mailbox registers done by the host
>          * controller can cause major troubles.  We read back from
>          * every mailbox register write to force the writes to be
>          * posted to the chip in order.
>          */
>
> I haven't seen any evidence of this happening yet on ia64.
> If it is, then I'd really like to know about it and we should fix
> tg3 since both HP and SGI ship product that depends on tg3 driver.

This will only happen if the writes come from different CPUs.  A stream of 
program order writes under a spinlock will *not* be reordered (that's a big 
hw bug IMO).

> > > So far, I still think this document is misnamed and should
> > > be called something like "SGI Altix porting issues" and moved
> > > under the Documentation/ia64 directory.
> >
> > But it has nothing to do with Altix at all...
>
> Ok.
> Can you be explicit on which platforms and which drivers
> anyone at SGI has seen the ordering problem?
> Why did you write this document in the first place?

To help people understand that they need to deal with the issue.

> The first sentence on the new version (below) still introduces
> this as an ordering problem and not a write posting problem.
>
> > > You mean none that are surprising to you?
> > > ie writes can pass read_relaxed() transactions or vice versa?
> > > DMA read returns can bypass MMIO writes? (parisc chipsets allow this)
> >
> > No, as far as mmio ordering goes, read_relaxed is exactly the same as
> > read, so in the example code, a read_relaxed would be sufficient for
> > write ordering.
>
> Ok - that's surprising to me and should be clearly stated.
> I do not expect read_relaxed() to enforce ordering in either direction
> of the data path - not for MMIO writes nor DMA writes.

But it *has* to for mmio writes.  Not only do those transactions occur on the 
same 'channel', but the read may depend on the side effects of prior writes.

> Yes. Again, stating it in this document makes it clear what you
> expect from the platform support code.

Ok.

> The writes will arrive in order according to PCI ordering rules.
> Wasn't this supposed to be about write posting?

More confusion.  It's about write posting with a small, added wrinkle.  The 
issues are related enough that it probably makes sense to include them in the 
same doc (though I had neglected to describe the fact that writes won't get 
to the device right away).

> Documentation/DocBook/deviceiobook.tmpl has a paragraph on write posting.
> I think a patch to deviceiobook.tmpl would be better than having
> write posting discussed in a seperate file if you think it needs
> an example.

Ok, that sounds good.

> I'm pretty sure spinlocks are supposed to provide memory barriers.
> Maybe that's only to gcc so it doesn't re-order loads/stores around
> a spinlock. But I thought the ia64 implemention used the same "release"
> and "acquire" semantics as readX() and writeX() do.

This has nothing to do with weak ordering (I was wrong to state it that way, 
and ended up just making for a confusing analogy).  It has to do with writes 
coming from different nodes on the NUMA fabric.

> > Here, the reads from safe_register will cause the I/O chipset to flush
> > any pending writes before actually posting the read to the chipset,
> > preventing possible data corruption.
>
> That should probably be
>   "...flush any posted writes before posting the read return...".

Ok.

>
> > This sort of synchronization is only necessary for read/write calls,
> > not in/out calls, since they're by definition strongly ordered.
>
> Similarly:
>  inX and outX calls are strongly ordered and non-postable.
>  They do not need special handling. But this is something to watch out
>  for when converting drivers to use MMIO space from IO Port space.

Sounds good.

Thanks again,
Jesse

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21  5:46                             ` Grant Grundler
  2004-09-21  6:45                               ` Jeremy Higdon
  2004-09-21 13:25                               ` Jesse Barnes
@ 2004-09-21 15:13                               ` Jesse Barnes
  2004-09-21 15:41                                 ` James Bottomley
  2 siblings, 1 reply; 78+ messages in thread
From: Jesse Barnes @ 2004-09-21 15:13 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Andrew Vasquez, pj, linux-scsi, mdr, jeremy, djh, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 494 bytes --]

On Tuesday, September 21, 2004 1:46 am, Grant Grundler wrote:
> No it doesn't. Only if it depends on *when* the write hits the device.
> The classic example is:
>  writel(x, CMD_RESET);
>  udelay(10);
>  readl(x+STATUS); /* parisc will crash if not ready */

Ok, hopefully I've covered this in this release (patch to deviceiobook will 
come later).

The short of it is that we really need pioflush.  I'll resurrect my mmiob 
patches, change the name and prototype, and resubmit.

Thanks,
Jesse

[-- Attachment #2: io_ordering.txt --]
[-- Type: text/plain, Size: 3466 bytes --]

Dealing with posted writes
--------------------------

On some platforms platforms, driver writers are responsible for
ensuring that I/O writes to memory-mapped addresses on their device
arrive when expected and in the order intended.  This is typically
done by reading a 'safe' device or bridge register, causing the I/O
chipset to flush pending writes to the device before any reads are
posted.  A driver would usually use this technique immediately prior
to the exit of a critical section of code protected by spinlocks.
This would ensure that subsequent writes to I/O space arrived only
after all prior writes (much like a memory barrier op, mb(), only with
respect to I/O).

Some pseudocode to illustrate the problem of write posting:

...
spin_lock_irqsave(&dev_lock, flags)
...
writel(resetval, reset_reg); /* reset the card */
udelay(10); /* wait for reset (also needs pioflush) */
val = readl(ring_ptr); /* read initial value */
spin_unlock_irqrestore(&dev_lock, flags)
...

In this case, the card is reset by the first write.  The driver
attempts to wait for the completion of the reset using udelay.  But
since the write may be delayed and the udelay will probably start
executing right away, it may be that there's not enough time for the
write to actually arrive at the card and for the reset to occur before
the read is executed.  On some platforms, this can result in a machine
check.  Without a pioflush routine, the udelay must account for worst
case behavior.

And an example of reordering of writes between CPUs on a NUMA machine:

	...
CPU A:  spin_lock_irqsave(&dev_lock, flags)
CPU A:  ...
CPU A:  writel(newval, ring_ptr);
CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
        ...
CPU B:  spin_lock_irqsave(&dev_lock, flags)
CPU B:  ...
CPU B:  writel(newval2, ring_ptr);
CPU B:  (void)readl(safe_register); /* or read_relaxed() */

In the case above, the device may receive newval2 before it receives newval,
which could cause problems.  Fixing it is easy enough though:

        ...
CPU A:  spin_lock_irqsave(&dev_lock, flags)
CPU A:  ...
CPU A:  writel(newval, ring_ptr);
CPU A:  (void)readl(safe_register); /* maybe a config register? */
CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
        ...
CPU B:  spin_lock_irqsave(&dev_lock, flags)
CPU B:  ...
CPU B:  writel(newval2, ring_ptr);
CPU B:  (void)readl(safe_register); /* or read_relaxed() */
CPU B:  spin_unlock_irqrestore(&dev_lock, flags)

Here, the reads from safe_register will cause the I/O chipset to flush any
posted writes before actually sending the read to the chipset, preventing
possible data corruption.

inX and outX calls, on the other hand, are strongly ordered and
non-postable.  They do not need special handling. But this is
something to watch out for when converting drivers to use MMIO space
from IO Port space.

A new pioflush routine could address both of the above problems
(though drivers would still have to know how long to wait for card
resets).  It would ensure that pio writes had arrived at their
destination device before allowing executing in the current context to
continue.  Since some platforms would only be able to achieve this
through a read of a bridge config register, I think a prototype like:
  pioflush(struct device *dev, unsigned long addr);
would be necessary.  The dev argument would correspond to the device
in question, and the addr argument would be a safe register to read on
the device.  Either could be zero, but not both.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 15:13                               ` Jesse Barnes
@ 2004-09-21 15:41                                 ` James Bottomley
  2004-09-21 15:58                                   ` Jesse Barnes
  0 siblings, 1 reply; 78+ messages in thread
From: James Bottomley @ 2004-09-21 15:41 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Grant Grundler, Andrew Vasquez, pj, SCSI Mailing List, mdr,
	jeremy, djh, Andrew Morton

On Tue, 2004-09-21 at 11:13, Jesse Barnes wrote:
> On Tuesday, September 21, 2004 1:46 am, Grant Grundler wrote:
> > No it doesn't. Only if it depends on *when* the write hits the device.
> > The classic example is:
> >  writel(x, CMD_RESET);
> >  udelay(10);
> >  readl(x+STATUS); /* parisc will crash if not ready */
> 
> Ok, hopefully I've covered this in this release (patch to deviceiobook will 
> come later).
> 
> The short of it is that we really need pioflush.  I'll resurrect my mmiob 
> patches, change the name and prototype, and resubmit.

Just to get back to the actual qla2xxx problem.

Do we agree that there are two possible solutions:

1) Find a safe mmio read to trigger the flush of the posted reset write.

2) Use a pio write to trigger the reset because we know the reset is
active as soon as the write returns.

and if so, which one are we going to implement?

James



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 15:41                                 ` James Bottomley
@ 2004-09-21 15:58                                   ` Jesse Barnes
  2004-09-21 16:01                                     ` Matthew Wilcox
  0 siblings, 1 reply; 78+ messages in thread
From: Jesse Barnes @ 2004-09-21 15:58 UTC (permalink / raw)
  To: James Bottomley
  Cc: Grant Grundler, Andrew Vasquez, pj, SCSI Mailing List, mdr,
	jeremy, djh, Andrew Morton

On Tuesday, September 21, 2004 11:41 am, James Bottomley wrote:
> Just to get back to the actual qla2xxx problem.
>
> Do we agree that there are two possible solutions:
>
> 1) Find a safe mmio read to trigger the flush of the posted reset write.
>
> 2) Use a pio write to trigger the reset because we know the reset is
> active as soon as the write returns.
>
> and if so, which one are we going to implement?

or
 3) use a regular write() with an associated pioflush()

For now, (1) is a good option if the driver can do it (I still haven't heard 
if qla2xxx supports this).  I suspect that neither (1) nor (2) is available 
to some device, leaving only udelay() or (3), which I'm working on.

Jesse

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 15:58                                   ` Jesse Barnes
@ 2004-09-21 16:01                                     ` Matthew Wilcox
  2004-09-21 16:05                                       ` Jesse Barnes
  0 siblings, 1 reply; 78+ messages in thread
From: Matthew Wilcox @ 2004-09-21 16:01 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: James Bottomley, Grant Grundler, Andrew Vasquez, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tue, Sep 21, 2004 at 11:58:24AM -0400, Jesse Barnes wrote:
> On Tuesday, September 21, 2004 11:41 am, James Bottomley wrote:
> > Just to get back to the actual qla2xxx problem.
> >
> > Do we agree that there are two possible solutions:
> >
> > 1) Find a safe mmio read to trigger the flush of the posted reset write.
> >
> > 2) Use a pio write to trigger the reset because we know the reset is
> > active as soon as the write returns.
> >
> > and if so, which one are we going to implement?
> 
> or
>  3) use a regular write() with an associated pioflush()
> 
> For now, (1) is a good option if the driver can do it (I still haven't heard 
> if qla2xxx supports this).  I suspect that neither (1) nor (2) is available 
> to some device, leaving only udelay() or (3), which I'm working on.

How can we do pioflush()?  My understanding of PCI ordering rules is that
we need to go all the way down to the device to prevent some intermediate
bridge from delaying the write arbitrarily.  So we'd actually need the
pci_dev and read its device ID back from config space, or something.

-- 
"Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception." -- Mark Twain

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 16:01                                     ` Matthew Wilcox
@ 2004-09-21 16:05                                       ` Jesse Barnes
  2004-09-21 16:11                                         ` James Bottomley
  0 siblings, 1 reply; 78+ messages in thread
From: Jesse Barnes @ 2004-09-21 16:05 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: James Bottomley, Grant Grundler, Andrew Vasquez, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tuesday, September 21, 2004 12:01 pm, Matthew Wilcox wrote:
> How can we do pioflush()?  My understanding of PCI ordering rules is that
> we need to go all the way down to the device to prevent some intermediate
> bridge from delaying the write arbitrarily.  So we'd actually need the
> pci_dev and read its device ID back from config space, or something.

Reading from the closest bridge won't be enough?  If not, then dealing with 
posting in a nice way is simply impossible for some devices.  We'd be stuck 
with udelay().

Jesse

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 16:05                                       ` Jesse Barnes
@ 2004-09-21 16:11                                         ` James Bottomley
  2004-09-21 16:18                                           ` Jesse Barnes
  2004-09-21 17:03                                           ` Jesse Barnes
  0 siblings, 2 replies; 78+ messages in thread
From: James Bottomley @ 2004-09-21 16:11 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Matthew Wilcox, Grant Grundler, Andrew Vasquez, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tue, 2004-09-21 at 12:05, Jesse Barnes wrote:
> Reading from the closest bridge won't be enough?  If not, then dealing with 
> posting in a nice way is simply impossible for some devices.  We'd be stuck 
> with udelay().

That's correct.  The posted write is held somewhere in one of the
bridges, but the PCI ordering rules only require it to be ordered with
other MMIO reads to the *device*; so you only guarantee that the posted
write is flushed by doing a device read....obviously, there exist
bridges with looser ideas than this, but we need to follow the PCI
specs.

udelay() isn't a viable solution because the PCI specs have no upper
bound on the length of time a write may remain posted.

James

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 16:11                                         ` James Bottomley
@ 2004-09-21 16:18                                           ` Jesse Barnes
  2004-09-21 16:24                                             ` James Bottomley
  2004-09-21 17:03                                           ` Jesse Barnes
  1 sibling, 1 reply; 78+ messages in thread
From: Jesse Barnes @ 2004-09-21 16:18 UTC (permalink / raw)
  To: James Bottomley
  Cc: Matthew Wilcox, Grant Grundler, Andrew Vasquez, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tuesday, September 21, 2004 12:11 pm, James Bottomley wrote:
> On Tue, 2004-09-21 at 12:05, Jesse Barnes wrote:
> > Reading from the closest bridge won't be enough?  If not, then dealing
> > with posting in a nice way is simply impossible for some devices.  We'd
> > be stuck with udelay().
>
> That's correct.  The posted write is held somewhere in one of the
> bridges, but the PCI ordering rules only require it to be ordered with
> other MMIO reads to the *device*; so you only guarantee that the posted
> write is flushed by doing a device read....obviously, there exist
> bridges with looser ideas than this, but we need to follow the PCI
> specs.
>
> udelay() isn't a viable solution because the PCI specs have no upper
> bound on the length of time a write may remain posted.

Well then.  Sounds like we're hosed for the qla2xxx case at least.  I think we 
still need pioflush for the case of writes from different CPUs (described in 
the document).  Using it *should* make the window pretty small for qla2xxx 
type problems too, but we have no guarantee.

Jesse

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 16:18                                           ` Jesse Barnes
@ 2004-09-21 16:24                                             ` James Bottomley
  0 siblings, 0 replies; 78+ messages in thread
From: James Bottomley @ 2004-09-21 16:24 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Matthew Wilcox, Grant Grundler, Andrew Vasquez, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tue, 2004-09-21 at 12:18, Jesse Barnes wrote:
> Well then.  Sounds like we're hosed for the qla2xxx case at least.  I think we 
> still need pioflush for the case of writes from different CPUs (described in 
> the document).  Using it *should* make the window pretty small for qla2xxx 
> type problems too, but we have no guarantee.

Not if we can do a PIO write ... PIO writes aren't posted.

James



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 16:11                                         ` James Bottomley
  2004-09-21 16:18                                           ` Jesse Barnes
@ 2004-09-21 17:03                                           ` Jesse Barnes
  2004-09-21 17:15                                             ` Matthew Wilcox
  2004-09-21 17:20                                             ` James Bottomley
  1 sibling, 2 replies; 78+ messages in thread
From: Jesse Barnes @ 2004-09-21 17:03 UTC (permalink / raw)
  To: James Bottomley
  Cc: Matthew Wilcox, Grant Grundler, Andrew Vasquez, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1100 bytes --]

On Tuesday, September 21, 2004 12:11 pm, James Bottomley wrote:
> On Tue, 2004-09-21 at 12:05, Jesse Barnes wrote:
> > Reading from the closest bridge won't be enough?  If not, then dealing
> > with posting in a nice way is simply impossible for some devices.  We'd
> > be stuck with udelay().
>
> That's correct.  The posted write is held somewhere in one of the
> bridges, but the PCI ordering rules only require it to be ordered with
> other MMIO reads to the *device*; so you only guarantee that the posted
> write is flushed by doing a device read....obviously, there exist
> bridges with looser ideas than this, but we need to follow the PCI
> specs.
>
> udelay() isn't a viable solution because the PCI specs have no upper
> bound on the length of time a write may remain posted.

Does this patch describe and correctly implement what we've discussed?  I've 
added information to io_ordering.txt to describe the conclusion about posting 
we seem to have agreed on.  Obviously, this is just a first cut.  It 
compiles, but I haven't made prototypes for any arch other than ia64.

Thanks,
Jesse

[-- Attachment #2: ioflush-ia64.patch --]
[-- Type: text/x-diff, Size: 11706 bytes --]

===== Documentation/io_ordering.txt 1.1 vs edited =====
--- 1.1/Documentation/io_ordering.txt	2003-03-18 02:02:11 -08:00
+++ edited/Documentation/io_ordering.txt	2004-09-21 09:59:09 -07:00
@@ -1,47 +1,75 @@
-On some platforms, so-called memory-mapped I/O is weakly ordered.  On such
-platforms, driver writers are responsible for ensuring that I/O writes to
-memory-mapped addresses on their device arrive in the order intended.  This is
-typically done by reading a 'safe' device or bridge register, causing the I/O
-chipset to flush pending writes to the device before any reads are posted.  A
-driver would usually use this technique immediately prior to the exit of a
-critical section of code protected by spinlocks.  This would ensure that
-subsequent writes to I/O space arrived only after all prior writes (much like a
-memory barrier op, mb(), only with respect to I/O).
+Dealing with posted writes
+--------------------------
 
-A more concrete example from a hypothetical device driver:
+Driver writers are responsible for ensuring that I/O writes to memory-mapped
+addresses on their device arrive when expected and in the order intended.
+This is typically done by reading a 'safe' device or bridge register, causing
+the I/O chipset to flush pending writes to the device before any reads are
+sent to the target device.  A driver would usually use this technique
+immediately prior to a read after a card reset or the exit of a critical
+section of code protected by spinlocks.  This would ensure that subsequent I/O
+space accesses arrived only after all prior writes.  There are really two
+issues at play here, one is 'posting', i.e. memory-mapped I/O writes not sent
+to the device immediately, and ordering, where on a large system writes from
+different CPUs may arrive out of order.
 
-        ...
+Some pseudocode to illustrate the problem of write posting:
+
+...
+spin_lock_irqsave(&dev_lock, flags)
+...
+writel(resetval, reset_reg); /* reset the card */
+udelay(10); /* wait for reset (also needs pioflush) */
+val = readl(ring_ptr); /* read initial value */
+spin_unlock_irqrestore(&dev_lock, flags)
+...
+
+In this case, the card is reset by the first write.  The driver attempts to
+wait for the completion of the reset using udelay.  But since the write may be
+delayed and the udelay will probably start executing right away, it may be
+that there's not enough time for the write to actually arrive at the card and
+for the reset to occur before the read is executed.  On some platforms, this
+can result in a machine check.  Unfortunately, there's no way to guarantee
+that a write has arrived at a device short of a read from the same address
+space, so in some cases, udelay() is the only option.  In any case, the driver
+should issue an ioflush() call prior to the udelay(), passing in 0 for the
+addr argument if no safe register exists.  This will allow the platform to
+make an effort to get the write as close to the device as possible before
+allowing the udelay to begin.
+
+And an example of reordering of writes between CPUs on a NUMA machine:
+
+	...
 CPU A:  spin_lock_irqsave(&dev_lock, flags)
-CPU A:  val = readl(my_status);
 CPU A:  ...
 CPU A:  writel(newval, ring_ptr);
 CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
         ...
 CPU B:  spin_lock_irqsave(&dev_lock, flags)
-CPU B:  val = readl(my_status);
 CPU B:  ...
 CPU B:  writel(newval2, ring_ptr);
-CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
-        ...
+CPU B:  (void)readl(safe_register); /* or read_relaxed() */
 
 In the case above, the device may receive newval2 before it receives newval,
 which could cause problems.  Fixing it is easy enough though:
 
         ...
 CPU A:  spin_lock_irqsave(&dev_lock, flags)
-CPU A:  val = readl(my_status);
 CPU A:  ...
 CPU A:  writel(newval, ring_ptr);
-CPU A:  (void)readl(safe_register); /* maybe a config register? */
+CPU A:  ioflush(dev, safe_register);
 CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
         ...
 CPU B:  spin_lock_irqsave(&dev_lock, flags)
-CPU B:  val = readl(my_status);
 CPU B:  ...
 CPU B:  writel(newval2, ring_ptr);
-CPU B:  (void)readl(safe_register); /* maybe a config register? */
+CPU B:  ioflush(dev, safe_register);
 CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
 
-Here, the reads from safe_register will cause the I/O chipset to flush any
-pending writes before actually posting the read to the chipset, preventing
-possible data corruption.
+Here, the ioflush() call will cause the I/O chipset to flush any outstanding
+writes before actually sending the read to the chipset, preventing possible
+data corruption.
+
+inX and outX calls, on the other hand, are strongly ordered and non-postable.
+They do not need special handling.  But this is something to watch out for
+when converting drivers to use MMIO space from IO Port space.
===== arch/ia64/sn/io/machvec/iomv.c 1.9 vs edited =====
--- 1.9/arch/ia64/sn/io/machvec/iomv.c	2004-05-26 06:49:19 -07:00
+++ edited/arch/ia64/sn/io/machvec/iomv.c	2004-09-21 09:34:19 -07:00
@@ -54,23 +54,18 @@
 EXPORT_SYMBOL(sn_io_addr);
 
 /**
- * sn_mmiob - I/O space memory barrier
+ * __sn_ioflush - I/O space write flush
  *
- * Acts as a memory mapped I/O barrier for platforms that queue writes to 
- * I/O space.  This ensures that subsequent writes to I/O space arrive after
- * all previous writes.  For most ia64 platforms, this is a simple
- * 'mf.a' instruction.  For other platforms, mmiob() may have to read
- * a chipset register to ensure ordering.
+ * See include/asm-ia64/io.h and Documentation/io_ordering.txt for details.
  *
  * On SN2, we wait for the PIO_WRITE_STATUS SHub register to clear.
  * See PV 871084 for details about the WAR about zero value.
  *
  */
-void
-sn_mmiob (void)
+void __sn_ioflush(struct device *dev, unsigned long addr)
 {
 	while ((((volatile unsigned long) (*pda->pio_write_status_addr)) & SH_PIO_WRITE_STATUS_0_PENDING_WRITE_COUNT_MASK) != 
 				SH_PIO_WRITE_STATUS_0_PENDING_WRITE_COUNT_MASK)
 		cpu_relax();
 }
-EXPORT_SYMBOL(sn_mmiob);
+EXPORT_SYMBOL(__sn_ioflush);
===== include/asm-ia64/io.h 1.19 vs edited =====
--- 1.19/include/asm-ia64/io.h	2004-02-03 21:31:10 -08:00
+++ edited/include/asm-ia64/io.h	2004-09-21 09:46:08 -07:00
@@ -91,6 +91,26 @@
  */
 #define __ia64_mf_a()	ia64_mfa()
 
+/**
+ * __ia64_ioflush - I/O write flush
+ * @dev: device we're flushing
+ * @addr: safe register to read
+ *
+ * Flush I/O space writes out to their target device to ensure ordering.
+ * all previous writes.  For most ia64 platforms, this is a simple
+ * 'mf.a' instruction, so the address is ignored.  For other platforms,
+ * the address may be required to ensure proper ordering of writes to I/O space
+ * since a 'dummy' read might be necessary to barrier the write operation.
+ *
+ * If either @dev or @addr is 0, don't use it.
+ *
+ * See Documentation/io_ordering.txt for more information.
+ */
+static inline void __ia64_ioflush (struct device *dev, unsigned long addr)
+{
+	ia64_mfa();
+}
+
 static inline const unsigned long
 __ia64_get_io_port_base (void)
 {
@@ -267,6 +287,7 @@
 #define __outb		platform_outb
 #define __outw		platform_outw
 #define __outl		platform_outl
+#define __ioflush	platform_ioflush
 
 #define inb(p)		__inb(p)
 #define inw(p)		__inw(p)
@@ -280,6 +301,7 @@
 #define outsb(p,s,c)	__outsb(p,s,c)
 #define outsw(p,s,c)	__outsw(p,s,c)
 #define outsl(p,s,c)	__outsl(p,s,c)
+#define ioflush(d,a)	__ioflush(d,a)
 
 /*
  * The address passed to these functions are ioremap()ped already.
===== include/asm-ia64/machvec.h 1.26 vs edited =====
--- 1.26/include/asm-ia64/machvec.h	2004-08-03 16:05:22 -07:00
+++ edited/include/asm-ia64/machvec.h	2004-09-21 09:35:32 -07:00
@@ -62,6 +62,7 @@
 typedef void ia64_mv_outb_t (unsigned char, unsigned long);
 typedef void ia64_mv_outw_t (unsigned short, unsigned long);
 typedef void ia64_mv_outl_t (unsigned int, unsigned long);
+typedef void ia64_mv_ioflush_t (struct device *, unsigned long);
 typedef unsigned char ia64_mv_readb_t (void *);
 typedef unsigned short ia64_mv_readw_t (void *);
 typedef unsigned int ia64_mv_readl_t (void *);
@@ -130,6 +131,7 @@
 #  define platform_outb		ia64_mv.outb
 #  define platform_outw		ia64_mv.outw
 #  define platform_outl		ia64_mv.outl
+#  define platform_ioflush	ia64_mv.ioflush
 #  define platform_readb        ia64_mv.readb
 #  define platform_readw        ia64_mv.readw
 #  define platform_readl        ia64_mv.readl
@@ -176,6 +178,7 @@
 	ia64_mv_outb_t *outb;
 	ia64_mv_outw_t *outw;
 	ia64_mv_outl_t *outl;
+	ia64_mv_ioflush_t *ioflush;
 	ia64_mv_readb_t *readb;
 	ia64_mv_readw_t *readw;
 	ia64_mv_readl_t *readl;
@@ -218,6 +221,7 @@
 	platform_outb,				\
 	platform_outw,				\
 	platform_outl,				\
+	platform_ioflush,			\
 	platform_readb,				\
 	platform_readw,				\
 	platform_readl,				\
@@ -343,6 +347,9 @@
 #endif
 #ifndef platform_outl
 # define platform_outl		__ia64_outl
+#endif
+#ifndef platform_ioflush
+# define platform_ioflush	__ia64_ioflush
 #endif
 #ifndef platform_readb
 # define platform_readb		__ia64_readb
===== include/asm-ia64/machvec_init.h 1.7 vs edited =====
--- 1.7/include/asm-ia64/machvec_init.h	2004-02-06 00:30:24 -08:00
+++ edited/include/asm-ia64/machvec_init.h	2004-09-21 09:35:47 -07:00
@@ -12,6 +12,7 @@
 extern ia64_mv_outb_t __ia64_outb;
 extern ia64_mv_outw_t __ia64_outw;
 extern ia64_mv_outl_t __ia64_outl;
+extern ia64_mv_ioflush_t __ia64_ioflush;
 extern ia64_mv_readb_t __ia64_readb;
 extern ia64_mv_readw_t __ia64_readw;
 extern ia64_mv_readl_t __ia64_readl;
===== include/asm-ia64/machvec_sn2.h 1.14 vs edited =====
--- 1.14/include/asm-ia64/machvec_sn2.h	2004-07-10 17:14:00 -07:00
+++ edited/include/asm-ia64/machvec_sn2.h	2004-09-21 09:36:14 -07:00
@@ -49,6 +49,7 @@
 extern ia64_mv_outb_t __sn_outb;
 extern ia64_mv_outw_t __sn_outw;
 extern ia64_mv_outl_t __sn_outl;
+extern ia64_mv_ioflush_t __sn_ioflush;
 extern ia64_mv_readb_t __sn_readb;
 extern ia64_mv_readw_t __sn_readw;
 extern ia64_mv_readl_t __sn_readl;
@@ -92,6 +93,7 @@
 #define platform_outb			__sn_outb
 #define platform_outw			__sn_outw
 #define platform_outl			__sn_outl
+#define platform_ioflush		__sn_ioflush
 #define platform_readb			__sn_readb
 #define platform_readw			__sn_readw
 #define platform_readl			__sn_readl
===== include/asm-ia64/sn/io.h 1.7 vs edited =====
--- 1.7/include/asm-ia64/sn/io.h	2004-02-13 07:00:22 -08:00
+++ edited/include/asm-ia64/sn/io.h	2004-09-21 09:38:27 -07:00
@@ -58,8 +58,8 @@
 #include <asm/sn/sn2/shubio.h>
 
 /*
- * Used to ensure write ordering (like mb(), but for I/O space)
+ * Used to ensure write ordering
  */
-extern void sn_mmiob(void);
+extern void __sn_ioflush(struct device *dev, unsigned long addr);
 
 #endif /* _ASM_IA64_SN_IO_H */
===== include/asm-ia64/sn/sn2/io.h 1.6 vs edited =====
--- 1.6/include/asm-ia64/sn/sn2/io.h	2004-07-22 17:00:00 -07:00
+++ edited/include/asm-ia64/sn/sn2/io.h	2004-09-21 09:31:56 -07:00
@@ -11,8 +11,10 @@
 #include <linux/compiler.h>
 #include <asm/intrinsics.h>
 
-extern void * sn_io_addr(unsigned long port) __attribute_const__; /* Forward definition */
-extern void sn_mmiob(void); /* Forward definition */
+/* Forward declarations */
+struct device;
+extern void *sn_io_addr(unsigned long port) __attribute_const__;
+extern void __sn_ioflush(struct device *dev, unsigned long addr);
 
 #define __sn_mf_a()   ia64_mfa()
 
@@ -91,7 +93,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_ioflush(0, 0);
 	}
 }
 
@@ -102,7 +104,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_ioflush(0, 0);
 	}
 }
 
@@ -113,7 +115,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_ioflush(0, 0);
 	}
 }
 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 17:03                                           ` Jesse Barnes
@ 2004-09-21 17:15                                             ` Matthew Wilcox
  2004-09-21 17:24                                               ` Jesse Barnes
  2004-09-21 17:20                                             ` James Bottomley
  1 sibling, 1 reply; 78+ messages in thread
From: Matthew Wilcox @ 2004-09-21 17:15 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: James Bottomley, Matthew Wilcox, Grant Grundler, Andrew Vasquez,
	pj, SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tue, Sep 21, 2004 at 01:03:18PM -0400, Jesse Barnes wrote:
> Does this patch describe and correctly implement what we've discussed?  I've 
> added information to io_ordering.txt to describe the conclusion about posting 
> we seem to have agreed on.  Obviously, this is just a first cut.  It 
> compiles, but I haven't made prototypes for any arch other than ia64.

Please kill the Documentation/io_ordering.txt file and work on improving
the information in Documentation/DocBook/deviceiobook instead.

-- 
"Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception." -- Mark Twain

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 17:15                                             ` Matthew Wilcox
@ 2004-09-21 17:24                                               ` Jesse Barnes
  0 siblings, 0 replies; 78+ messages in thread
From: Jesse Barnes @ 2004-09-21 17:24 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: James Bottomley, Matthew Wilcox, Grant Grundler, Andrew Vasquez,
	pj, SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tuesday, September 21, 2004 1:15 pm, Matthew Wilcox wrote:
> On Tue, Sep 21, 2004 at 01:03:18PM -0400, Jesse Barnes wrote:
> > Does this patch describe and correctly implement what we've discussed? 
> > I've added information to io_ordering.txt to describe the conclusion
> > about posting we seem to have agreed on.  Obviously, this is just a first
> > cut.  It compiles, but I haven't made prototypes for any arch other than
> > ia64.
>
> Please kill the Documentation/io_ordering.txt file and work on improving
> the information in Documentation/DocBook/deviceiobook instead.

Sure, I can do that in the next spin.  Please make sure things look ok 
otherwise though.

Jesse

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 17:03                                           ` Jesse Barnes
  2004-09-21 17:15                                             ` Matthew Wilcox
@ 2004-09-21 17:20                                             ` James Bottomley
  2004-09-21 17:46                                               ` Jesse Barnes
  1 sibling, 1 reply; 78+ messages in thread
From: James Bottomley @ 2004-09-21 17:20 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Matthew Wilcox, Grant Grundler, Andrew Vasquez, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tue, 2004-09-21 at 13:03, Jesse Barnes wrote:
> +Driver writers are responsible for ensuring that I/O writes to memory-mapped
> +addresses on their device arrive when expected and in the order intended.

Really, no.  You're making the document more confusing.  PCI devices
have *two* types of non DMA accesses (well, three, but lets forget
configuration space for the moment).

I/O Space accesses (what we call PIO) and memory accesses (what we call
MMIO)

> +This is typically done by reading a 'safe' device or bridge register, causing
> +the I/O chipset to flush pending writes to the device before any reads are

Not "bridge register" the specs say this must be an access to the
device's space.

> +sent to the target device.  A driver would usually use this technique
> +immediately prior to a read after a card reset or the exit of a critical
> +section of code protected by spinlocks.  This would ensure that subsequent I/O
> +space accesses arrived only after all prior writes.  There are really two
> +issues at play here, one is 'posting', i.e. memory-mapped I/O writes not sent
> +to the device immediately, and ordering, where on a large system writes from
> +different CPUs may arrive out of order.
>  
> -        ...
> +Some pseudocode to illustrate the problem of write posting:
> +
> +...
> +spin_lock_irqsave(&dev_lock, flags)
> +...
> +writel(resetval, reset_reg); /* reset the card */
> +udelay(10); /* wait for reset (also needs pioflush) */
> +val = readl(ring_ptr); /* read initial value */
> +spin_unlock_irqrestore(&dev_lock, flags)
> +...
> +
> +In this case, the card is reset by the first write.  The driver attempts to
> +wait for the completion of the reset using udelay.  But since the write may be
> +delayed and the udelay will probably start executing right away, it may be
> +that there's not enough time for the write to actually arrive at the card and
> +for the reset to occur before the read is executed.  On some platforms, this
> +can result in a machine check.  Unfortunately, there's no way to guarantee
> +that a write has arrived at a device short of a read from the same address

Not same address space, any address space (IO, memory or config) of the
device will do.

> +space, so in some cases, udelay() is the only option.  In any case, the driver
> +should issue an ioflush() call prior to the udelay(), passing in 0 for the

No; using udelay() to try to wait for the flush of posted writes to
occur is always a bug.

> +addr argument if no safe register exists.  This will allow the platform to
> +make an effort to get the write as close to the device as possible before
> +allowing the udelay to begin.

What ioflush() call?  There's no such thing in PCI; this is effectively
our problem.  If there were a nice flush instruction we wouldn't have to
worry about reading from somewhere on the device.  The problem is that
there's no a-priori way of knowing what read is safe to do, so there's
no generic way to extract a posting flush API.

James



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 17:20                                             ` James Bottomley
@ 2004-09-21 17:46                                               ` Jesse Barnes
  2004-09-21 17:56                                                 ` James Bottomley
  0 siblings, 1 reply; 78+ messages in thread
From: Jesse Barnes @ 2004-09-21 17:46 UTC (permalink / raw)
  To: James Bottomley
  Cc: Matthew Wilcox, Grant Grundler, Andrew Vasquez, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 2444 bytes --]

On Tuesday, September 21, 2004 1:20 pm, James Bottomley wrote:
> On Tue, 2004-09-21 at 13:03, Jesse Barnes wrote:
> > +Driver writers are responsible for ensuring that I/O writes to
> > memory-mapped +addresses on their device arrive when expected and in the
> > order intended.
>
> Really, no.  You're making the document more confusing.  PCI devices
> have *two* types of non DMA accesses (well, three, but lets forget
> configuration space for the moment).
>
> I/O Space accesses (what we call PIO) and memory accesses (what we call
> MMIO)

Right, fixed.

> > +This is typically done by reading a 'safe' device or bridge register,
> > causing +the I/O chipset to flush pending writes to the device before any
> > reads are
>
> Not "bridge register" the specs say this must be an access to the
> device's space.

Fixed.

> > Unfortunately, there's no way to guarantee +that a write has arrived at a
> > device short of a read from the same address
>
> Not same address space, any address space (IO, memory or config) of the
> device will do.

Ok, fixed.

> > +space, so in some cases, udelay() is the only option.  In any case, the
> > driver +should issue an ioflush() call prior to the udelay(), passing in
> > 0 for the
>
> No; using udelay() to try to wait for the flush of posted writes to
> occur is always a bug.

I thought we determined that sometimes there's no other option?  I'll remove 
that sentence anyway.


> > +addr argument if no safe register exists.  This will allow the platform
> > to +make an effort to get the write as close to the device as possible
> > before +allowing the udelay to begin.
>
> What ioflush() call?  There's no such thing in PCI; this is effectively
> our problem.  If there were a nice flush instruction we wouldn't have to
> worry about reading from somewhere on the device.  The problem is that
> there's no a-priori way of knowing what read is safe to do, so there's
> no generic way to extract a posting flush API.

The one I just added with this patch.  It takes a struct device and address 
arguments, the latter is supposed to be a safe register, like config space.

I'm tying together the concepts of write posting and ordering intentionally, 
but I wonder if that's a good idea.  On the one hand, we don't want to 
introduce *two* new driver APIs, but on the other, ensuring ordering and 
actually flushing writes out are two different things.  Maybe we just need to 
tell people 

Jesse

[-- Attachment #2: ioflush-ia64-2.patch --]
[-- Type: text/x-diff, Size: 11817 bytes --]

===== Documentation/io_ordering.txt 1.1 vs edited =====
--- 1.1/Documentation/io_ordering.txt	2003-03-18 02:02:11 -08:00
+++ edited/Documentation/io_ordering.txt	2004-09-21 10:32:06 -07:00
@@ -1,24 +1,60 @@
-On some platforms, so-called memory-mapped I/O is weakly ordered.  On such
-platforms, driver writers are responsible for ensuring that I/O writes to
-memory-mapped addresses on their device arrive in the order intended.  This is
-typically done by reading a 'safe' device or bridge register, causing the I/O
-chipset to flush pending writes to the device before any reads are posted.  A
-driver would usually use this technique immediately prior to the exit of a
-critical section of code protected by spinlocks.  This would ensure that
-subsequent writes to I/O space arrived only after all prior writes (much like a
-memory barrier op, mb(), only with respect to I/O).
+Dealing with posted writes
+--------------------------
 
-A more concrete example from a hypothetical device driver:
+Driver writers are responsible for ensuring that I/O writes to their device
+arrive when expected and in the order intended.  This is typically done by
+reading a 'safe' device register, causing the I/O chipset to flush pending
+writes to the device before any reads are sent to the target device.  A driver
+would usually use this technique immediately prior to a read after a card
+reset or the exit of a critical section of code protected by spinlocks.  This
+would ensure that subsequent I/O space accesses arrived only after all prior
+writes.  There are really two issues at play here, one is 'posting', i.e.
+memory-mapped I/O writes not sent to the device immediately, and ordering,
+where on a large system writes from different CPUs may arrive out of order.
 
-        ...
+Some pseudocode to illustrate the problem of write posting:
+
+...
+spin_lock_irqsave(&dev_lock, flags)
+...
+writel(resetval, reset_reg); /* reset the card */
+udelay(10); /* wait for reset (also needs pioflush) */
+val = readl(ring_ptr); /* read initial value */
+spin_unlock_irqrestore(&dev_lock, flags)
+...
+
+In this case, the card is reset by the first write.  The driver attempts to
+wait for the completion of the reset using udelay.  But since the write may be
+delayed and the udelay will probably start executing right away, it may be
+that there's not enough time for the write to actually arrive at the card and
+for the reset to occur before the read is executed.  On some platforms, this
+can result in a machine check.  Unfortunately, there's no way to guarantee
+that a write has arrived at a device short of a read from a safe register.  In
+any case, the driver should issue an ioflush() call prior to the udelay(),
+passing in 0 for the addr argument if no safe register exists or if the driver
+is merely trying to ensure write ordering.  This will allow the platform to
+make an effort to get the write as close to the device as possible before
+allowing the udelay to begin.  Example:
+
+...
+spin_lock_irqsave(&dev_lock, flags)
+...
+writel(resetval, reset_reg); /* reset the card */
+ioflush(dev, config_reg); /* flush out the write */
+udelay(10);
+val = readl(ring_ptr); /* read initial value */
+spin_unlock_irqrestore(&dev_lock, flags)
+...
+
+Here's an example of reordering of writes between CPUs on a NUMA machine:
+
+	...
 CPU A:  spin_lock_irqsave(&dev_lock, flags)
-CPU A:  val = readl(my_status);
 CPU A:  ...
 CPU A:  writel(newval, ring_ptr);
 CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
         ...
 CPU B:  spin_lock_irqsave(&dev_lock, flags)
-CPU B:  val = readl(my_status);
 CPU B:  ...
 CPU B:  writel(newval2, ring_ptr);
 CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
@@ -29,19 +65,21 @@
 
         ...
 CPU A:  spin_lock_irqsave(&dev_lock, flags)
-CPU A:  val = readl(my_status);
 CPU A:  ...
 CPU A:  writel(newval, ring_ptr);
-CPU A:  (void)readl(safe_register); /* maybe a config register? */
+CPU A:  ioflush(dev, 0);
 CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
         ...
 CPU B:  spin_lock_irqsave(&dev_lock, flags)
-CPU B:  val = readl(my_status);
 CPU B:  ...
 CPU B:  writel(newval2, ring_ptr);
-CPU B:  (void)readl(safe_register); /* maybe a config register? */
+CPU B:  ioflush(dev, 0);
 CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
 
-Here, the reads from safe_register will cause the I/O chipset to flush any
-pending writes before actually posting the read to the chipset, preventing
-possible data corruption.
+Here, the ioflush() call will cause the I/O chipset to flush any outstanding
+writes before actually sending the read to the chipset, preventing possible
+data corruption.
+
+inX and outX calls, on the other hand, are strongly ordered and non-postable.
+They do not need special handling.  But this is something to watch out for
+when converting drivers to use MMIO space from IO Port space.
===== arch/ia64/sn/io/machvec/iomv.c 1.9 vs edited =====
--- 1.9/arch/ia64/sn/io/machvec/iomv.c	2004-05-26 06:49:19 -07:00
+++ edited/arch/ia64/sn/io/machvec/iomv.c	2004-09-21 09:34:19 -07:00
@@ -54,23 +54,18 @@
 EXPORT_SYMBOL(sn_io_addr);
 
 /**
- * sn_mmiob - I/O space memory barrier
+ * __sn_ioflush - I/O space write flush
  *
- * Acts as a memory mapped I/O barrier for platforms that queue writes to 
- * I/O space.  This ensures that subsequent writes to I/O space arrive after
- * all previous writes.  For most ia64 platforms, this is a simple
- * 'mf.a' instruction.  For other platforms, mmiob() may have to read
- * a chipset register to ensure ordering.
+ * See include/asm-ia64/io.h and Documentation/io_ordering.txt for details.
  *
  * On SN2, we wait for the PIO_WRITE_STATUS SHub register to clear.
  * See PV 871084 for details about the WAR about zero value.
  *
  */
-void
-sn_mmiob (void)
+void __sn_ioflush(struct device *dev, unsigned long addr)
 {
 	while ((((volatile unsigned long) (*pda->pio_write_status_addr)) & SH_PIO_WRITE_STATUS_0_PENDING_WRITE_COUNT_MASK) != 
 				SH_PIO_WRITE_STATUS_0_PENDING_WRITE_COUNT_MASK)
 		cpu_relax();
 }
-EXPORT_SYMBOL(sn_mmiob);
+EXPORT_SYMBOL(__sn_ioflush);
===== include/asm-ia64/io.h 1.19 vs edited =====
--- 1.19/include/asm-ia64/io.h	2004-02-03 21:31:10 -08:00
+++ edited/include/asm-ia64/io.h	2004-09-21 10:30:16 -07:00
@@ -91,6 +91,27 @@
  */
 #define __ia64_mf_a()	ia64_mfa()
 
+/**
+ * __ia64_ioflush - I/O write flush
+ * @dev: device we're flushing
+ * @addr: safe register to read
+ *
+ * Flush I/O space writes out to their target device to ensure ordering.
+ * all previous writes.  For most ia64 platforms, this is a simple
+ * 'mf.a' instruction, so the address is ignored.  For other platforms,
+ * the address may be required to ensure proper ordering of writes to I/O space
+ * since a 'dummy' read might be necessary to barrier the write operation.
+ *
+ * If either @dev or @addr is 0, don't use it.  @addr should be 0 if the driver
+ * is just trying to make sure writes arrive in order.
+ *
+ * See Documentation/io_ordering.txt for more information.
+ */
+static inline void __ia64_ioflush (struct device *dev, unsigned long addr)
+{
+	ia64_mfa();
+}
+
 static inline const unsigned long
 __ia64_get_io_port_base (void)
 {
@@ -267,6 +288,7 @@
 #define __outb		platform_outb
 #define __outw		platform_outw
 #define __outl		platform_outl
+#define __ioflush	platform_ioflush
 
 #define inb(p)		__inb(p)
 #define inw(p)		__inw(p)
@@ -280,6 +302,7 @@
 #define outsb(p,s,c)	__outsb(p,s,c)
 #define outsw(p,s,c)	__outsw(p,s,c)
 #define outsl(p,s,c)	__outsl(p,s,c)
+#define ioflush(d,a)	__ioflush(d,a)
 
 /*
  * The address passed to these functions are ioremap()ped already.
===== include/asm-ia64/machvec.h 1.26 vs edited =====
--- 1.26/include/asm-ia64/machvec.h	2004-08-03 16:05:22 -07:00
+++ edited/include/asm-ia64/machvec.h	2004-09-21 09:35:32 -07:00
@@ -62,6 +62,7 @@
 typedef void ia64_mv_outb_t (unsigned char, unsigned long);
 typedef void ia64_mv_outw_t (unsigned short, unsigned long);
 typedef void ia64_mv_outl_t (unsigned int, unsigned long);
+typedef void ia64_mv_ioflush_t (struct device *, unsigned long);
 typedef unsigned char ia64_mv_readb_t (void *);
 typedef unsigned short ia64_mv_readw_t (void *);
 typedef unsigned int ia64_mv_readl_t (void *);
@@ -130,6 +131,7 @@
 #  define platform_outb		ia64_mv.outb
 #  define platform_outw		ia64_mv.outw
 #  define platform_outl		ia64_mv.outl
+#  define platform_ioflush	ia64_mv.ioflush
 #  define platform_readb        ia64_mv.readb
 #  define platform_readw        ia64_mv.readw
 #  define platform_readl        ia64_mv.readl
@@ -176,6 +178,7 @@
 	ia64_mv_outb_t *outb;
 	ia64_mv_outw_t *outw;
 	ia64_mv_outl_t *outl;
+	ia64_mv_ioflush_t *ioflush;
 	ia64_mv_readb_t *readb;
 	ia64_mv_readw_t *readw;
 	ia64_mv_readl_t *readl;
@@ -218,6 +221,7 @@
 	platform_outb,				\
 	platform_outw,				\
 	platform_outl,				\
+	platform_ioflush,			\
 	platform_readb,				\
 	platform_readw,				\
 	platform_readl,				\
@@ -343,6 +347,9 @@
 #endif
 #ifndef platform_outl
 # define platform_outl		__ia64_outl
+#endif
+#ifndef platform_ioflush
+# define platform_ioflush	__ia64_ioflush
 #endif
 #ifndef platform_readb
 # define platform_readb		__ia64_readb
===== include/asm-ia64/machvec_init.h 1.7 vs edited =====
--- 1.7/include/asm-ia64/machvec_init.h	2004-02-06 00:30:24 -08:00
+++ edited/include/asm-ia64/machvec_init.h	2004-09-21 09:35:47 -07:00
@@ -12,6 +12,7 @@
 extern ia64_mv_outb_t __ia64_outb;
 extern ia64_mv_outw_t __ia64_outw;
 extern ia64_mv_outl_t __ia64_outl;
+extern ia64_mv_ioflush_t __ia64_ioflush;
 extern ia64_mv_readb_t __ia64_readb;
 extern ia64_mv_readw_t __ia64_readw;
 extern ia64_mv_readl_t __ia64_readl;
===== include/asm-ia64/machvec_sn2.h 1.14 vs edited =====
--- 1.14/include/asm-ia64/machvec_sn2.h	2004-07-10 17:14:00 -07:00
+++ edited/include/asm-ia64/machvec_sn2.h	2004-09-21 09:36:14 -07:00
@@ -49,6 +49,7 @@
 extern ia64_mv_outb_t __sn_outb;
 extern ia64_mv_outw_t __sn_outw;
 extern ia64_mv_outl_t __sn_outl;
+extern ia64_mv_ioflush_t __sn_ioflush;
 extern ia64_mv_readb_t __sn_readb;
 extern ia64_mv_readw_t __sn_readw;
 extern ia64_mv_readl_t __sn_readl;
@@ -92,6 +93,7 @@
 #define platform_outb			__sn_outb
 #define platform_outw			__sn_outw
 #define platform_outl			__sn_outl
+#define platform_ioflush		__sn_ioflush
 #define platform_readb			__sn_readb
 #define platform_readw			__sn_readw
 #define platform_readl			__sn_readl
===== include/asm-ia64/sn/io.h 1.7 vs edited =====
--- 1.7/include/asm-ia64/sn/io.h	2004-02-13 07:00:22 -08:00
+++ edited/include/asm-ia64/sn/io.h	2004-09-21 09:38:27 -07:00
@@ -58,8 +58,8 @@
 #include <asm/sn/sn2/shubio.h>
 
 /*
- * Used to ensure write ordering (like mb(), but for I/O space)
+ * Used to ensure write ordering
  */
-extern void sn_mmiob(void);
+extern void __sn_ioflush(struct device *dev, unsigned long addr);
 
 #endif /* _ASM_IA64_SN_IO_H */
===== include/asm-ia64/sn/sn2/io.h 1.6 vs edited =====
--- 1.6/include/asm-ia64/sn/sn2/io.h	2004-07-22 17:00:00 -07:00
+++ edited/include/asm-ia64/sn/sn2/io.h	2004-09-21 09:31:56 -07:00
@@ -11,8 +11,10 @@
 #include <linux/compiler.h>
 #include <asm/intrinsics.h>
 
-extern void * sn_io_addr(unsigned long port) __attribute_const__; /* Forward definition */
-extern void sn_mmiob(void); /* Forward definition */
+/* Forward declarations */
+struct device;
+extern void *sn_io_addr(unsigned long port) __attribute_const__;
+extern void __sn_ioflush(struct device *dev, unsigned long addr);
 
 #define __sn_mf_a()   ia64_mfa()
 
@@ -91,7 +93,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_ioflush(0, 0);
 	}
 }
 
@@ -102,7 +104,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_ioflush(0, 0);
 	}
 }
 
@@ -113,7 +115,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_ioflush(0, 0);
 	}
 }
 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 17:46                                               ` Jesse Barnes
@ 2004-09-21 17:56                                                 ` James Bottomley
  2004-09-21 18:09                                                   ` Jesse Barnes
  0 siblings, 1 reply; 78+ messages in thread
From: James Bottomley @ 2004-09-21 17:56 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Matthew Wilcox, Grant Grundler, Andrew Vasquez, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tue, 2004-09-21 at 13:46, Jesse Barnes wrote:
> > What ioflush() call?  There's no such thing in PCI; this is effectively
> > our problem.  If there were a nice flush instruction we wouldn't have to
> > worry about reading from somewhere on the device.  The problem is that
> > there's no a-priori way of knowing what read is safe to do, so there's
> > no generic way to extract a posting flush API.
> 
> The one I just added with this patch.  It takes a struct device and address 
> arguments, the latter is supposed to be a safe register, like config space.
> 
> I'm tying together the concepts of write posting and ordering intentionally, 
> but I wonder if that's a good idea.  On the one hand, we don't want to 
> introduce *two* new driver APIs, but on the other, ensuring ordering and 
> actually flushing writes out are two different things.  Maybe we just need to 
> tell people 

Really, I don't think this is a good idea.  If there were a way to
produce an api that was just ioflush(struct device *) then yes, since it
would reduce confusion.  However, since all your API does is hide the
fact that you're doing a MMIO read then the proposed API provides no
relief from the fact that the driver writer needs to understand
posting...all it does is add one extra function for them to misuse or
get confused about.

James



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 17:56                                                 ` James Bottomley
@ 2004-09-21 18:09                                                   ` Jesse Barnes
  2004-09-21 19:06                                                     ` Grant Grundler
  0 siblings, 1 reply; 78+ messages in thread
From: Jesse Barnes @ 2004-09-21 18:09 UTC (permalink / raw)
  To: James Bottomley
  Cc: Matthew Wilcox, Grant Grundler, Andrew Vasquez, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tuesday, September 21, 2004 1:56 pm, James Bottomley wrote:
> Really, I don't think this is a good idea.  If there were a way to
> produce an api that was just ioflush(struct device *) then yes, since it
> would reduce confusion.  However, since all your API does is hide the
> fact that you're doing a MMIO read then the proposed API provides no
> relief from the fact that the driver writer needs to understand
> posting...all it does is add one extra function for them to misuse or
> get confused about.

Agreed.  I'll rename it, fix the io_ordering.txt doc to only describe I/O 
ordering issues, and resubmit it solely as a performance improvement API.

Grant, you say that I/O writes can't possibly arrive out of order if issued 
from different CPUs on any machines you're aware of?  If not, it may be that 
the API will only benefit SGI machines.

Thanks,
Jesse

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 18:09                                                   ` Jesse Barnes
@ 2004-09-21 19:06                                                     ` Grant Grundler
  2004-09-21 19:40                                                       ` Jesse Barnes
  2004-09-21 21:03                                                       ` Jeremy Higdon
  0 siblings, 2 replies; 78+ messages in thread
From: Grant Grundler @ 2004-09-21 19:06 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: James Bottomley, Matthew Wilcox, Grant Grundler, Andrew Vasquez,
	pj, SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tue, Sep 21, 2004 at 02:09:10PM -0400, Jesse Barnes wrote:
> Grant, you say that I/O writes can't possibly arrive out of order if issued 
> from different CPUs on any machines you're aware of?

Well, that's what I believe right now based on my experience.
Re-reading different parts of the PCI local bus spec doesn't give
me the warm fuzzies - especially in regard to retries.

The qla example Jeremy gave earlier wasn't clear on where the
mmio write reordering was taking place.  Based on my understanding
(and I've been wrong before) of ia64 .rel/.acq semantics the reordering
didn't happen in the CPU coherency - ie writes are ordered WRT
to each other out to the Mckinley bus. That really only leaves
the chipset suspect - ie anything between Mckinley bus and PCI bus.

And this chipset is already known to violate DMA writes and
MMIO read return ordering rules...which led to read_relaxed().
We've agreed DMA reads bypassing MMIO writes violates the spec
but is "mostly harmless" (I haven't seen a case in real life where
it matters). HP parisc and ia64 platforms also implement
this "optimization" (bug) and I've not seen a problem with it yet.

> If not, it may be that the API will only benefit SGI machines.

Or hurt them.
We haven't talked about the flip side of this workaround much.

Normally, I expect the chipset is responsible for maintaining
order of MMIO writes - though that sounds near impossible on
a large fabric where the spinlock transactions may take a different
path than the IO transactions. But allowing out of order MMIO
write transactions is a big deal if we want high performance
devices that can operate correctly by only consuming MMIO
writes and doing DMA for everything else. Adding the MMIO
reads to enforcing MMIO write ordering will set us all back
a few years in terms of performance.

This behavior (bug?) essentially means drivers that only do MMIO
writes during normal operation (and MMIO reads during set up) will
not operate correctly on Altix (or similar if they exist) boxes.

hth,
grant

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 19:06                                                     ` Grant Grundler
@ 2004-09-21 19:40                                                       ` Jesse Barnes
  2004-09-21 22:44                                                         ` Grant Grundler
  2004-09-21 21:03                                                       ` Jeremy Higdon
  1 sibling, 1 reply; 78+ messages in thread
From: Jesse Barnes @ 2004-09-21 19:40 UTC (permalink / raw)
  To: Grant Grundler
  Cc: James Bottomley, Matthew Wilcox, Andrew Vasquez, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tuesday, September 21, 2004 3:06 pm, Grant Grundler wrote:
> The qla example Jeremy gave earlier wasn't clear on where the
> mmio write reordering was taking place.  Based on my understanding
> (and I've been wrong before) of ia64 .rel/.acq semantics the reordering
> didn't happen in the CPU coherency - ie writes are ordered WRT
> to each other out to the Mckinley bus. That really only leaves
> the chipset suspect - ie anything between Mckinley bus and PCI bus.

Yep, that's where the reordering occurs for us.

> Or hurt them.
> We haven't talked about the flip side of this workaround much.
>
> Normally, I expect the chipset is responsible for maintaining
> order of MMIO writes - though that sounds near impossible on
> a large fabric where the spinlock transactions may take a different
> path than the IO transactions.

I think it is.  I wouldn't be surprised if your hw guys told you the same 
thing for your large machines.

> But allowing out of order MMIO 
> write transactions is a big deal if we want high performance
> devices that can operate correctly by only consuming MMIO
> writes and doing DMA for everything else. Adding the MMIO
> reads to enforcing MMIO write ordering will set us all back
> a few years in terms of performance.

But *only* prior to releasing a lock.  Writes in program order will arrive in 
order, it's just that writes from some other CPU may beat them to the device.  
So you'll only have one read for every so many writes.  And if your chipset 
supports it, you don't have to do a full read out to the target bus, but just 
to the local chipset.

> This behavior (bug?) essentially means drivers that only do MMIO
> writes during normal operation (and MMIO reads during set up) will
> not operate correctly on Altix (or similar if they exist) boxes.

It's a pretty hard bug to hit, as Jeremy mentioned.  You'll only see it on 
large boxes.  The performance gain for us comes from not doing a full read 
from the device, but just making sure our local hub has sent the write to its 
destination hub, where ordering is guaranteed.  Of course, it's only a gain 
if one assumes that the necessary full reads are already in place.  If not, 
then the hardware is already imposing I/O space write penalties anyway, 
except for all writes.  I'd think that's worse than just flushing the ones 
you care about, and only when you need to.

Jesse

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 19:40                                                       ` Jesse Barnes
@ 2004-09-21 22:44                                                         ` Grant Grundler
  0 siblings, 0 replies; 78+ messages in thread
From: Grant Grundler @ 2004-09-21 22:44 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Grant Grundler, James Bottomley, Matthew Wilcox, Andrew Vasquez,
	pj, SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tue, Sep 21, 2004 at 03:40:32PM -0400, Jesse Barnes wrote:
> > Normally, I expect the chipset is responsible for maintaining
> > order of MMIO writes - though that sounds near impossible on
> > a large fabric where the spinlock transactions may take a different
> > path than the IO transactions.
> 
> I think it is.  I wouldn't be surprised if your hw guys told you the same 
> thing for your large machines.

I was told Superdome chipsets (SX1000) do NOT have this problem.
AFAIK, it only scales to 16 nodes (4 sockets/node) and the fabric
may not have the multiple paths SGI Altix (or other interconnects)
may have. (And I'd like the "other chipsets" better defined if anyone
knows of other chipsets).

I was also told likely *all* larger PCI-E systems will have this problem.
Ie any time the fabric allows multiple pathes to the same device.

And as usual, I was wrong. Someone educated me on HP V-class systems (PARISC)
having the same problem when running in NUMA config (4 node cluster).
Of course parisc-linux doesn't run on V-class...and HP didn't sell that
many V-class clusters...but here's the story anyway.

Despite strongly ordered CPU accesses, the chipset couldn't preserve
ordering across the NUMA links.  The NIC drivers exposed this problem
when writing descriptors to remote shared memory.  This shared memory
is implemented on each Host PCI bus controller for that bus segment.
ie some MMIO writes had to cross both a NUMA Link and X-bar compared
to local nodes only crossing the X-bar.
Result was some of the descriptors picked up by NICs would contain garbage.
The workaround was adding MMIO Reads after each descriptor was
updated - exactly what SGI wants to do for qla driver.

...
> So you'll only have one read for every so many writes.  And if your chipset 
> supports it, you don't have to do a full read out to the target bus, but just 
> to the local chipset.

Yes - agreed - not every MMIO write and we really only need to guarantee
the writes have reach the targeted PCI segment.
But it's still alot more reads and will measureably affect performance
on smaller boxes if it's done unconditionally.

Large scale NUMA is going to suffer under RDMA.
RDMA using smaller boxes will be much faster with at
least 10000-2000 cycles less overhead and latency per packet.

> It's a pretty hard bug to hit, as Jeremy mentioned.  You'll only see it on 
> large boxes.

Yes - a fabric that can't preserve ordering is the key bit here.

> If not, 
> then the hardware is already imposing I/O space write penalties anyway, 
> except for all writes.  I'd think that's worse than just flushing the ones 
> you care about, and only when you need to.

I have the impression it's not feasible for HW to enforce ordering on large
fabrics.  And the "standard" PCI programming model clearly can't deal
with out of order MMIO writes. You guys just have the misfortune of
pushing the "envelope" right now.

I don't want to overload an interface that deals with write posting
with MMIO write ordering workarounds. The cases we need to enforce
write posting are different from the cases which need to enforce
MMIO write ordering. I think I understand both well enough now
and hope you do too. :^) 

hth,
grant

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 19:06                                                     ` Grant Grundler
  2004-09-21 19:40                                                       ` Jesse Barnes
@ 2004-09-21 21:03                                                       ` Jeremy Higdon
  2004-09-21 21:11                                                         ` Matthew Wilcox
  1 sibling, 1 reply; 78+ messages in thread
From: Jeremy Higdon @ 2004-09-21 21:03 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Jesse Barnes, James Bottomley, Matthew Wilcox, Andrew Vasquez, pj,
	SCSI Mailing List, mdr, jeremy, djh, Andrew Morton

On Tue, Sep 21, 2004 at 01:06:25PM -0600, Grant Grundler wrote:
> On Tue, Sep 21, 2004 at 02:09:10PM -0400, Jesse Barnes wrote:
> > Grant, you say that I/O writes can't possibly arrive out of order if issued 
> > from different CPUs on any machines you're aware of?
> 
> Well, that's what I believe right now based on my experience.
> Re-reading different parts of the PCI local bus spec doesn't give
> me the warm fuzzies - especially in regard to retries.
> 
> The qla example Jeremy gave earlier wasn't clear on where the
> mmio write reordering was taking place.  Based on my understanding
> (and I've been wrong before) of ia64 .rel/.acq semantics the reordering
> didn't happen in the CPU coherency - ie writes are ordered WRT
> to each other out to the Mckinley bus. That really only leaves
> the chipset suspect - ie anything between Mckinley bus and PCI bus.

Do IO space writes follow the memory coherency rules?

> And this chipset is already known to violate DMA writes and
> MMIO read return ordering rules...which led to read_relaxed().
> We've agreed DMA reads bypassing MMIO writes violates the spec
> but is "mostly harmless" (I haven't seen a case in real life where
> it matters). HP parisc and ia64 platforms also implement
> this "optimization" (bug) and I've not seen a problem with it yet.
> 
> > If not, it may be that the API will only benefit SGI machines.
> 
> Or hurt them.
> We haven't talked about the flip side of this workaround much.
> 
> Normally, I expect the chipset is responsible for maintaining
> order of MMIO writes - though that sounds near impossible on
> a large fabric where the spinlock transactions may take a different
> path than the IO transactions. But allowing out of order MMIO
> write transactions is a big deal if we want high performance
> devices that can operate correctly by only consuming MMIO
> writes and doing DMA for everything else. Adding the MMIO
> reads to enforcing MMIO write ordering will set us all back
> a few years in terms of performance.

On Altix, we do have the sn_mmiob() option.  I don't think that
we want to Linux API to require that

CPUA:	writel(value_X, common_register)
	spin_unlock(common_lock)

CPUB:	spin_lock(common_lock)
	writel(value_Y, common_register)

be strongly ordered, because it places a performance penalty
on all writes.  On Altix, we have the "sn_mmiob()" function to do
that.  I.e.

CPUA:	writel(value_X, common_register)
	sn_mmiob()
	spin_unlock(common_lock)

CPUB:	spin_lock(common_lock)
	writel(value_Y, common_register)

would strongly order the writes.

I think the flush that Jesse was talking about was an architecture
independent abstraction of the sn_mmiob().

Note that the sn_mmiob() does not guarantee that the preceding writel is
completed -- just that there will be no other writeX issued from other
CPUs ahead of it.

However, if we lose this argument, the simple answer is to add an
sn_mmiob to the Altix platform's writeX functions.  See ___sn_outb(),
by the way.

jeremy

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 21:03                                                       ` Jeremy Higdon
@ 2004-09-21 21:11                                                         ` Matthew Wilcox
  2004-09-21 21:43                                                           ` Jeremy Higdon
  0 siblings, 1 reply; 78+ messages in thread
From: Matthew Wilcox @ 2004-09-21 21:11 UTC (permalink / raw)
  To: Jeremy Higdon
  Cc: Grant Grundler, Jesse Barnes, James Bottomley, Matthew Wilcox,
	Andrew Vasquez, pj, SCSI Mailing List, mdr, jeremy, djh,
	Andrew Morton

On Tue, Sep 21, 2004 at 02:03:42PM -0700, Jeremy Higdon wrote:
> On Altix, we do have the sn_mmiob() option.  I don't think that
> we want to Linux API to require that
> 
> CPUA:	writel(value_X, common_register)
> 	spin_unlock(common_lock)
> 
> CPUB:	spin_lock(common_lock)
> 	writel(value_Y, common_register)
> 
> be strongly ordered, because it places a performance penalty
> on all writes.

I disagree with you.  If this were memory, then you would expect
common_register to be set to value_Y after this sequence.  Why should
IO be different?

> On Altix, we have the "sn_mmiob()" function to do
> that.  I.e.
> 
> CPUA:	writel(value_X, common_register)
> 	sn_mmiob()
> 	spin_unlock(common_lock)
> 
> CPUB:	spin_lock(common_lock)
> 	writel(value_Y, common_register)
> 
> would strongly order the writes.

I think your _raw_spin_unlock() should include an sn_mmiob().

-- 
"Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception." -- Mark Twain

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 21:11                                                         ` Matthew Wilcox
@ 2004-09-21 21:43                                                           ` Jeremy Higdon
  2004-09-21 22:33                                                             ` Jesse Barnes
  2004-09-22  0:02                                                             ` Matthew Wilcox
  0 siblings, 2 replies; 78+ messages in thread
From: Jeremy Higdon @ 2004-09-21 21:43 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Grant Grundler, Jesse Barnes, James Bottomley, Matthew Wilcox,
	Andrew Vasquez, pj, SCSI Mailing List, mdr, jeremy, djh,
	Andrew Morton

On Tue, Sep 21, 2004 at 10:11:08PM +0100, Matthew Wilcox wrote:
> On Tue, Sep 21, 2004 at 02:03:42PM -0700, Jeremy Higdon wrote:
> > On Altix, we do have the sn_mmiob() option.  I don't think that
> > we want to Linux API to require that
> > 
> > CPUA:	writel(value_X, common_register)
> > 	spin_unlock(common_lock)
> > 
> > CPUB:	spin_lock(common_lock)
> > 	writel(value_Y, common_register)
> > 
> > be strongly ordered, because it places a performance penalty
> > on all writes.
> 
> I disagree with you.  If this were memory, then you would expect
> common_register to be set to value_Y after this sequence.  Why should
> IO be different?

I/O is partially outside of the memory coherency domain.  So it is
different from memory, even though we might wish that it weren't.

> > On Altix, we have the "sn_mmiob()" function to do
> > that.  I.e.
> > 
> > CPUA:	writel(value_X, common_register)
> > 	sn_mmiob()
> > 	spin_unlock(common_lock)
> > 
> > CPUB:	spin_lock(common_lock)
> > 	writel(value_Y, common_register)
> > 
> > would strongly order the writes.
> 
> I think your _raw_spin_unlock() should include an sn_mmiob().

That could be very painful.
On Irix, we actually had a separate spinunlock (io_spin_unlock) that
added the MIPS equivalent.  I'm assuming we don't want to entertain
that here :-)  (Though tell me if I'm wrong)

jeremy

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 21:43                                                           ` Jeremy Higdon
@ 2004-09-21 22:33                                                             ` Jesse Barnes
  2004-09-22  0:02                                                             ` Matthew Wilcox
  1 sibling, 0 replies; 78+ messages in thread
From: Jesse Barnes @ 2004-09-21 22:33 UTC (permalink / raw)
  To: Jeremy Higdon
  Cc: Matthew Wilcox, Grant Grundler, James Bottomley, Matthew Wilcox,
	Andrew Vasquez, pj, SCSI Mailing List, mdr, jeremy, djh,
	Andrew Morton

On Tuesday, September 21, 2004 5:43 pm, Jeremy Higdon wrote:
> > > CPUA: writel(value_X, common_register)
> > >  sn_mmiob()
> > >  spin_unlock(common_lock)
> > >
> > > CPUB: spin_lock(common_lock)
> > >  writel(value_Y, common_register)
> > >
> > > would strongly order the writes.
> >
> > I think your _raw_spin_unlock() should include an sn_mmiob().
>
> That could be very painful.
> On Irix, we actually had a separate spinunlock (io_spin_unlock) that
> added the MIPS equivalent.  I'm assuming we don't want to entertain
> that here :-)  (Though tell me if I'm wrong)

It's worth noting that this affects some MIPS systems as well, so that's at 
least two Linux arches that would benefit from a nice solution to this 
problem.

Jesse

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21 21:43                                                           ` Jeremy Higdon
  2004-09-21 22:33                                                             ` Jesse Barnes
@ 2004-09-22  0:02                                                             ` Matthew Wilcox
  2004-09-22  1:16                                                               ` Jeremy Higdon
  1 sibling, 1 reply; 78+ messages in thread
From: Matthew Wilcox @ 2004-09-22  0:02 UTC (permalink / raw)
  To: Jeremy Higdon
  Cc: Matthew Wilcox, Grant Grundler, Jesse Barnes, James Bottomley,
	Matthew Wilcox, Andrew Vasquez, pj, SCSI Mailing List, mdr,
	jeremy, djh, Andrew Morton

On Tue, Sep 21, 2004 at 02:43:02PM -0700, Jeremy Higdon wrote:
> That could be very painful.
> On Irix, we actually had a separate spinunlock (io_spin_unlock) that
> added the MIPS equivalent.  I'm assuming we don't want to entertain
> that here :-)  (Though tell me if I'm wrong)

That seems like a reasonable solution to me.  Want to float it to Linus
and see how he takes it?

-- 
"Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception." -- Mark Twain

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-22  0:02                                                             ` Matthew Wilcox
@ 2004-09-22  1:16                                                               ` Jeremy Higdon
  2004-09-22  1:44                                                                 ` Grant Grundler
  0 siblings, 1 reply; 78+ messages in thread
From: Jeremy Higdon @ 2004-09-22  1:16 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Grant Grundler, Jesse Barnes, James Bottomley, Matthew Wilcox,
	Andrew Vasquez, pj, SCSI Mailing List, mdr, jeremy, djh,
	Andrew Morton

On Wed, Sep 22, 2004 at 01:02:11AM +0100, Matthew Wilcox wrote:
> On Tue, Sep 21, 2004 at 02:43:02PM -0700, Jeremy Higdon wrote:
> > That could be very painful.
> > On Irix, we actually had a separate spinunlock (io_spin_unlock) that
> > added the MIPS equivalent.  I'm assuming we don't want to entertain
> > that here :-)  (Though tell me if I'm wrong)
> 
> That seems like a reasonable solution to me.  Want to float it to Linus
> and see how he takes it?


First let's make sure that we like it best.

The other alternative is an explicit I/O barrier.

	writel(high-water, request-in)
	mmiob()  /* memory-mapped I/O barrier */
	spin_unlock(hostlock)

versus

	writel(high-water, request-in)
	spin_unlock_iob(hostlock)

I think I like the former better (even though we had the latter in Irix).
But I don't have a strong preference.

One of us will offer a patch.

jeremy

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-22  1:16                                                               ` Jeremy Higdon
@ 2004-09-22  1:44                                                                 ` Grant Grundler
  2004-09-22  2:58                                                                   ` Jeremy Higdon
  0 siblings, 1 reply; 78+ messages in thread
From: Grant Grundler @ 2004-09-22  1:44 UTC (permalink / raw)
  To: Jeremy Higdon
  Cc: Matthew Wilcox, Grant Grundler, Jesse Barnes, James Bottomley,
	Matthew Wilcox, Andrew Vasquez, pj, SCSI Mailing List, mdr,
	jeremy, djh, Andrew Morton

On Tue, Sep 21, 2004 at 06:16:52PM -0700, Jeremy Higdon wrote:
> First let's make sure that we like it best.
> 
> The other alternative is an explicit I/O barrier.
> 
> 	writel(high-water, request-in)
> 	mmiob()  /* memory-mapped I/O barrier */
> 	spin_unlock(hostlock)

I strongly prefer a seperate function call.

I'm wondering if one for write posting and a different for write ordering
would be called for. These are really distinct uses but I'm not sure
it's a distinction that's clear to most folks and they will get misused.
If not, then io_mwb() would be my preference.

> versus
> 
> 	writel(high-water, request-in)
> 	spin_unlock_iob(hostlock)

More variants of spinlocks?
When to use and how to implement it become even less clear.

thanks,
grant

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-22  1:44                                                                 ` Grant Grundler
@ 2004-09-22  2:58                                                                   ` Jeremy Higdon
  0 siblings, 0 replies; 78+ messages in thread
From: Jeremy Higdon @ 2004-09-22  2:58 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Matthew Wilcox, Jesse Barnes, James Bottomley, Matthew Wilcox,
	Andrew Vasquez, pj, SCSI Mailing List, mdr, jeremy, djh,
	Andrew Morton

On Tue, Sep 21, 2004 at 07:44:28PM -0600, Grant Grundler wrote:
> On Tue, Sep 21, 2004 at 06:16:52PM -0700, Jeremy Higdon wrote:
> > First let's make sure that we like it best.
> > 
> > The other alternative is an explicit I/O barrier.
> > 
> > 	writel(high-water, request-in)
> > 	mmiob()  /* memory-mapped I/O barrier */
> > 	spin_unlock(hostlock)
> 
> I strongly prefer a seperate function call.
> 
> I'm wondering if one for write posting and a different for write ordering
> would be called for. These are really distinct uses but I'm not sure
> it's a distinction that's clear to most folks and they will get misused.
> If not, then io_mwb() would be my preference.

I risk opening yet another can of worms, but here goes . . .

I think I'm convinced that we want separate solutions for
write ordering and (I'll call it) write flushing.

Both issues are derived from write posting.

I'd define write posting as: a store instruction completing
before the write is completed.  Write ordering is a problem
if posted writes from two CPUs can complete to the device
out of order.  Write flushing would be the process of
assuring that a posted write has indeed been completed.
A write flush takes care of write ordering.

I think the only way to perform a write flush is to issue
a register read to the card in question.  This makes it
very heavy weight when all you need is ordering, especially
since ordering is apparently not a problem for many (most?)
platforms that do write posting.

So I think we need one solution for write ordering, and
a separate heavier-weight solution for write flushing.  For
platforms in which write posting does not create ordering problems
the write ordering primitive would be a no-op.

The solution for write ordering on the two arches I know about
is processor/CPU chipset specific (SGI MIPS is a "sync" instruction,
Altix is a read of a SHub register).

Whether we call it io_mwb() or mmiob() is not important to me.
Does io_mwb stand for I/O Memory-mapped Write Barrier?

Sorry to be long-winded.

jeremy

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-21  0:09                           ` Jesse Barnes
  2004-09-21  5:46                             ` Grant Grundler
@ 2004-09-21 23:03                             ` Guennadi Liakhovetski
  1 sibling, 0 replies; 78+ messages in thread
From: Guennadi Liakhovetski @ 2004-09-21 23:03 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Grant Grundler, Andrew Vasquez, pj, linux-scsi, mdr, jeremy, djh,
	Andrew Morton

On Mon, 20 Sep 2004, Jesse Barnes wrote:

> On Monday, September 20, 2004 4:27 pm, Grant Grundler wrote:
>> I understand "write posting" as when the CPU posts the write
>> to the chipset and the chipset says the write is done even though
>> it hasn't reached the PCI device. It just means the write has reached
>> the PCI "domain" (which is supposed to be strongly ordered).
>
> That's my understanding too.

It might be OT for this specific discussion - but ("for completeness") - I 
think, there are PCI host-controllers that implement _only_ PCI-to-host 
post-write and read FIFOs (buffers)... Example - IT8152.

Regards
Guennadi
---
Guennadi Liakhovetski


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-16 20:56         ` Andrew Vasquez
  2004-09-16 21:09           ` Jesse Barnes
@ 2004-09-16 23:14           ` Jeremy Higdon
  1 sibling, 0 replies; 78+ messages in thread
From: Jeremy Higdon @ 2004-09-16 23:14 UTC (permalink / raw)
  To: Andrew Vasquez
  Cc: Jesse Barnes, Paul Jackson, linux-scsi, mdr, jeremy, djh, jbarnes,
	Andrew Morton

On Thu, Sep 16, 2004 at 01:56:50PM -0700, Andrew Vasquez wrote:
> On Thu, 2004-09-16 at 13:05, Jesse Barnes wrote:
> > On Thursday, September 16, 2004 12:56 pm, Paul Jackson wrote:
> > > Andrew Vasquez has been looking at this, via private email with just
> > > me (no progress yet).  Figured I update the larger list with this much ...
> > 
> > It seems to be failing on one of the accesses to PCI_COMMAND in config space 
> > in qla2x00_reset_chip().  I'm checking now to see if we're accessing the card 
> > right after a reset but before the card has finished.  That would cause a 
> > master abort, the symptom I'm seeing at least.
> > 
> 
> Interesting, the only changes in reset_chip() are for PCI posting
> issues.  Relevant diff attached.
> 
> --
> Andrew

Are all of those reads really necessary?  Generally the only reason
for doing a read to flush a posted write is for timing issues (in
which the read may not be good enough, according to a thread I saw
from Grant Grundler), or to enforce ordering before releasing a
lock (sleeping or spinning).

Have you run into platforms in which two I/O writes from one CPU are
retired out of order?

jeremy

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-16 19:56     ` Paul Jackson
  2004-09-16 20:05       ` Jesse Barnes
@ 2004-09-16 20:11       ` Andrew Morton
  1 sibling, 0 replies; 78+ messages in thread
From: Andrew Morton @ 2004-09-16 20:11 UTC (permalink / raw)
  To: Paul Jackson, Bjorn Helgaas
  Cc: andrew.vasquez, linux-scsi, mdr, jeremy, djh, jbarnes

Paul Jackson <pj@sgi.com> wrote:
>
>  I am still seeing the SN2 SCSI QLA failure as I reported yesterday, but
>  now against 2.6.9-rc2-mm1.

Is it an acpi thing?  Try reverting the three
incorrect-pci-interrupt-assignment* patches?

^ permalink raw reply	[flat|nested] 78+ messages in thread

* SCSI QLA not working on latest *-mm SN2
@ 2004-09-15 22:51 Paul Jackson
  2004-09-15 23:13 ` Andrew Morton
  0 siblings, 1 reply; 78+ messages in thread
From: Paul Jackson @ 2004-09-15 22:51 UTC (permalink / raw)
  To: linux-scsi, Andrew Vasquez; +Cc: mdr, jeremy, djh, jbarnes, Andrew Morton

Andrew Vasquez,

[My inbox had a message recommending a copy linux-scsi as well,
 so resending to include that list as well. ]

Jeremy Higdon recommended I send this to you.  I am running into
a fatal problem with SCSI QLA2, unable to boot using the latest *-mm
on SGI's SN2 hardware (don't know if other hardware involved or not).

This is on 2.6.9-rc1-mm5, and more recent variants of *-mm.

$ grep QLA2XXX_VERSION drivers/scsi/qla2xxx/qla_version.h
#define QLA2XXX_VERSION      "8.00.00b21-k"

My source currently includes the patch of Monday that begins:

======================================================================
===== drivers/scsi/qla2xxx/qla_os.c 1.46 vs edited =====
--- 1.46/drivers/scsi/qla2xxx/qla_os.c	2004-09-06 12:07:52 -07:00
+++ edited/drivers/scsi/qla2xxx/qla_os.c	2004-09-13 14:07:23 -07:00
@@ -2892,6 +2892,19 @@
 			continue;
 		}
 
+		/* get consistent memory allocated for init control block */
+		ha->init_cb = dma_pool_alloc(ha->s_dma_pool, GFP_KERNEL,
...
======================================================================

I am pretty sure that I also saw this problem earlier Monday morning,
before this patch.

Beginning with 2.6.9-rc1-mm5, I have had to comment out some of the QLA
config options in order to boot on an SN2.  That is, unless I make the
following edit to my .config:

========================================
--- 2.6.9-rc2-mmx/.config       2004-09-14 23:03:21.000000000 -0700
+++ 2.6.9-rc2-mmx/.config.disable_qla   2004-09-14 23:02:50.000000000 -0700
@@ -315,9 +315,9 @@ CONFIG_SCSI_SATA_VITESSE=y
 CONFIG_SCSI_QLOGIC_1280=y
 CONFIG_SCSI_QLA2XXX=y
 # CONFIG_SCSI_QLA21XX is not set
-CONFIG_SCSI_QLA22XX=y
-CONFIG_SCSI_QLA2300=y
-CONFIG_SCSI_QLA2322=y
+# CONFIG_SCSI_QLA22XX is not set
+# CONFIG_SCSI_QLA2300 is not set
+# CONFIG_SCSI_QLA2322 is not set
 # CONFIG_SCSI_QLA6312 is not set
 # CONFIG_SCSI_QLA6322 is not set
 # CONFIG_SCSI_DC395x is not set
========================================

my boot fails with the following output:

========================================
Uniform CD-ROM driver Revision: 3.20
qla1280: QLA12160 found on PCI bus 1, dev 3
ACPI: PCI interrupt 0000:01:03.0[A]: no GSI
scsi(0): Enabling SN2 PCI DMA dual channel lockup workaround
scsi(0): Enabling SN2 PCI DMA workaround
scsi(0:0): Resetting SCSI BUS
scsi(0:1): Resetting SCSI BUS
scsi0 : QLogic QLA12160 PCI to SCSI Host Adapter
       Firmware version: 10.04.32, Driver version 3.24.4
  Vendor: SGI       Model: ST336753LC        Rev: 2741
  Type:   Direct-Access                      ANSI SCSI revision: 03
scsi(0:0:1:0): Sync: period 9, offset 14, Wide, DT, Tagged queuing: depth 255
  Vendor: SGI       Model: ST336753LC        Rev: 2741
  Type:   Direct-Access                      ANSI SCSI revision: 03
scsi(0:0:2:0): Sync: period 9, offset 14, Wide, DT, Tagged queuing: depth 255
QLogic Fibre Channel HBA Driver (a0000001007ab1d0)
ACPI: PCI interrupt 0000:02:01.0[A]: no GSI
qla2200 0000:02:01.0: Found an ISP2200, irq 58, iobase 0xc00000080cc00000
qla2200 0000:02:01.0: Configuring PCI space...
PCI: slot 0000:02:01.0 has incorrect PCI cache line size of 0 bytes, correcting to 128
POD entered via OS requested halt, using Cac mode
========================================

This is on a couple of small (4 and 8 cpu) SN2 systems, that boot with
earlier kernels just fine.

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: SCSI QLA not working on latest *-mm SN2
  2004-09-15 22:51 Paul Jackson
@ 2004-09-15 23:13 ` Andrew Morton
  0 siblings, 0 replies; 78+ messages in thread
From: Andrew Morton @ 2004-09-15 23:13 UTC (permalink / raw)
  To: Paul Jackson; +Cc: linux-scsi, andrew.vasquez, mdr, jeremy, djh, jbarnes

Paul Jackson <pj@sgi.com> wrote:
>
> Jeremy Higdon recommended I send this to you.  I am running into
> a fatal problem with SCSI QLA2, unable to boot using the latest *-mm
> on SGI's SN2 hardware (don't know if other hardware involved or not).

Do you have this?


From: Andrew Vasquez <andrew.vasquez@qlogic.com>

Hmm, there seems to be some merging problems in changeset 1.44 for qla_os.c
-- the 'DMA pool/api usage' patch I sent is not completely integrated
(appears to be massaging problems while attempting to apply on top off
1.43).

Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 25-akpm/drivers/scsi/qla2xxx/qla_os.c |   28 +++++++++++++---------------
 1 files changed, 13 insertions(+), 15 deletions(-)

diff -puN drivers/scsi/qla2xxx/qla_os.c~qlogic-oops-fix drivers/scsi/qla2xxx/qla_os.c
--- 25/drivers/scsi/qla2xxx/qla_os.c~qlogic-oops-fix	Tue Sep 14 16:20:34 2004
+++ 25-akpm/drivers/scsi/qla2xxx/qla_os.c	Tue Sep 14 16:20:34 2004
@@ -2892,6 +2892,19 @@ qla2x00_mem_alloc(scsi_qla_host_t *ha)
 			continue;
 		}
 
+		/* get consistent memory allocated for init control block */
+		ha->init_cb = dma_pool_alloc(ha->s_dma_pool, GFP_KERNEL,
+		    &ha->init_cb_dma);
+		if (ha->init_cb == NULL) {
+			qla_printk(KERN_WARNING, ha,
+			    "Memory Allocation failed - init_cb\n");
+
+			qla2x00_mem_free(ha);
+			msleep(100);
+
+			continue;
+		}
+		memset(ha->init_cb, 0, sizeof(init_cb_t));
 
 		/* Get consistent memory allocated for Get Port Database cmd */
 		ha->iodesc_pd = dma_pool_alloc(ha->s_dma_pool, GFP_KERNEL,
@@ -2983,21 +2996,6 @@ qla2x00_mem_alloc(scsi_qla_host_t *ha)
 			memset(ha->ct_sns, 0, sizeof(struct ct_sns_pkt));
 		}
 
-		/* Get consistent memory allocated for Get Port Database cmd */
-		ha->iodesc_pd = pci_alloc_consistent(ha->pdev,
-		    PORT_DATABASE_SIZE, &ha->iodesc_pd_dma);
-		if (ha->iodesc_pd == NULL) {
-			/* error */
-			qla_printk(KERN_WARNING, ha,
-			    "Memory Allocation failed - iodesc_pd\n");
-
-			qla2x00_mem_free(ha);
-			msleep(100);
-
-			continue;
-		}
-		memset(ha->iodesc_pd, 0, PORT_DATABASE_SIZE);
-
 		/* Done all allocations without any error. */
 		status = 0;
 
_


^ permalink raw reply	[flat|nested] 78+ messages in thread

end of thread, other threads:[~2004-09-22 21:33 UTC | newest]

Thread overview: 78+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-09-17 22:55 SCSI QLA not working on latest *-mm SN2 Andrew Vasquez
2004-09-17 23:10 ` Jesse Barnes
2004-09-17 23:55 ` James Bottomley
2004-09-18  1:15   ` Andrew Vasquez
2004-09-18  1:25     ` Matthew Wilcox
2004-09-18  1:24       ` Andrew Vasquez
2004-09-18  2:36       ` Jeremy Higdon
2004-09-18 19:12       ` James Bottomley
  -- strict thread matches above, loose matches on Subject: below --
2004-09-21 21:22 Andrew Vasquez
2004-09-21 21:44 ` Jeremy Higdon
2004-09-21 22:37   ` Jesse Barnes
2004-09-21 22:49     ` Jeremy Higdon
2004-09-21 20:50 Andrew Vasquez
2004-09-21 21:06 ` Jeremy Higdon
2004-09-21 22:36   ` Jesse Barnes
2004-09-21 22:39     ` Jeremy Higdon
2004-09-21 22:43       ` Jesse Barnes
2004-09-21 22:54         ` Jeremy Higdon
2004-09-21 23:17           ` Jesse Barnes
2004-09-22 21:33             ` Jesse Barnes
2004-09-21 17:33 Andrew Vasquez
2004-09-21 17:52 ` Jesse Barnes
2004-09-21 18:04 ` Matthew Wilcox
2004-09-21 18:59 ` Matthew Wilcox
2004-09-21 19:10   ` Jesse Barnes
2004-09-21 15:58 Andrew Vasquez
2004-09-21 16:07 ` Jesse Barnes
2004-09-21 16:25 ` Matthew Wilcox
2004-09-21 16:33   ` James Bottomley
2004-09-21 20:39     ` Jeremy Higdon
2004-09-21 20:43   ` Jeremy Higdon
     [not found] <B179AE41C1147041AA1121F44614F0B060EF48@AVEXCH02.qlogic.org>
     [not found] ` <20040916121235.5e4f9c32.pj@sgi.com>
     [not found]   ` <1095362263.16326.12.camel@praka>
2004-09-16 19:56     ` Paul Jackson
2004-09-16 20:05       ` Jesse Barnes
2004-09-16 20:56         ` Andrew Vasquez
2004-09-16 21:09           ` Jesse Barnes
2004-09-16 21:40             ` Andrew Vasquez
2004-09-16 22:25               ` Andrew Morton
2004-09-16 22:29                 ` Jesse Barnes
2004-09-17 17:21                   ` Jesse Barnes
2004-09-18  6:10                     ` Grant Grundler
2004-09-20 22:40                       ` Jesse Barnes
2004-09-20 23:27                         ` Grant Grundler
2004-09-21  0:09                           ` Jesse Barnes
2004-09-21  5:46                             ` Grant Grundler
2004-09-21  6:45                               ` Jeremy Higdon
2004-09-21 13:29                                 ` Jesse Barnes
2004-09-21 13:25                               ` Jesse Barnes
2004-09-21 15:13                               ` Jesse Barnes
2004-09-21 15:41                                 ` James Bottomley
2004-09-21 15:58                                   ` Jesse Barnes
2004-09-21 16:01                                     ` Matthew Wilcox
2004-09-21 16:05                                       ` Jesse Barnes
2004-09-21 16:11                                         ` James Bottomley
2004-09-21 16:18                                           ` Jesse Barnes
2004-09-21 16:24                                             ` James Bottomley
2004-09-21 17:03                                           ` Jesse Barnes
2004-09-21 17:15                                             ` Matthew Wilcox
2004-09-21 17:24                                               ` Jesse Barnes
2004-09-21 17:20                                             ` James Bottomley
2004-09-21 17:46                                               ` Jesse Barnes
2004-09-21 17:56                                                 ` James Bottomley
2004-09-21 18:09                                                   ` Jesse Barnes
2004-09-21 19:06                                                     ` Grant Grundler
2004-09-21 19:40                                                       ` Jesse Barnes
2004-09-21 22:44                                                         ` Grant Grundler
2004-09-21 21:03                                                       ` Jeremy Higdon
2004-09-21 21:11                                                         ` Matthew Wilcox
2004-09-21 21:43                                                           ` Jeremy Higdon
2004-09-21 22:33                                                             ` Jesse Barnes
2004-09-22  0:02                                                             ` Matthew Wilcox
2004-09-22  1:16                                                               ` Jeremy Higdon
2004-09-22  1:44                                                                 ` Grant Grundler
2004-09-22  2:58                                                                   ` Jeremy Higdon
2004-09-21 23:03                             ` Guennadi Liakhovetski
2004-09-16 23:14           ` Jeremy Higdon
2004-09-16 20:11       ` Andrew Morton
2004-09-15 22:51 Paul Jackson
2004-09-15 23:13 ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).