From: Grant Grundler <grundler@parisc-linux.org>
To: Jesse Barnes <jbarnes@engr.sgi.com>
Cc: Grant Grundler <grundler@parisc-linux.org>,
James Bottomley <James.Bottomley@steeleye.com>,
Matthew Wilcox <willy@debian.org>,
Andrew Vasquez <andrew.vasquez@qlogic.com>,
pj@sgi.com, SCSI Mailing List <linux-scsi@vger.kernel.org>,
mdr@cthulhu.engr.sgi.com, jeremy@cthulhu.engr.sgi.com,
djh@cthulhu.engr.sgi.com, Andrew Morton <akpm@osdl.org>
Subject: Re: SCSI QLA not working on latest *-mm SN2
Date: Tue, 21 Sep 2004 16:44:03 -0600
Message-ID: <20040921224403.GA20053@colo.lackof.org>
In-Reply-To: <200409211540.32554.jbarnes@engr.sgi.com>
On Tue, Sep 21, 2004 at 03:40:32PM -0400, Jesse Barnes wrote:
> > Normally, I expect the chipset is responsible for maintaining
> > order of MMIO writes - though that sounds near impossible on
> > a large fabric where the spinlock transactions may take a different
> > path than the IO transactions.
>
> I think it is. I wouldn't be surprised if your hw guys told you the same
> thing for your large machines.
I was told Superdome chipsets (SX1000) do NOT have this problem.
AFAIK, it only scales to 16 nodes (4 sockets/node) and the fabric
may not have the multiple paths that SGI Altix (or other interconnects)
may have. (And I'd like the "other chipsets" better defined if anyone
knows of any.)
I was also told that likely *all* larger PCI-E systems will have this problem,
i.e. any time the fabric allows multiple paths to the same device.
And as usual, I was wrong. Someone educated me on HP V-class systems (PARISC)
having the same problem when running in NUMA config (4 node cluster).
Of course parisc-linux doesn't run on V-class...and HP didn't sell that
many V-class clusters...but here's the story anyway.
Despite strongly ordered CPU accesses, the chipset couldn't preserve
ordering across the NUMA links. The NIC drivers exposed this problem
when writing descriptors to remote shared memory. This shared memory
is implemented on each Host PCI bus controller for that bus segment.
i.e. some MMIO writes had to cross both a NUMA link and the X-bar, while
writes to local nodes only crossed the X-bar.
The result was that some of the descriptors picked up by NICs would contain
garbage. The workaround was adding MMIO reads after each descriptor was
updated - exactly what SGI wants to do for the qla driver.
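
To make the workaround above concrete, here is a minimal userland sketch of
"write the descriptor, read it back, then ring the doorbell". Everything in it
is an assumption for illustration: the register layout, struct fake_dev, and
the mmio_write32()/mmio_read32() helpers are hypothetical stand-ins for what a
kernel driver would do with writel()/readl() on an ioremap()ed region. It is
not the V-class NIC code or the qla driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register layout; not taken from any real NIC or HBA. */
enum { REG_DESC_LO = 0, REG_DESC_HI = 1, REG_DOORBELL = 2, REG_COUNT = 3 };

struct fake_dev {
    volatile uint32_t regs[REG_COUNT]; /* stands in for device MMIO space */
    unsigned int reads;                /* MMIO reads issued, for accounting */
};

/* Stand-in for a posted MMIO write (writel() in a kernel driver). */
static void mmio_write32(struct fake_dev *dev, size_t reg, uint32_t val)
{
    assert(reg < REG_COUNT);
    dev->regs[reg] = val;
}

/* Stand-in for a non-posted MMIO read (readl() in a kernel driver). */
static uint32_t mmio_read32(struct fake_dev *dev, size_t reg)
{
    assert(reg < REG_COUNT);
    dev->reads++;
    return dev->regs[reg];
}

static void post_descriptor(struct fake_dev *dev, uint64_t desc)
{
    /* Update the descriptor with two posted writes. */
    mmio_write32(dev, REG_DESC_LO, (uint32_t)desc);
    mmio_write32(dev, REG_DESC_HI, (uint32_t)(desc >> 32));

    /*
     * Read back from the same device. On real hardware the read does not
     * complete until the earlier writes have reached the target, so the
     * doorbell write below cannot overtake the descriptor.
     */
    (void)mmio_read32(dev, REG_DESC_HI);

    /* Only now tell the device that the descriptor is valid. */
    mmio_write32(dev, REG_DOORBELL, 1);
}
```

The cost is one round-trip read per descriptor update, which is the overhead
the rest of this mail is concerned about.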
...
> So you'll only have one read for every so many writes. And if your chipset
> supports it, you don't have to do a full read out to the target bus, but just
> to the local chipset.
Yes - agreed - not every MMIO write, and we really only need to guarantee
the writes have reached the targeted PCI segment.
But it's still a lot more reads and will measurably affect performance
on smaller boxes if it's done unconditionally.
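
A rough sketch of the "one read per batch, and only where needed" idea, again
as a userland simulation. The fabric_reorders flag, struct sim_hba, and the
sim_* helpers are all hypothetical names made up for this example; how a real
platform would advertise that its fabric can reorder MMIO writes is exactly the
open question in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 8u /* hypothetical ring size */

struct sim_hba {
    volatile uint32_t slots[RING_SLOTS]; /* stands in for device MMIO space */
    volatile uint32_t doorbell;
    bool fabric_reorders;     /* assumed platform flag, set on large NUMA only */
    unsigned int flush_reads; /* how many flushing reads were issued */
};

/* Queue one posted write; no read is issued per write. */
static void sim_queue(struct sim_hba *hba, size_t slot, uint32_t val)
{
    assert(slot < RING_SLOTS);
    hba->slots[slot] = val;
}

/* Flush once for the whole batch, and only if the platform needs it. */
static void sim_commit(struct sim_hba *hba)
{
    if (hba->fabric_reorders) {
        (void)hba->slots[0]; /* single read standing in for the flush */
        hba->flush_reads++;
    }
    hba->doorbell = 1;
}

/* Returns the number of flushing reads issued so far on this device. */
static unsigned int sim_submit_batch(struct sim_hba *hba,
                                     const uint32_t *vals, size_t n)
{
    assert(n <= RING_SLOTS);
    for (size_t i = 0; i < n; i++)
        sim_queue(hba, i, vals[i]);
    sim_commit(hba);
    return hba->flush_reads;
}
```

With fabric_reorders left false, a batch costs no reads at all, which is the
behaviour smaller boxes would want; with it set, the batch costs one read
regardless of how many writes it contains.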
Large scale NUMA is going to suffer under RDMA.
RDMA using smaller boxes will be much faster with at
least 1000-2000 cycles less overhead and latency per packet.
> It's a pretty hard bug to hit, as Jeremy mentioned. You'll only see it on
> large boxes.
Yes - a fabric that can't preserve ordering is the key bit here.
> If not,
> then the hardware is already imposing I/O space write penalties anyway,
> except for all writes. I'd think that's worse than just flushing the ones
> you care about, and only when you need to.
I have the impression it's not feasible for HW to enforce ordering on large
fabrics. And the "standard" PCI programming model clearly can't deal
with out of order MMIO writes. You guys just have the misfortune of
pushing the "envelope" right now.
I don't want to overload an interface that deals with write posting
with MMIO write ordering workarounds. The cases we need to enforce
write posting are different from the cases which need to enforce
MMIO write ordering. I think I understand both well enough now
and hope you do too. :^)
hth,
grant