* Re: ethtool -d MCAs rx2600
2004-01-23 23:12 ethtool -d MCAs rx2600 Grant Grundler
@ 2004-01-23 23:56 ` Grant Grundler
2004-01-26 16:30 ` Jack Steiner
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Grant Grundler @ 2004-01-23 23:56 UTC (permalink / raw)
To: linux-ia64
On Fri, Jan 23, 2004 at 03:12:54PM -0800, Grant Grundler wrote:
> In case someone wants to dig more now, I've dropped the "errdump mca"
> output on
> ftp://gsyprf10.external.hp.com/kernels/rx2600/mca_ethtool
>
> (Matching vmlinuz, System.map, .config is also there 2.6.1-rc1.tgz)
Alex Williams tells me it's a PIO read timeout.
(confirms my guess given what man page said)
Offending address is likely 0x0000000090807000.
Matches nicely with what /proc/iomem thinks:
...
90000000-97ffffff : PCI Bus 0000:20
90800000-9080ffff : tg3
...
Now just need to hunt down the code that pokes at +0x7000.
thanks,
grant
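
The address match above is simple range arithmetic, and a minimal sketch of it in C follows. The struct and function names are hypothetical and only illustrate the check; the range and address are the ones quoted in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One line of /proc/iomem: an inclusive [start, end] physical range. */
struct mmio_range {
    uint64_t start;
    uint64_t end;   /* inclusive, as printed by /proc/iomem */
};

/*
 * If addr lies inside the range, store its offset from the start of the
 * range in *offset and return true; otherwise return false and leave
 * *offset untouched. (Hypothetical helper, not part of any driver.)
 */
static bool mmio_offset(const struct mmio_range *r, uint64_t addr,
                        uint64_t *offset)
{
    assert(r->start <= r->end);
    if (addr < r->start || addr > r->end)
        return false;
    *offset = addr - r->start;
    return true;
}
```

Called with the tg3 range 90800000-9080ffff and the address 0x0000000090807000, the helper reports an offset of 0x7000 into the device's register window.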
* Re: ethtool -d MCAs rx2600
From: Jack Steiner @ 2004-01-26 16:30 UTC (permalink / raw)
To: linux-ia64
On Fri, Jan 23, 2004 at 03:56:50PM -0800, Grant Grundler wrote:
> On Fri, Jan 23, 2004 at 03:12:54PM -0800, Grant Grundler wrote:
> > In case someone wants to dig more now, I've dropped the "errdump mca"
> > output on
> > ftp://gsyprf10.external.hp.com/kernels/rx2600/mca_ethtool
> >
> > (Matching vmlinuz, System.map, .config is also there 2.6.1-rc1.tgz)
>
> Alex Williams tells me it's a PIO read timeout.
> (confirms my guess given what man page said)
>
> Offending address is likely 0x0000000090807000.
> Matches nicely with what /proc/iomem thinks:
> ...
> 90000000-97ffffff : PCI Bus 0000:20
> 90800000-9080ffff : tg3
> ...
>
> Now just need to hunt down the code that pokes at +0x7000.
>
> thanks,
> grant
We see a similar problem with "ethtool -d" on the SGI SN systems. We haven't
isolated the cause, but it looks similar - PIO read timeout.
FWIW, the failure occurs in the vicinity of tg3_get_regs+0xb60 called from
tg3_ethtool_ioctl+0xbb0. (This is on 2.4.21+).
Looks like it occurs here (but I don't put a lot of faith in this):
     GET_REG32_LOOP(BUFMGR_MODE, 0x58);
     GET_REG32_LOOP(RDMAC_MODE, 0x08);
>>>> GET_REG32_LOOP(WDMAC_MODE, 0x08);
     GET_REG32_LOOP(RX_CPU_BASE, 0x280);
     GET_REG32_LOOP(TX_CPU_BASE, 0x280);
--
Thanks
Jack Steiner (steiner@sgi.com) 651-683-5302
Principal Engineer SGI - Silicon Graphics, Inc.
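
One way to cross-check a guess like this is to test a faulting offset against the (base, length) windows that the dump walks. The sketch below does that for the five windows listed above. The numeric base offsets are assumptions written from memory for illustration and should be verified against tg3.h; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct reg_window {
    const char *name;
    uint32_t base;  /* offset of the first register in the window */
    uint32_t len;   /* window length in bytes */
};

/* ASSUMED base offsets, for illustration only; verify against tg3.h. */
static const struct reg_window windows[] = {
    { "BUFMGR_MODE", 0x4400, 0x58  },
    { "RDMAC_MODE",  0x4800, 0x08  },
    { "WDMAC_MODE",  0x4c00, 0x08  },
    { "RX_CPU_BASE", 0x5000, 0x280 },
    { "TX_CPU_BASE", 0x5400, 0x280 },
};

/* Return the window containing off, or NULL if none of them does. */
static const struct reg_window *window_for_offset(uint32_t off)
{
    for (size_t i = 0; i < sizeof(windows) / sizeof(windows[0]); i++) {
        if (off >= windows[i].base &&
            off - windows[i].base < windows[i].len)
            return &windows[i];
    }
    return NULL;
}
```

Under those assumed offsets, the +0x7000 offset from the rx2600 report falls outside all five windows, so at least on that machine the faulting read would have to come from a different GET_REG32_LOOP line.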
* Re: ethtool -d MCAs rx2600
From: Matthew Wilcox @ 2004-01-26 16:58 UTC (permalink / raw)
To: linux-ia64
On Mon, Jan 26, 2004 at 10:30:23AM -0600, Jack Steiner wrote:
> On Fri, Jan 23, 2004 at 03:56:50PM -0800, Grant Grundler wrote:
> > On Fri, Jan 23, 2004 at 03:12:54PM -0800, Grant Grundler wrote:
> > > In case someone wants to dig more now, I've dropped the "errdump mca"
> > > output on
> > > ftp://gsyprf10.external.hp.com/kernels/rx2600/mca_ethtool
> > >
> > > (Matching vmlinuz, System.map, .config is also there 2.6.1-rc1.tgz)
> >
> > Alex Williams tells me it's a PIO read timeout.
> > (confirms my guess given what man page said)
> >
> > Offending address is likely 0x0000000090807000.
> > Matches nicely with what /proc/iomem thinks:
> > ...
> > 90000000-97ffffff : PCI Bus 0000:20
> > 90800000-9080ffff : tg3
> > ...
> >
> > Now just need to hunt down the code that pokes at +0x7000.
> >
> > thanks,
> > grant
>
> We see a similar problem with "ethtool -d" on the SGI SN systems. We haven't
> isolated the cause, but it looks similar - PIO read timeout.
>
> FWIW, the failure occurs in the vicinity of tg3_get_regs+0xb60 called from
> tg3_ethtool_ioctl+0xbb0. (This is on 2.4.21+).
>
> Looks like it occurs here (but I don't put a lot of faith in this):
> GET_REG32_LOOP(BUFMGR_MODE, 0x58);
> GET_REG32_LOOP(RDMAC_MODE, 0x08);
> >>>> GET_REG32_LOOP(WDMAC_MODE, 0x08);
> GET_REG32_LOOP(RX_CPU_BASE, 0x280);
> GET_REG32_LOOP(TX_CPU_BASE, 0x280);
My suspicion is that some tg3 variants don't support this register, but
it's OK to read the register on x86 because it soft-fails. Most ia64
chipsets hard-fail so we need to avoid this. Jeff, Dave, can you comment?
--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain
* Re: ethtool -d MCAs rx2600
From: Jeff Garzik @ 2004-01-26 17:39 UTC (permalink / raw)
To: linux-ia64
Matthew Wilcox wrote:
> On Mon, Jan 26, 2004 at 10:30:23AM -0600, Jack Steiner wrote:
>>Looks like it occurs here (but I don't put a lot of faith in this):
>> GET_REG32_LOOP(BUFMGR_MODE, 0x58);
>> GET_REG32_LOOP(RDMAC_MODE, 0x08);
>> >>>> GET_REG32_LOOP(WDMAC_MODE, 0x08);
>> GET_REG32_LOOP(RX_CPU_BASE, 0x280);
>> GET_REG32_LOOP(TX_CPU_BASE, 0x280);
>
>
> My suspicion is that some tg3 variants don't support this register, but
> it's OK to read the register on x86 because it soft-fails. Most ia64
> chipsets hard-fail so we need to avoid this. Jeff, Dave, can you comment?
I get lockups occasionally on x86 too, but have had higher priority
things to look at. Since regdump is mainly an engineer's tool, we felt
it was a "use at your own risk" feature.
But if we can fix it, all the better.
Tangent -- I would love for somebody to take this output and prettyprint
it in userland ethtool package (d/l and cvs at
http://sf.net/projects/gkernel/).
Jeff
* Re: ethtool -d MCAs rx2600
From: Grant Grundler @ 2004-01-29 19:18 UTC (permalink / raw)
To: linux-ia64
On Mon, Jan 26, 2004 at 12:39:42PM -0500, Jeff Garzik wrote:
> I get lockups occasionally on x86 too, but have had higher priority
> things to look at. Since regdump is mainly an engineer's tool, we felt
> it was a "use at your own risk" feature.
>
> But if we can fix it, all the better.
tg3_get_regs() is reading registers that don't exist.
Neither HPUX nor tru64 drivers attempt to touch NVRAM on BCM5700/1 chips.
And tg3 in most other places doesn't either.
It just needs to check TG3_FLAG_NVRAM before reading NVRAM regs.
Jack, you also using the bcm5701 chip?
Jeff, please apply. Following patch is against 2.6.2-rc2.
thanks,
grant
=== drivers/net/tg3.c 1.81 vs edited ===
--- 1.81/drivers/net/tg3.c	Wed Dec 31 23:40:32 2003
+++ edited/drivers/net/tg3.c	Thu Jan 29 10:19:46 2004
@@ -5904,7 +5904,9 @@
 	GET_REG32_LOOP(MSGINT_MODE, 0x0c);
 	GET_REG32_1(DMAC_MODE);
 	GET_REG32_LOOP(GRC_MODE, 0x4c);
-	GET_REG32_LOOP(NVRAM_CMD, 0x24);
+	if (tp->tg3_flags & TG3_FLAG_NVRAM) {
+		GET_REG32_LOOP(NVRAM_CMD, 0x24);
+	}
 
 #undef __GET_REG32
 #undef GET_REG32_LOOP
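
The idea behind the patch can be tried outside the kernel with a simulated device. The sketch below is a user-space model, not driver code: fake_nic, fake_read32, dump_window, dump_regs and FLAG_NVRAM are hypothetical stand-ins, and the two base offsets are assumptions chosen to match the +0x7000 offset discussed earlier in the thread. A read of an undecoded window only bumps a counter here, whereas the real hardware in this thread raised an MCA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REG_SPACE   0x8000u  /* size in bytes of the simulated register space */
#define GRC_BASE    0x6800u  /* ASSUMED offset, illustration only */
#define NVRAM_BASE  0x7000u  /* ASSUMED offset, illustration only */
#define NVRAM_LEN   0x24u
#define FLAG_NVRAM  0x1u     /* hypothetical stand-in for TG3_FLAG_NVRAM */

struct fake_nic {
    uint32_t flags;           /* capability flags set at probe time */
    bool     has_nvram_regs;  /* does the chip decode the NVRAM window? */
    unsigned faults;          /* reads that hit an undecoded window */
    uint32_t regs[REG_SPACE / 4];
};

/* Simulated 32-bit register read; counts reads of a missing window. */
static uint32_t fake_read32(struct fake_nic *nic, uint32_t off)
{
    assert(off < REG_SPACE && (off & 3) == 0);
    if (off >= NVRAM_BASE && off < NVRAM_BASE + NVRAM_LEN &&
        !nic->has_nvram_regs) {
        nic->faults++;        /* stands in for the PIO read timeout */
        return 0xffffffffu;
    }
    return nic->regs[off / 4];
}

/* Copy len bytes of registers starting at base to the same offset in out. */
static void dump_window(struct fake_nic *nic, uint32_t *out,
                        uint32_t base, uint32_t len)
{
    for (uint32_t i = 0; i < len; i += 4)
        out[(base + i) / 4] = fake_read32(nic, base + i);
}

/* out must hold REG_SPACE bytes; windows that are skipped stay zero. */
static void dump_regs(struct fake_nic *nic, uint32_t *out)
{
    memset(out, 0, REG_SPACE);
    dump_window(nic, out, GRC_BASE, 0x4c);
    if (nic->flags & FLAG_NVRAM)   /* the check the patch adds */
        dump_window(nic, out, NVRAM_BASE, NVRAM_LEN);
}
```

With the flag clear, dump_regs() never touches the NVRAM window and that part of the output buffer stays zero. With the flag set on a device that decodes the window, the NVRAM registers are copied like any others.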
* Re: ethtool -d MCAs rx2600
From: Jack Steiner @ 2004-01-29 20:41 UTC (permalink / raw)
To: linux-ia64
On Thu, Jan 29, 2004 at 11:18:37AM -0800, Grant Grundler wrote:
> On Mon, Jan 26, 2004 at 12:39:42PM -0500, Jeff Garzik wrote:
> > I get lockups occasionally on x86 too, but have had higher priority
> > things to look at. Since regdump is mainly an engineer's tool, we felt
> > it was a "use at your own risk" feature.
> >
> > But if we can fix it, all the better.
>
> tg3_get_regs() is reading registers that don't exist.
> Neither HPUX nor tru64 drivers attempt to touch NVRAM on BCM5700/1 chips.
> And tg3 in most other places doesn't either.
> It just needs to check TG3_FLAG_NVRAM before reading NVRAM regs.
>
> Jack, you also using the bcm5701 chip?
Yes. Tigon3 [rev 0105 PHY(5701)] (PCI:66MHz:64-bit
We'll apply the patch.
Thanks....
>
> Jeff, please apply. Following patch is against 2.6.2-rc2.
>
> thanks,
> grant
>
> === drivers/net/tg3.c 1.81 vs edited ===
> --- 1.81/drivers/net/tg3.c Wed Dec 31 23:40:32 2003
> +++ edited/drivers/net/tg3.c Thu Jan 29 10:19:46 2004
> @@ -5904,7 +5904,9 @@
> GET_REG32_LOOP(MSGINT_MODE, 0x0c);
> GET_REG32_1(DMAC_MODE);
> GET_REG32_LOOP(GRC_MODE, 0x4c);
> - GET_REG32_LOOP(NVRAM_CMD, 0x24);
> + if (tp->tg3_flags & TG3_FLAG_NVRAM) {
> + GET_REG32_LOOP(NVRAM_CMD, 0x24);
> + }
>
> #undef __GET_REG32
> #undef GET_REG32_LOOP
--
Thanks
Jack Steiner (steiner@sgi.com) 651-683-5302
Principal Engineer SGI - Silicon Graphics, Inc.