* TG3 fix for slow switches (Was: TG3 driver failure on HP 16-way)
@ 2004-12-21 0:12 Peter Chubb
2004-12-21 0:15 ` David S. Miller
0 siblings, 1 reply; 19+ messages in thread
From: Peter Chubb @ 2004-12-21 0:12 UTC (permalink / raw)
To: netdev
Hi,
I've tracked down the problem I was having with the TG3
driver on the HP 16-way. If you'll recall, the problem was that the
link was always marked as down, and autonegotiation was failing.
The problem is that the driver gives up before the switch that the NIC
is connected to has finished the negotiation phase.
Here's a simple patch. I changed the way the loop works too, because
tg3_readphys() sets *val to 0xffffffff if it fails.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
===== drivers/net/tg3.c 1.222 vs edited =====
--- 1.222/drivers/net/tg3.c 2004-11-15 23:53:08 +00:00
+++ edited/drivers/net/tg3.c 2004-12-21 00:00:58 +00:00
@@ -1554,10 +1554,11 @@ static int tg3_setup_copper_phy(struct t
}
}
- bmsr = 0;
- for (i = 0; i < 100; i++) {
- tg3_readphy(tp, MII_BMSR, &bmsr);
+
+ for (i = 0; i < 1000; i++) {
tg3_readphy(tp, MII_BMSR, &bmsr);
+ if (tg3_readphy(tp, MII_BMSR, &bmsr))
+ bmsr = 0;
if (bmsr & BMSR_LSTATUS)
break;
udelay(40);
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: TG3 fix for slow switches (Was: TG3 driver failure on HP 16-way) 2004-12-21 0:12 TG3 fix for slow switches (Was: TG3 driver failure on HP 16-way) Peter Chubb @ 2004-12-21 0:15 ` David S. Miller 2004-12-21 1:11 ` Peter Chubb 0 siblings, 1 reply; 19+ messages in thread From: David S. Miller @ 2004-12-21 0:15 UTC (permalink / raw) To: Peter Chubb; +Cc: netdev On Tue, 21 Dec 2004 11:12:23 +1100 Peter Chubb <peterc@gelato.unsw.edu.au> wrote: > The problem is that the driver gives up before the switch that the NIC > is connected to has finished the negotiation phase. > > Here's a simple patch. I changed the way the loop works too, because > tg3_readphys() sets *val to 0xffffffff if it fails. This patch shouldn't be needed. This waiting loop is just an optimization in case we can negotiation quickly. If the loop fails, we just wait for the chip to interrupt us when the link status changes next (or we notice a link status change via timer based polling which we use on chips which can't provide the interrupt based notification properly), at which time we'll call tg3_setup_copper_phy() from the interrupt handler. You need to figure out why that isn't working correctly. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: TG3 fix for slow switches (Was: TG3 driver failure on HP 16-way) 2004-12-21 0:15 ` David S. Miller @ 2004-12-21 1:11 ` Peter Chubb 2004-12-21 6:19 ` David S. Miller 2005-01-06 23:19 ` David S. Miller 0 siblings, 2 replies; 19+ messages in thread From: Peter Chubb @ 2004-12-21 1:11 UTC (permalink / raw) To: David S. Miller; +Cc: Peter Chubb, netdev >>>>> "David" == David S Miller <davem@davemloft.net> writes: David> On Tue, 21 Dec 2004 11:12:23 +1100 Peter Chubb David> <peterc@gelato.unsw.edu.au> wrote: >> The problem is that the driver gives up before the switch that the >> NIC is connected to has finished the negotiation phase. >> >> Here's a simple patch. I changed the way the loop works too, >> because tg3_readphys() sets *val to 0xffffffff if it fails. David> This patch shouldn't be needed. This waiting loop is just an David> optimization in case we can negotiation quickly. David> You need to figure out why that isn't working correctly. It doesn't work because tg3_readphy sets bmsr to 0xffffffff even if it can't read the value. This breaks that loop early; and because BBMSR_LSTATUS is set, all sorts of things happen before the card is ready. Why is tg3_readphys returning 0xffffffff if it can't read a value anyway? Here's another possible fix. ===== drivers/net/tg3.c 1.222 vs edited ===== Index: linux-2.6.10-rc3/drivers/net/tg3.c =================================================================== --- linux-2.6.10-rc3.orig/drivers/net/tg3.c 2004-12-21 00:25:34.000000000 +0000 +++ linux-2.6.10-rc3/drivers/net/tg3.c 2004-12-21 01:00:56.586108274 +0000 @@ -1554,10 +1554,11 @@ } } - bmsr = 0; + for (i = 0; i < 100; i++) { tg3_readphy(tp, MII_BMSR, &bmsr); - tg3_readphy(tp, MII_BMSR, &bmsr); + if (tg3_readphy(tp, MII_BMSR, &bmsr)) + bmsr = 0; if (bmsr & BMSR_LSTATUS) break; udelay(40); ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: TG3 fix for slow switches (Was: TG3 driver failure on HP 16-way) 2004-12-21 1:11 ` Peter Chubb @ 2004-12-21 6:19 ` David S. Miller 2005-01-06 23:19 ` David S. Miller 1 sibling, 0 replies; 19+ messages in thread From: David S. Miller @ 2004-12-21 6:19 UTC (permalink / raw) To: Peter Chubb; +Cc: peterc, netdev On Tue, 21 Dec 2004 12:11:40 +1100 Peter Chubb <peterc@gelato.unsw.edu.au> wrote: > It doesn't work because tg3_readphy sets bmsr to 0xffffffff even if it > can't read the value. This breaks that loop early; and because > BBMSR_LSTATUS is set, all sorts of things happen before the card is ready. > > Why is tg3_readphys returning 0xffffffff if it can't read a value anyway? Because callers are not supposed to depend upon the value being set to anything valid if tg3_readphy() returns an error. I thought it should never be returning an error at this spot. The PHY should always return a valid value within PHY_BUSY_LOOPS. If MI_COM_BUSY is staying set for such a long time that's a pretty serious problem. Taking a peek at the bcm5700 driver by Broadcom, they handle all PHY read timeouts the way your patch does in this one spot, by setting the returned value to zero. So it seems the device can time out like that in these situations, and your patch is something close to the correct fix. Good catch Peter, I'll think some more about this and probably end up using something similar to your second patch. Thanks. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: TG3 fix for slow switches (Was: TG3 driver failure on HP 16-way) 2004-12-21 1:11 ` Peter Chubb 2004-12-21 6:19 ` David S. Miller @ 2005-01-06 23:19 ` David S. Miller 2005-01-07 0:17 ` Darren Williams 1 sibling, 1 reply; 19+ messages in thread From: David S. Miller @ 2005-01-06 23:19 UTC (permalink / raw) To: Peter Chubb; +Cc: peterc, netdev Peter, let me know if this patch solves your PHY link up problem too. # This is a BitKeeper generated diff -Nru style patch. # # ChangeSet # 2005/01/06 14:53:55-08:00 davem@nuts.davemloft.net # [TG3]: Return 0 when PHY read times out, not all-ones. # # Noticed by Peter Chubb. # # Signed-off-by: David S. Miller <davem@davemloft.net> # # drivers/net/tg3.c # 2005/01/06 14:53:07-08:00 davem@nuts.davemloft.net +1 -1 # [TG3]: Return 0 when PHY read times out, not all-ones. # diff -Nru a/drivers/net/tg3.c b/drivers/net/tg3.c --- a/drivers/net/tg3.c 2005-01-06 14:54:21 -08:00 +++ b/drivers/net/tg3.c 2005-01-06 14:54:21 -08:00 @@ -485,7 +485,7 @@ udelay(80); } - *val = 0xffffffff; + *val = 0x0; frame_val = ((PHY_ADDR << MI_COM_PHY_ADDR_SHIFT) & MI_COM_PHY_ADDR_MASK); ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: TG3 fix for slow switches (Was: TG3 driver failure on HP 16-way) 2005-01-06 23:19 ` David S. Miller @ 2005-01-07 0:17 ` Darren Williams 2005-01-07 3:48 ` David S. Miller 0 siblings, 1 reply; 19+ messages in thread From: Darren Williams @ 2005-01-07 0:17 UTC (permalink / raw) To: David S. Miller; +Cc: Peter Chubb, netdev, mchan Hi David On Thu, 06 Jan 2005, David S. Miller wrote: > > Peter, let me know if this patch solves your PHY link up problem > too. > > # This is a BitKeeper generated diff -Nru style patch. > # > # ChangeSet > # 2005/01/06 14:53:55-08:00 davem@nuts.davemloft.net > # [TG3]: Return 0 when PHY read times out, not all-ones. > # > # Noticed by Peter Chubb. > # > # Signed-off-by: David S. Miller <davem@davemloft.net> > # > # drivers/net/tg3.c > # 2005/01/06 14:53:07-08:00 davem@nuts.davemloft.net +1 -1 > # [TG3]: Return 0 when PHY read times out, not all-ones. > # > diff -Nru a/drivers/net/tg3.c b/drivers/net/tg3.c > --- a/drivers/net/tg3.c 2005-01-06 14:54:21 -08:00 > +++ b/drivers/net/tg3.c 2005-01-06 14:54:21 -08:00 > @@ -485,7 +485,7 @@ > udelay(80); > } > > - *val = 0xffffffff; > + *val = 0x0; > > frame_val = ((PHY_ADDR << MI_COM_PHY_ADDR_SHIFT) & > MI_COM_PHY_ADDR_MASK); No, if I revert back to an earlier driver the link comes up OK printing: tg3: eth0: Link is up at 1000 Mbps, full duplex. tg3: eth0: Flow control is off for TX and off for RX. after a short delay. With the current dirver I do not see any prinks from tg3_link_report about the state of the link. Darren -------------------------------------------------- Darren Williams <dsw AT gelato.unsw.edu.au> Gelato@UNSW <www.gelato.unsw.edu.au> -------------------------------------------------- ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: TG3 fix for slow switches (Was: TG3 driver failure on HP 16-way) 2005-01-07 0:17 ` Darren Williams @ 2005-01-07 3:48 ` David S. Miller 2005-01-07 5:30 ` Darren Williams 0 siblings, 1 reply; 19+ messages in thread From: David S. Miller @ 2005-01-07 3:48 UTC (permalink / raw) To: Darren Williams; +Cc: peterc, netdev, mchan On Fri, 7 Jan 2005 11:17:44 +1100 Darren Williams <dsw@gelato.unsw.edu.au> wrote: > No, if I revert back to an earlier driver the link comes > up OK printing: > tg3: eth0: Link is up at 1000 Mbps, full duplex. > tg3: eth0: Flow control is off for TX and off for RX. > after a short delay. With the current dirver I do not > see any prinks from tg3_link_report about the state > of the link. Is the working verion with Peter's patch applied? In any case, please try this one instead which is a variant of the second patch Peter proposed plus some signedness fixes I've done. Thanks. --- ../linus-2.6/drivers/net/tg3.c 2005-01-06 19:23:03.000000000 -0800 +++ drivers/net/tg3.c 2005-01-06 19:22:53.000000000 -0800 @@ -60,8 +60,8 @@ #define DRV_MODULE_NAME "tg3" #define PFX DRV_MODULE_NAME ": " -#define DRV_MODULE_VERSION "3.14" -#define DRV_MODULE_RELDATE "November 15, 2004" +#define DRV_MODULE_VERSION "3.15" +#define DRV_MODULE_RELDATE "January 6, 2005" #define TG3_DEF_MAC_MODE 0 #define TG3_DEF_RX_MODE 0 @@ -493,7 +493,8 @@ static int tg3_readphy(struct tg3 *tp, int reg, u32 *val) { u32 frame_val; - int loops, ret; + unsigned int loops; + int ret; if ((tp->mi_mode & MAC_MI_MODE_AUTO_POLL) != 0) { tw32_f(MAC_MI_MODE, @@ -512,7 +513,7 @@ tw32_f(MAC_MI_COM, frame_val); loops = PHY_BUSY_LOOPS; - while (loops-- > 0) { + while (loops != 0) { udelay(10); frame_val = tr32(MAC_MI_COM); @@ -521,10 +522,11 @@ frame_val = tr32(MAC_MI_COM); break; } + loops -= 1; } ret = -EBUSY; - if (loops > 0) { + if (loops != 0) { *val = frame_val & MI_COM_DATA_MASK; ret = 0; } @@ -540,7 +542,8 @@ static int tg3_writephy(struct tg3 *tp, int reg, u32 val) { u32 frame_val; - int loops, ret; + unsigned int loops; + int ret; if ((tp->mi_mode & MAC_MI_MODE_AUTO_POLL) != 0) { tw32_f(MAC_MI_MODE, @@ -558,7 +561,7 @@ tw32_f(MAC_MI_COM, frame_val); loops = PHY_BUSY_LOOPS; - while (loops-- > 0) { + while (loops != 0) { udelay(10); frame_val = tr32(MAC_MI_COM); if ((frame_val & MI_COM_BUSY) == 0) { @@ -566,10 +569,11 @@ frame_val = tr32(MAC_MI_COM); break; } + loops -= 1; } ret = -EBUSY; - if (loops > 0) + if (loops != 0) ret = 0; if ((tp->mi_mode & MAC_MI_MODE_AUTO_POLL) != 0) { @@ -1557,7 +1561,9 @@ bmsr = 0; for (i = 0; i < 100; i++) { tg3_readphy(tp, MII_BMSR, &bmsr); - tg3_readphy(tp, MII_BMSR, &bmsr); + if (tg3_readphy(tp, MII_BMSR, &bmsr)) + bmsr = 0; + if (bmsr & BMSR_LSTATUS) break; udelay(40); @@ -1640,7 +1646,9 @@ tg3_phy_copper_begin(tp); tg3_readphy(tp, MII_BMSR, &tmp); - tg3_readphy(tp, MII_BMSR, &tmp); + if (tg3_readphy(tp, MII_BMSR, &tmp)) + tmp = 0; + if (tmp & BMSR_LSTATUS) current_link_up = 1; } @@ -7209,7 +7217,8 @@ u32 bmsr, adv_reg, tg3_ctrl; tg3_readphy(tp, MII_BMSR, &bmsr); - tg3_readphy(tp, MII_BMSR, &bmsr); + if (tg3_readphy(tp, MII_BMSR, &bmsr)) + bmsr = 0; if (bmsr & BMSR_LSTATUS) goto skip_phy_reset; ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: TG3 fix for slow switches (Was: TG3 driver failure on HP 16-way) 2005-01-07 3:48 ` David S. Miller @ 2005-01-07 5:30 ` Darren Williams 2005-01-07 5:28 ` David S. Miller 0 siblings, 1 reply; 19+ messages in thread From: Darren Williams @ 2005-01-07 5:30 UTC (permalink / raw) To: David S. Miller; +Cc: peterc, netdev, mchan Hi David On Thu, 06 Jan 2005, David S. Miller wrote: > On Fri, 7 Jan 2005 11:17:44 +1100 > Darren Williams <dsw@gelato.unsw.edu.au> wrote: > > > No, if I revert back to an earlier driver the link comes > > up OK printing: > > tg3: eth0: Link is up at 1000 Mbps, full duplex. > > tg3: eth0: Flow control is off for TX and off for RX. > > after a short delay. With the current dirver I do not > > see any prinks from tg3_link_report about the state > > of the link. > > Is the working verion with Peter's patch applied? No see below. > > In any case, please try this one instead which is a variant > of the second patch Peter proposed plus some signedness fixes > I've done. > No still no link. The patch that I have working is: tg3.c:v2.9 (March 8, 2004) that I took from 2.6.7 > Thanks. > > --- ../linus-2.6/drivers/net/tg3.c 2005-01-06 19:23:03.000000000 -0800 > +++ drivers/net/tg3.c 2005-01-06 19:22:53.000000000 -0800 > @@ -60,8 +60,8 @@ > > #define DRV_MODULE_NAME "tg3" > #define PFX DRV_MODULE_NAME ": " > -#define DRV_MODULE_VERSION "3.14" > -#define DRV_MODULE_RELDATE "November 15, 2004" > +#define DRV_MODULE_VERSION "3.15" > +#define DRV_MODULE_RELDATE "January 6, 2005" > > #define TG3_DEF_MAC_MODE 0 > #define TG3_DEF_RX_MODE 0 > @@ -493,7 +493,8 @@ > static int tg3_readphy(struct tg3 *tp, int reg, u32 *val) > { > u32 frame_val; > - int loops, ret; > + unsigned int loops; > + int ret; > > if ((tp->mi_mode & MAC_MI_MODE_AUTO_POLL) != 0) { > tw32_f(MAC_MI_MODE, > @@ -512,7 +513,7 @@ > tw32_f(MAC_MI_COM, frame_val); > > loops = PHY_BUSY_LOOPS; > - while (loops-- > 0) { > + while (loops != 0) { > udelay(10); > frame_val = tr32(MAC_MI_COM); > > @@ -521,10 +522,11 @@ > frame_val = tr32(MAC_MI_COM); > break; > } > + loops -= 1; > } > > ret = -EBUSY; > - if (loops > 0) { > + if (loops != 0) { > *val = frame_val & MI_COM_DATA_MASK; > ret = 0; > } > @@ -540,7 +542,8 @@ > static int tg3_writephy(struct tg3 *tp, int reg, u32 val) > { > u32 frame_val; > - int loops, ret; > + unsigned int loops; > + int ret; > > if ((tp->mi_mode & MAC_MI_MODE_AUTO_POLL) != 0) { > tw32_f(MAC_MI_MODE, > @@ -558,7 +561,7 @@ > tw32_f(MAC_MI_COM, frame_val); > > loops = PHY_BUSY_LOOPS; > - while (loops-- > 0) { > + while (loops != 0) { > udelay(10); > frame_val = tr32(MAC_MI_COM); > if ((frame_val & MI_COM_BUSY) == 0) { > @@ -566,10 +569,11 @@ > frame_val = tr32(MAC_MI_COM); > break; > } > + loops -= 1; > } > > ret = -EBUSY; > - if (loops > 0) > + if (loops != 0) > ret = 0; > > if ((tp->mi_mode & MAC_MI_MODE_AUTO_POLL) != 0) { > @@ -1557,7 +1561,9 @@ > bmsr = 0; > for (i = 0; i < 100; i++) { > tg3_readphy(tp, MII_BMSR, &bmsr); > - tg3_readphy(tp, MII_BMSR, &bmsr); > + if (tg3_readphy(tp, MII_BMSR, &bmsr)) > + bmsr = 0; > + > if (bmsr & BMSR_LSTATUS) > break; > udelay(40); > @@ -1640,7 +1646,9 @@ > tg3_phy_copper_begin(tp); > > tg3_readphy(tp, MII_BMSR, &tmp); > - tg3_readphy(tp, MII_BMSR, &tmp); > + if (tg3_readphy(tp, MII_BMSR, &tmp)) > + tmp = 0; > + > if (tmp & BMSR_LSTATUS) > current_link_up = 1; > } > @@ -7209,7 +7217,8 @@ > u32 bmsr, adv_reg, tg3_ctrl; > > tg3_readphy(tp, MII_BMSR, &bmsr); > - tg3_readphy(tp, MII_BMSR, &bmsr); > + if (tg3_readphy(tp, MII_BMSR, &bmsr)) > + bmsr = 0; > > if (bmsr & BMSR_LSTATUS) > goto skip_phy_reset; -------------------------------------------------- Darren Williams <dsw AT gelato.unsw.edu.au> Gelato@UNSW <www.gelato.unsw.edu.au> -------------------------------------------------- ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: TG3 fix for slow switches (Was: TG3 driver failure on HP 16-way) 2005-01-07 5:30 ` Darren Williams @ 2005-01-07 5:28 ` David S. Miller 2005-01-07 9:25 ` Darren Williams 0 siblings, 1 reply; 19+ messages in thread From: David S. Miller @ 2005-01-07 5:28 UTC (permalink / raw) To: Darren Williams; +Cc: peterc, netdev, mchan On Fri, 7 Jan 2005 16:30:09 +1100 Darren Williams <dsw@gelato.unsw.edu.au> wrote: > > In any case, please try this one instead which is a variant > > of the second patch Peter proposed plus some signedness fixes > > I've done. > > > No still no link. The patch that I have working is: > tg3.c:v2.9 (March 8, 2004) > that I took from 2.6.7 Darren, I believe you are confusing this entire situation. And I should have suspected this when you were responding to my test requests instead of Peter. Peter had a problem on a 16-way HP machine, and the patch that I just gave to you is nearly identical to his which he said makes his problem go away. Are you having the same problem on the same configuration or are you hoping this fixes some other tg3 bug that you happen to be seeing? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: TG3 fix for slow switches (Was: TG3 driver failure on HP 16-way) 2005-01-07 5:28 ` David S. Miller @ 2005-01-07 9:25 ` Darren Williams 0 siblings, 0 replies; 19+ messages in thread From: Darren Williams @ 2005-01-07 9:25 UTC (permalink / raw) To: David S. Miller; +Cc: Darren Williams, peterc, netdev, mchan Hi David On Thu, 06 Jan 2005, David S. Miller wrote: > On Fri, 7 Jan 2005 16:30:09 +1100 > Darren Williams <dsw@gelato.unsw.edu.au> wrote: > > > > In any case, please try this one instead which is a variant > > > of the second patch Peter proposed plus some signedness fixes > > > I've done. > > > > > No still no link. The patch that I have working is: > > tg3.c:v2.9 (March 8, 2004) > > that I took from 2.6.7 > > Darren, I believe you are confusing this entire situation. > And I should have suspected this when you were responding > to my test requests instead of Peter. > > Peter had a problem on a 16-way HP machine, and the patch > that I just gave to you is nearly identical to his which > he said makes his problem go away. > Yes it the same machine. > Are you having the same problem on the same configuration or > are you hoping this fixes some other tg3 bug that you happen > to be seeing? I'm having the same problem on the same machine. Peter's original patch fixed it once as far as I know. I have patched the currently kernel with Peter's original patch though have had no success bringing the interface up. Thats why I started looking at older drivers since they play nicely with the switch. Peter is currently on holidays, and I am using the machine for benchmarking and assisting with solving this problem. I should of let you know to avoid the confusion. Darren -------------------------------------------------- Darren Williams <dsw AT gelato.unsw.edu.au> Gelato@UNSW <www.gelato.unsw.edu.au> -------------------------------------------------- ^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: TG3 fix for slow switches (Was: TG3 driver failure on HP 16-way)
@ 2004-12-23 0:02 Michael Chan
2004-12-23 4:31 ` David S. Miller
2005-05-31 15:38 ` Grant Grundler
0 siblings, 2 replies; 19+ messages in thread
From: Michael Chan @ 2004-12-23 0:02 UTC (permalink / raw)
To: David S. Miller, Peter Chubb; +Cc: peterc, netdev
> Because callers are not supposed to depend upon the value
> being set to anything valid if tg3_readphy() returns an error.
>
> I thought it should never be returning an error at this spot.
> The PHY should always return a valid value within
> PHY_BUSY_LOOPS. If MI_COM_BUSY is staying set for such a
> long time that's a pretty serious problem.
>
> Taking a peek at the bcm5700 driver by Broadcom, they handle
> all PHY read timeouts the way your patch does in this one
> spot, by setting the returned value to zero. So it seems the
> device can time out like that in these situations, and your
> patch is something close to the correct fix.
>
> Good catch Peter, I'll think some more about this and
> probably end up using something similar to your second patch.
>
> Thanks.
>
David, While the 2nd patch or something similar should be applied, I
think the underlying cause of tg3_readphy() returning error should be
further investigated.
Peter, if you provide more information, such as registers MAC_MI_MODE
(0x454) and MAC_MI_COM (0x44c) after the failure, I can look into it.
Michael
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: TG3 fix for slow switches (Was: TG3 driver failure on HP 16-way) 2004-12-23 0:02 Michael Chan @ 2004-12-23 4:31 ` David S. Miller 2005-05-31 15:38 ` Grant Grundler 1 sibling, 0 replies; 19+ messages in thread From: David S. Miller @ 2004-12-23 4:31 UTC (permalink / raw) To: Michael Chan; +Cc: peterc, netdev On Wed, 22 Dec 2004 16:02:44 -0800 "Michael Chan" <mchan@broadcom.com> wrote: > David, While the 2nd patch or something similar should be applied, I > think the underlying cause of tg3_readphy() returning error should be > further investigated. Would this condition be possible if something, such as ASF, were continually polling the PHY in parallel with the driver? On the other hand, it doesn't seem so foreign for the PHY to block out register accesses for long periods of time for various reasons. But yes I'd also like to know more about exactly what is going on in this case. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: TG3 fix for slow switches (Was: TG3 driver failure on HP 16-way) 2004-12-23 0:02 Michael Chan 2004-12-23 4:31 ` David S. Miller @ 2005-05-31 15:38 ` Grant Grundler 2005-05-31 17:22 ` Michael Chan 1 sibling, 1 reply; 19+ messages in thread From: Grant Grundler @ 2005-05-31 15:38 UTC (permalink / raw) To: Michael Chan; +Cc: David S. Miller, Peter Chubb, netdev On Wed, Dec 22, 2004 at 04:02:44PM -0800, Michael Chan wrote: > David, While the 2nd patch or something similar should be applied, I > think the underlying cause of tg3_readphy() returning error should be > further investigated. > > Peter, if you provide more information, such as registers MAC_MI_MODE > (0x454) and MAC_MI_COM (0x44c) after the failure, I can look into it. Michael, Peter bounced this email to me after we talked about issues he was having with the rx8620 (HP 16-way ia64, sx1000 chipset) "IOX Core LAN". It sounded like the same problem I tracked down with rx8620 IOX Core LAN in March. Here is the summary : | In May, 2004, tg3 v3.4 changed how MAC_LED_CTRL (0x40c) was getting | programmed and how to determine what to program into LED_CTRL. The new | code trusted NIC_SRAM_DATA_CFG (0x00000b58) to indicate what to write | to LED_CTRL and MII EXT_CTRL registers. On "IOX Core Lan", SRAM was | saying MODE_MAC (0x0) and that doesn't work. HP is working with broadcom to figure out how to update the firmware. Ben Malavazi <malavazi@cup.hp.com> is talking with Pratapa Reddy Vaka <pvaka@broadcom.com> 'Johnny Ho' <jho@broadcom.com> I'm not sure they are aware of the root cause. But they can probably provide you with the process to update the onboard firmware image should this come up again. b57diag is one way. HP also has an FCUPDATE tool that's a bit less dangerous/more user friendly. FCUPDATE needs a newer EFI driver to operate correctly so it's non-trivial to get started as well. I'm hoping one of the two can be scripted to auto update. I'm told rx5670 Core LAN firmware was the original image used by the rx8620 IOX Core LAN team and may also exhibit the same problem. I've not heard any trouble reports so far on rx5670 but please forward them to me if you do. Please do NOT apply the appended patch - it's just a temp workaround until the owners of the "IOX Core LAN" and Broadcom can get their act together. The patch appended below is for RHEL3u4 (tg3 v3.10RH) but should apply (with fuzz) to any tg3 v3.4 or later. The patch adds a "quirk" (bug workaround) to overide the SRAM value and use LED_CTRL_MODE_PHY_1 (0x00000800) instead. This results in LNK3_LED_MODE (0x2) getting written to MII_TG3_EXT_CTRL (0x10) for bcm5700/5701 chips. I do NOT understand why programming the MII_TG3_EXT_CTRL register differently will change how auto-negotiation works. hth, grant --- orig/drivers/net/tg3.c 2005-03-19 21:39:25.000000000 -0600 +++ ggg/drivers/net/tg3.c 2005-03-20 13:17:50.000000000 -0600 @@ -7090,9 +7089,20 @@ static int __devinit tg3_phy_probe(struc }; if ((GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5700 || - GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5701) && - tp->pdev->subsystem_vendor == PCI_VENDOR_ID_DELL) - tp->led_ctrl = LED_CTRL_MODE_PHY_2; + GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5701)) { + /* Quirks: firmware is wrong */ + switch (tp->pdev->subsystem_vendor) { + case PCI_VENDOR_ID_DELL: + tp->led_ctrl = LED_CTRL_MODE_PHY_2; + break; + + case PCI_VENDOR_ID_HP: + /* HP IOX Core Lan 1000Base-T [A7109AX] */ + if (tp->pdev->subsystem_device == 0x12c1) + tp->led_ctrl = LED_CTRL_MODE_PHY_1; + break; + } + } if (((GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5703) || (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5704) || ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: TG3 fix for slow switches (Was: TG3 driver failure on HP 16-way) 2005-05-31 15:38 ` Grant Grundler @ 2005-05-31 17:22 ` Michael Chan 2005-05-31 18:36 ` Grant Grundler 0 siblings, 1 reply; 19+ messages in thread From: Michael Chan @ 2005-05-31 17:22 UTC (permalink / raw) To: Grant Grundler; +Cc: David S. Miller, Peter Chubb, netdev On Tue, 2005-05-31 at 08:38 -0700, Grant Grundler wrote: > Michael, > Peter bounced this email to me after we talked about issues he was > having with the rx8620 (HP 16-way ia64, sx1000 chipset) "IOX Core LAN". > It sounded like the same problem I tracked down with rx8620 IOX Core LAN > in March. Here is the summary : > > | In May, 2004, tg3 v3.4 changed how MAC_LED_CTRL (0x40c) was getting > | programmed and how to determine what to program into LED_CTRL. The new > | code trusted NIC_SRAM_DATA_CFG (0x00000b58) to indicate what to write > | to LED_CTRL and MII EXT_CTRL registers. On "IOX Core Lan", SRAM was > | saying MODE_MAC (0x0) and that doesn't work. > Thanks Grant, I'll chase this down and find a solution to this problem. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: TG3 fix for slow switches (Was: TG3 driver failure on HP 16-way) 2005-05-31 17:22 ` Michael Chan @ 2005-05-31 18:36 ` Grant Grundler 2005-05-31 17:42 ` Michael Chan 0 siblings, 1 reply; 19+ messages in thread From: Grant Grundler @ 2005-05-31 18:36 UTC (permalink / raw) To: Michael Chan; +Cc: David S. Miller, Peter Chubb, netdev On Tue, May 31, 2005 at 10:22:37AM -0700, Michael Chan wrote: > > Michael, > > Peter bounced this email to me after we talked about issues he was > > having with the rx8620 (HP 16-way ia64, sx1000 chipset) "IOX Core LAN". > > It sounded like the same problem I tracked down with rx8620 IOX Core LAN > > in March. Here is the summary : > > > > | In May, 2004, tg3 v3.4 changed how MAC_LED_CTRL (0x40c) was getting > > | programmed and how to determine what to program into LED_CTRL. The new > > | code trusted NIC_SRAM_DATA_CFG (0x00000b58) to indicate what to write > > | to LED_CTRL and MII EXT_CTRL registers. On "IOX Core Lan", SRAM was > > | saying MODE_MAC (0x0) and that doesn't work. > > > > Thanks Grant, I'll chase this down and find a solution to this problem. You mean you'll help find a way to reflash the onboard eeprom? thanks, grant ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: TG3 fix for slow switches (Was: TG3 driver failure on HP 16-way) 2005-05-31 18:36 ` Grant Grundler @ 2005-05-31 17:42 ` Michael Chan 2005-05-31 19:03 ` Grant Grundler 0 siblings, 1 reply; 19+ messages in thread From: Michael Chan @ 2005-05-31 17:42 UTC (permalink / raw) To: Grant Grundler; +Cc: David S. Miller, Peter Chubb, netdev On Tue, 2005-05-31 at 11:36 -0700, Grant Grundler wrote: > On Tue, May 31, 2005 at 10:22:37AM -0700, Michael Chan wrote: > > Thanks Grant, I'll chase this down and find a solution to this problem. > > You mean you'll help find a way to reflash the onboard eeprom? > No, I mean I will create a patch that will work for older devices that do not have the valid LED bits in eeprom. Thanks. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: TG3 fix for slow switches (Was: TG3 driver failure on HP 16-way) 2005-05-31 17:42 ` Michael Chan @ 2005-05-31 19:03 ` Grant Grundler 2005-05-31 18:22 ` Michael Chan 0 siblings, 1 reply; 19+ messages in thread From: Grant Grundler @ 2005-05-31 19:03 UTC (permalink / raw) To: Michael Chan; +Cc: Grant Grundler, David S. Miller, Peter Chubb, netdev On Tue, May 31, 2005 at 10:42:45AM -0700, Michael Chan wrote: > No, I mean I will create a patch that will work for older devices that > do not have the valid LED bits in eeprom. Ah ok. How can you know if the LED bits are valid (or not)? I consider this a bug in the eeprom and have set things in motion (in HP) to correct it since the card is not an obsolete product. Otherwise, the patch I submitted should be sufficient. thanks, grant ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: TG3 fix for slow switches (Was: TG3 driver failure on HP 16-way) 2005-05-31 19:03 ` Grant Grundler @ 2005-05-31 18:22 ` Michael Chan 0 siblings, 0 replies; 19+ messages in thread From: Michael Chan @ 2005-05-31 18:22 UTC (permalink / raw) To: Grant Grundler; +Cc: David S. Miller, Peter Chubb, netdev On Tue, 2005-05-31 at 12:03 -0700, Grant Grundler wrote: > On Tue, May 31, 2005 at 10:42:45AM -0700, Michael Chan wrote: > > No, I mean I will create a patch that will work for older devices that > > do not have the valid LED bits in eeprom. > > Ah ok. How can you know if the LED bits are valid (or not)? > There are version numbers and signatures that we may be able to use. I am working with the firmware engineer to see if we can find a way. > I consider this a bug in the eeprom and have set things in motion > (in HP) to correct it since the card is not an obsolete product. > > Otherwise, the patch I submitted should be sufficient. > I am hoping to have a patch that will work for all older devices instead of just checking for subvendor IDs, if possible. If not, we'll go with your patch. ^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: TG3 fix for slow switches (Was: TG3 driver failure on HP 16-way)
@ 2004-12-23 8:14 Michael Chan
0 siblings, 0 replies; 19+ messages in thread
From: Michael Chan @ 2004-12-23 8:14 UTC (permalink / raw)
To: David S. Miller; +Cc: peterc, netdev
> Would this condition be possible if something, such as ASF, were
> continually polling the PHY in parallel with the driver?
Without proper handshake with ASF, I think it may be possible if ASF is in a very tight loop polling the PHY. The tg3_readphy() poll loop is not very tight so it is possible to continually hit the busy condition if ASF is polling PHY registers. If this is the case, even if tg3_readphy() eventually gets the data, the data will most likely be from a different PHY register (that ASF is trying to read).
But I don't think this is an ASF related problem because if it were, the patch would not have fixed it.
Michael
^ permalink raw reply [flat|nested] 19+ messages in threadend of thread, other threads:[~2005-05-31 19:03 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2004-12-21 0:12 TG3 fix for slow switches (Was: TG3 driver failure on HP 16-way) Peter Chubb 2004-12-21 0:15 ` David S. Miller 2004-12-21 1:11 ` Peter Chubb 2004-12-21 6:19 ` David S. Miller 2005-01-06 23:19 ` David S. Miller 2005-01-07 0:17 ` Darren Williams 2005-01-07 3:48 ` David S. Miller 2005-01-07 5:30 ` Darren Williams 2005-01-07 5:28 ` David S. Miller 2005-01-07 9:25 ` Darren Williams -- strict thread matches above, loose matches on Subject: below -- 2004-12-23 0:02 Michael Chan 2004-12-23 4:31 ` David S. Miller 2005-05-31 15:38 ` Grant Grundler 2005-05-31 17:22 ` Michael Chan 2005-05-31 18:36 ` Grant Grundler 2005-05-31 17:42 ` Michael Chan 2005-05-31 19:03 ` Grant Grundler 2005-05-31 18:22 ` Michael Chan 2004-12-23 8:14 Michael Chan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).