netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* e100: Wait for PHY reset to complete?
@ 2006-10-25 17:22 Anders Grafstrom
  2006-10-25 18:56 ` Francois Romieu
  0 siblings, 1 reply; 10+ messages in thread
From: Anders Grafstrom @ 2006-10-25 17:22 UTC (permalink / raw)
  To: netdev

Shouldn't the e100 driver wait for PHY reset to complete before
proceeding in e100_set_settings()?

I'm asking because the interface (82551) sometimes ends up in loopback
when I try to set AUTONEG.

I'm thinking something like this patch.

Anders

--- linux-2.6.18.orig/drivers/net/e100.c	2006-09-20 05:42:06.000000000 +0200
+++ linux-2.6.18/drivers/net/e100.c	2006-10-17 17:43:47.000000000 +0200
@@ -2201,8 +2201,16 @@ static int e100_set_settings(struct net_
 {
 	struct nic *nic = netdev_priv(netdev);
 	int err;
+	int counter = 50;
 
 	mdio_write(netdev, nic->mii.phy_id, MII_BMCR, BMCR_RESET);
+
+	/* wait for possibly (ouch) 500ms */
+	while (mdio_read(netdev, nic->mii.phy_id, MII_BMCR) & BMCR_RESET) {
+		msleep(10);
+		if (!--counter) break;
+	}
+
 	err = mii_ethtool_sset(&nic->mii, cmd);
 	e100_exec_cb(nic, NULL, e100_configure);
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: e100: Wait for PHY reset to complete?
  2006-10-25 17:22 e100: Wait for PHY reset to complete? Anders Grafstrom
@ 2006-10-25 18:56 ` Francois Romieu
  2006-10-25 20:18   ` Auke Kok
  0 siblings, 1 reply; 10+ messages in thread
From: Francois Romieu @ 2006-10-25 18:56 UTC (permalink / raw)
  To: Anders Grafstrom; +Cc: netdev

Anders Grafstrom <grfstrm@users.sourceforge.net> :
[...]
> I'm thinking something like this patch.

I do not have the spec for the max duration at hand but it makes sense.

Can you:
- decrease the duration of each sleep and increase the count of iterations
- put the break on a line of its own
- add a Signed-off-by: Foo Bar <baz@nit.fol> line
- Cc: jeff@garzik.org and Auke Kok <auke-jan.h.kok@intel.com>

-- 
Ueimor

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: e100: Wait for PHY reset to complete?
  2006-10-25 18:56 ` Francois Romieu
@ 2006-10-25 20:18   ` Auke Kok
  2006-10-25 21:26     ` Auke Kok
  0 siblings, 1 reply; 10+ messages in thread
From: Auke Kok @ 2006-10-25 20:18 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Anders Grafstrom, netdev

Francois Romieu wrote:
> Anders Grafstrom <grfstrm@users.sourceforge.net> :
> [...]
>> I'm thinking something like this patch.
> 
> I do not have the spec for the max duration at hand but it makes sense.
> 
> Can you:
> - decrease the duration of each sleep and increase the count of iterations
> - put the break on a line of its own
> - add a Signed-off-by: Foo Bar <baz@nit.fol> line
> - Cc: jeff@garzik.org and Auke Kok <auke-jan.h.kok@intel.com>

I'm already checking out specs, which, for 8255x devices, are available on 
http://e1000.sf.net/ .

Auke

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: e100: Wait for PHY reset to complete?
  2006-10-25 20:18   ` Auke Kok
@ 2006-10-25 21:26     ` Auke Kok
  2006-10-25 21:57       ` Francois Romieu
  2006-10-25 23:36       ` Anders Grafström
  0 siblings, 2 replies; 10+ messages in thread
From: Auke Kok @ 2006-10-25 21:26 UTC (permalink / raw)
  To: Anders Grafstrom; +Cc: Auke Kok, Francois Romieu, netdev, Jesse Brandeburg

Auke Kok wrote:
> Francois Romieu wrote:
>> Anders Grafstrom <grfstrm@users.sourceforge.net> :
>> [...]
>>> I'm thinking something like this patch.
>>
>> I do not have the spec for the max duration at hand but it makes sense.
>>
>> Can you:
>> - decrease the duration of each sleep and increase the count of 
>> iterations
>> - put the break on a line of its own
>> - add a Signed-off-by: Foo Bar <baz@nit.fol> line
>> - Cc: jeff@garzik.org and Auke Kok <auke-jan.h.kok@intel.com>
> 
> I'm already checking out specs, which, for 8255x devices, are available 
> on http://e1000.sf.net/ .

okay, I don't think this is needed at all:

Allthough the spec itself didn't talk about phy reset times, I've ran this patch with 
some debugging output on a few boxes and did some speed/duplex settings, and the PHY 
reset returned succesfull after the very first mdio_read, which is before any msleep(10) 
is executed. That is also expected behaviour.

I think you might be confusing this with a MAC reset, which has a documented 10usec 
timeout (see 8255x developers manual). The driver already adheres to this by doing a 
20usec delay after software/selective resets.

which gets us back to the original problem: how did your driver end up in loopback mode? 
(and, how did you figure out that it did??).

Cheers,

Auke

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: e100: Wait for PHY reset to complete?
  2006-10-25 21:26     ` Auke Kok
@ 2006-10-25 21:57       ` Francois Romieu
  2006-10-25 23:36       ` Anders Grafström
  1 sibling, 0 replies; 10+ messages in thread
From: Francois Romieu @ 2006-10-25 21:57 UTC (permalink / raw)
  To: Auke Kok; +Cc: Anders Grafstrom, netdev, Jesse Brandeburg

Auke Kok <auke-jan.h.kok@intel.com> :
[...]
> okay, I don't think this is needed at all:
> 
> Allthough the spec itself didn't talk about phy reset times, I've ran this 
> patch with some debugging output on a few boxes and did some speed/duplex 
> settings, and the PHY reset returned succesfull after the very first 
> mdio_read, which is before any msleep(10) is executed. That is also 
> expected behaviour.

Would it be expected as well that the PHY has performed reset before the
second mdio_write hits it when the mdio_read is not there at all ?

-- 
Ueimor

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: e100: Wait for PHY reset to complete?
  2006-10-25 21:26     ` Auke Kok
  2006-10-25 21:57       ` Francois Romieu
@ 2006-10-25 23:36       ` Anders Grafström
  2006-10-26  1:07         ` Auke Kok
  1 sibling, 1 reply; 10+ messages in thread
From: Anders Grafström @ 2006-10-25 23:36 UTC (permalink / raw)
  To: Auke Kok; +Cc: Francois Romieu, netdev, Jesse Brandeburg

Auke Kok wrote:
> Allthough the spec itself didn't talk about phy reset times, I've ran this
> patch with
> some debugging output on a few boxes and did some speed/duplex settings,
> and the PHY
> reset returned succesfull after the very first mdio_read, which is before
> any msleep(10)
> is executed. That is also expected behaviour.
>
> I think you might be confusing this with a MAC reset, which has a
> documented 10usec
> timeout (see 8255x developers manual). The driver already adheres to this
> by doing a
> 20usec delay after software/selective resets.
>
> which gets us back to the original problem: how did your driver end up in
> loopback mode?
> (and, how did you figure out that it did??).


This is what the 2.4.33.3 driver does:

void
e100_phy_reset(struct e100_private *bdp)
{
        u16 ctrl_reg;
        ctrl_reg = BMCR_RESET;
        e100_mdi_write(bdp, MII_BMCR, bdp->phy_addr, ctrl_reg);
        /* ieee 802.3 : The reset process shall be completed       */
        /* within 0.5 seconds from the settting of PHY reset bit.  */
        set_current_state(TASK_UNINTERRUPTIBLE);
        schedule_timeout(HZ / 2);
}

And here
http://www.cs.helsinki.fi/linux/linux-kernel/2003-23/1245.html
I found this entry:

<scott.feldman@intel.com> (03/06/08 1.1218)
[e100] misc
<...>
* Add 1/2 second delay after PHY reset to allow link partner to
see and respond to reset, per IEEE 802.3.


I ran mii-diag when the LEDs went out and the register dump
said it was in loopback. It is somewhat difficult reproduce.
It seems to be timing dependent, something else has to occur
at the same time.
I must confess I have only seen it with the 2.6.13 kernel.
I have not been able to reproduce it with 2.6.18.
But I have found no change in the driver that would fix it so
I suspect the problem is still there.

I have tried adding debug output to see if I can read back the
RESET bit in set state, but then the problem refuses to show
so I don't think I can rule out an unfinished PHY reset.

Anders



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: e100: Wait for PHY reset to complete?
  2006-10-25 23:36       ` Anders Grafström
@ 2006-10-26  1:07         ` Auke Kok
  2006-10-30 19:05           ` Auke Kok
  0 siblings, 1 reply; 10+ messages in thread
From: Auke Kok @ 2006-10-26  1:07 UTC (permalink / raw)
  To: Anders Grafström; +Cc: Francois Romieu, netdev, Jesse Brandeburg

Anders Grafström wrote:
> Auke Kok wrote:
>> Allthough the spec itself didn't talk about phy reset times, I've ran this
>> patch with
>> some debugging output on a few boxes and did some speed/duplex settings,
>> and the PHY
>> reset returned succesfull after the very first mdio_read, which is before
>> any msleep(10)
>> is executed. That is also expected behaviour.
>>
>> I think you might be confusing this with a MAC reset, which has a
>> documented 10usec
>> timeout (see 8255x developers manual). The driver already adheres to this
>> by doing a
>> 20usec delay after software/selective resets.
>>
>> which gets us back to the original problem: how did your driver end up in
>> loopback mode?
>> (and, how did you figure out that it did??).
> 
> 
> This is what the 2.4.33.3 driver does:
> 
> void
> e100_phy_reset(struct e100_private *bdp)
> {
>         u16 ctrl_reg;
>         ctrl_reg = BMCR_RESET;
>         e100_mdi_write(bdp, MII_BMCR, bdp->phy_addr, ctrl_reg);
>         /* ieee 802.3 : The reset process shall be completed       */
>         /* within 0.5 seconds from the settting of PHY reset bit.  */
>         set_current_state(TASK_UNINTERRUPTIBLE);
>         schedule_timeout(HZ / 2);
> }
> 
> And here
> http://www.cs.helsinki.fi/linux/linux-kernel/2003-23/1245.html
> I found this entry:
> 
> <scott.feldman@intel.com> (03/06/08 1.1218)
> [e100] misc
> <...>
> * Add 1/2 second delay after PHY reset to allow link partner to
> see and respond to reset, per IEEE 802.3.
> 
> 
> I ran mii-diag when the LEDs went out and the register dump
> said it was in loopback. It is somewhat difficult reproduce.
> It seems to be timing dependent, something else has to occur
> at the same time.
> I must confess I have only seen it with the 2.6.13 kernel.
> I have not been able to reproduce it with 2.6.18.
> But I have found no change in the driver that would fix it so
> I suspect the problem is still there.
> 
> I have tried adding debug output to see if I can read back the
> RESET bit in set state, but then the problem refuses to show
> so I don't think I can rule out an unfinished PHY reset.

theoretically, yes, the ieee spec PHY reset timeout is kind of silly: in no way do we 
assume that we have re-negotiated link after 1/2 a second! Other code in the driver 
should take care of that, and since it works I'll assume it does ;)

the mdio_read probably acts as a flush to the hardware too - masquerading problems, more 
goodness. Perhaps we should do a single read in all cases and forget about the timeout 
(is there an mdio_write_flush?)

Basically the timeout is wrong: a LINK reset is not a PHY reset. The PHY is back online 
and ready to respond in (probably) a single clock cycle. The link can take up to 3 
seconds in normal cases. Waiting for 1/2 a second does not fix anything there. Here's 
where the 8255x (PHY part) spec abandons us: I don't read anything about PHY reset 
timeouts in it.

Can you try to debug if your while () timeout loop is actually waiting for a significant 
amount? something like adding a printk(KERN_ERR "counted down to %d0 msec\n", counter); 
after the entire while{} loop should show you if there is variation in the PHY reset 
time needed for the PHY to be back online.

running mii-diag before the link comes back up might be causing the issue in the first 
place, and certainly suggests a small race.

Have you tried to run the e100-sbit branch from jgarzik's netdev-2.6 tree? We're still 
looking into merging this and I guess I should push it to -mm to have it receive some 
testing....

Cheers,

Auke

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: e100: Wait for PHY reset to complete?
  2006-10-26  1:07         ` Auke Kok
@ 2006-10-30 19:05           ` Auke Kok
  2006-10-31 18:10             ` Anders Grafström
  0 siblings, 1 reply; 10+ messages in thread
From: Auke Kok @ 2006-10-30 19:05 UTC (permalink / raw)
  To: Anders Grafström; +Cc: Francois Romieu, netdev, Jesse Brandeburg

Auke Kok wrote:
> Anders Grafström wrote:
>> Auke Kok wrote:
>>> Allthough the spec itself didn't talk about phy reset times, I've ran 
>>> this
>>> patch with
>>> some debugging output on a few boxes and did some speed/duplex settings,
>>> and the PHY
>>> reset returned succesfull after the very first mdio_read, which is 
>>> before
>>> any msleep(10)
>>> is executed. That is also expected behaviour.
>>>
>>> I think you might be confusing this with a MAC reset, which has a
>>> documented 10usec
>>> timeout (see 8255x developers manual). The driver already adheres to 
>>> this
>>> by doing a
>>> 20usec delay after software/selective resets.
>>>
>>> which gets us back to the original problem: how did your driver end 
>>> up in
>>> loopback mode?
>>> (and, how did you figure out that it did??).
>>
>>
>> This is what the 2.4.33.3 driver does:
>>
>> void
>> e100_phy_reset(struct e100_private *bdp)
>> {
>>         u16 ctrl_reg;
>>         ctrl_reg = BMCR_RESET;
>>         e100_mdi_write(bdp, MII_BMCR, bdp->phy_addr, ctrl_reg);
>>         /* ieee 802.3 : The reset process shall be completed       */
>>         /* within 0.5 seconds from the settting of PHY reset bit.  */
>>         set_current_state(TASK_UNINTERRUPTIBLE);
>>         schedule_timeout(HZ / 2);
>> }
>>
>> And here
>> http://www.cs.helsinki.fi/linux/linux-kernel/2003-23/1245.html
>> I found this entry:
>>
>> <scott.feldman@intel.com> (03/06/08 1.1218)
>> [e100] misc
>> <...>
>> * Add 1/2 second delay after PHY reset to allow link partner to
>> see and respond to reset, per IEEE 802.3.
>>
>>
>> I ran mii-diag when the LEDs went out and the register dump
>> said it was in loopback. It is somewhat difficult reproduce.
>> It seems to be timing dependent, something else has to occur
>> at the same time.
>> I must confess I have only seen it with the 2.6.13 kernel.
>> I have not been able to reproduce it with 2.6.18.
>> But I have found no change in the driver that would fix it so
>> I suspect the problem is still there.
>>
>> I have tried adding debug output to see if I can read back the
>> RESET bit in set state, but then the problem refuses to show
>> so I don't think I can rule out an unfinished PHY reset.
> 
> theoretically, yes, the ieee spec PHY reset timeout is kind of silly: in 
> no way do we assume that we have re-negotiated link after 1/2 a second! 
> Other code in the driver should take care of that, and since it works 
> I'll assume it does ;)
> 
> the mdio_read probably acts as a flush to the hardware too - 
> masquerading problems, more goodness. Perhaps we should do a single read 
> in all cases and forget about the timeout (is there an mdio_write_flush?)

Anders,

It appears that there isn't such a thing as an mdio_write_flush in e100 but we can force 
a flush by reading the status register. Not sure if it works 100% so please give this a 
try and let me know if this fixes the problem for you.

Basically, we're preventing mdio writes that might happen after our reset write from 
being reordered. This might have caused an undefined state.

Auke


---

e100: force mdio write flush after PHY reset

Make sure the PHY reset happens without delay by flushing the reset write.

Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>


diff --git a/drivers/net/e100.c b/drivers/net/e100.c
index 815eb29..04a52cd 100644
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -2197,6 +2197,8 @@ static int e100_set_settings(struct net_
  	int err;

  	mdio_write(netdev, nic->mii.phy_id, MII_BMCR, BMCR_RESET);
+	/* flush MDIO write */
+	mdio_read(netdev, nic->mii.phy_id, MII_BMSR);
  	err = mii_ethtool_sset(&nic->mii, cmd);
  	e100_exec_cb(nic, NULL, e100_configure);


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: e100: Wait for PHY reset to complete?
  2006-10-30 19:05           ` Auke Kok
@ 2006-10-31 18:10             ` Anders Grafström
  2006-10-31 18:34               ` Auke Kok
  0 siblings, 1 reply; 10+ messages in thread
From: Anders Grafström @ 2006-10-31 18:10 UTC (permalink / raw)
  To: Auke Kok; +Cc: Francois Romieu, netdev, Jesse Brandeburg

>> Anders Grafström wrote:
>>> I ran mii-diag when the LEDs went out and the register dump
>>> said it was in loopback. It is somewhat difficult reproduce.
>>> It seems to be timing dependent, something else has to occur
>>> at the same time.
>>> I must confess I have only seen it with the 2.6.13 kernel.
>>> I have not been able to reproduce it with 2.6.18.
>>> But I have found no change in the driver that would fix it so
>>> I suspect the problem is still there.

Now looking at mdio_ctrl(), which implements mdio_write() and mdio_read(),
I see that this patch
http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15/2.6.15-mm3/broken-out/corruption-during-e100-mdi-register-access.patch
most likely fixed the problem that I have experienced.

"although access to the MDI registers involves
multiple steps which must not be intermixed, nothing was defending against
two or more threads attempting simultaneous access.  The most obvious
situation where such interference could occur involves the watchdog versus
ioctl paths"

Sorry for the noise.
Thanks,
Anders



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: e100: Wait for PHY reset to complete?
  2006-10-31 18:10             ` Anders Grafström
@ 2006-10-31 18:34               ` Auke Kok
  0 siblings, 0 replies; 10+ messages in thread
From: Auke Kok @ 2006-10-31 18:34 UTC (permalink / raw)
  To: Anders Grafström; +Cc: Francois Romieu, netdev, Jesse Brandeburg

Anders Grafström wrote:
>>> Anders Grafström wrote:
>>>> I ran mii-diag when the LEDs went out and the register dump
>>>> said it was in loopback. It is somewhat difficult reproduce.
>>>> It seems to be timing dependent, something else has to occur
>>>> at the same time.
>>>> I must confess I have only seen it with the 2.6.13 kernel.
>>>> I have not been able to reproduce it with 2.6.18.
>>>> But I have found no change in the driver that would fix it so
>>>> I suspect the problem is still there.
> 
> Now looking at mdio_ctrl(), which implements mdio_write() and mdio_read(),
> I see that this patch
> http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15/2.6.15-mm3/broken-out/corruption-during-e100-mdi-register-access.patch
> most likely fixed the problem that I have experienced.
> 
> "although access to the MDI registers involves
> multiple steps which must not be intermixed, nothing was defending against
> two or more threads attempting simultaneous access.  The most obvious
> situation where such interference could occur involves the watchdog versus
> ioctl paths"
> 
> Sorry for the noise.

cool, thanks for digging that up and taking the time to report back ;)

Auke

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2006-10-31 18:34 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-10-25 17:22 e100: Wait for PHY reset to complete? Anders Grafstrom
2006-10-25 18:56 ` Francois Romieu
2006-10-25 20:18   ` Auke Kok
2006-10-25 21:26     ` Auke Kok
2006-10-25 21:57       ` Francois Romieu
2006-10-25 23:36       ` Anders Grafström
2006-10-26  1:07         ` Auke Kok
2006-10-30 19:05           ` Auke Kok
2006-10-31 18:10             ` Anders Grafström
2006-10-31 18:34               ` Auke Kok

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).