netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* still having r8169 woes with XID 18000000
@ 2010-06-04 12:10 Timo Teräs
  2010-06-04 12:36 ` Phil Sutter
  0 siblings, 1 reply; 21+ messages in thread
From: Timo Teräs @ 2010-06-04 12:10 UTC (permalink / raw)
  To: françois romieu; +Cc: netdev

Hi,

After fixing the MAC issues earlier, I'm still seeing some weird trouble
with my RTL8169sc/8110sc / XID 18000000 boards.

The box(es) were originally running 2.6.30.x kernel and everything
worked without major problems. But after upgrading to 2.6.32.x (and even
with most of the newer fixes included too), it seems that the sometimes
(not too often) some of the interfaces just won't work after reboot
(cold or hard). It's a 3-in-1 board, and usually when this happens one
of the interfaces won't work but the other two do work.

Whenever an interface is "broken", the following conditions are true:
 - forcing it to 10mbit/s and disabling autoneg will make it work
 - when it's not working ethtool -S reports rx_errors and align_errors
increasing
 - when autoneg is on, ethtool says that "Link Detected: no"

I do see that between the above two kernel versions PHY init code was
introduced for this XID (RTL_GIGA_MAC_VER_05) in commit 2e955856ff. I
wonder if it makes sense trying to revert this. Should the NIC still
work? Do you see any other suspicious commits?

Unrelated, I notice that if I have 1GB links autonegotiated, 'ip link'
will show them as "state UNKNOWN". Forced manual links get "state UP" as
expected. This is for the operstate field.

- Timo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: still having r8169 woes with XID 18000000
  2010-06-04 12:10 still having r8169 woes with XID 18000000 Timo Teräs
@ 2010-06-04 12:36 ` Phil Sutter
  2010-06-04 13:02   ` Timo Teräs
  0 siblings, 1 reply; 21+ messages in thread
From: Phil Sutter @ 2010-06-04 12:36 UTC (permalink / raw)
  To: Timo Teräs; +Cc: françois romieu, netdev

Hi,

On Fri, Jun 04, 2010 at 03:10:47PM +0300, Timo Teräs wrote:
> After fixing the MAC issues earlier, I'm still seeing some weird trouble
> with my RTL8169sc/8110sc / XID 18000000 boards.
> 
> The box(es) were originally running 2.6.30.x kernel and everything
> worked without major problems. But after upgrading to 2.6.32.x (and even
> with most of the newer fixes included too), it seems that the sometimes
> (not too often) some of the interfaces just won't work after reboot
> (cold or hard). It's a 3-in-1 board, and usually when this happens one
> of the interfaces won't work but the other two do work.
> 
> Whenever an interface is "broken", the following conditions are true:
>  - forcing it to 10mbit/s and disabling autoneg will make it work
>  - when it's not working ethtool -S reports rx_errors and align_errors
> increasing
>  - when autoneg is on, ethtool says that "Link Detected: no"

This (your last point) is about what we were experiencing at work using
PCI-based Gigabit Realtek NICs. Our solution to the problem was to
switch to a different NIC (Intel e1000), which obviously solves any
problems. ;)

But I've done some tests before, mainly being inspired by these mails:
http://permalink.gmane.org/gmane.linux.network/160136
http://permalink.gmane.org/gmane.linux.network/160280
and after some feedback from the mainboard manufacturer I've tested the
out-of-tree driver Realtek provides (version 6.013.00), which seems to
not have this issue. Very interesting results show up when comparing
6.013 with 6.012 (citing myself):

Comparing r8169-6.013 with it's predecessor 6.012, you'll find a newly
enabled function rtl8169_phy_power_up() as well as some more invocations
of rtl8169_phy_power_down().

This is probably the solution to these (at least in our case) very
sporadic, but highly annoying, problems. In fact, when our NIC didn't
detect any link, it needed a full power-cycle (no success with
reset-button), so almost not workaroundable.

HTH, Phil

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: still having r8169 woes with XID 18000000
  2010-06-04 12:36 ` Phil Sutter
@ 2010-06-04 13:02   ` Timo Teräs
  2010-06-04 13:43     ` Phil Sutter
  0 siblings, 1 reply; 21+ messages in thread
From: Timo Teräs @ 2010-06-04 13:02 UTC (permalink / raw)
  To: phil; +Cc: netdev, françois romieu

On 06/04/2010 03:36 PM, Phil Sutter wrote:
> On Fri, Jun 04, 2010 at 03:10:47PM +0300, Timo Teräs wrote:
>> After fixing the MAC issues earlier, I'm still seeing some weird trouble
>> with my RTL8169sc/8110sc / XID 18000000 boards.
>>
>> The box(es) were originally running 2.6.30.x kernel and everything
>> worked without major problems. But after upgrading to 2.6.32.x (and even
>> with most of the newer fixes included too), it seems that the sometimes
>> (not too often) some of the interfaces just won't work after reboot
>> (cold or hard). It's a 3-in-1 board, and usually when this happens one
>> of the interfaces won't work but the other two do work.
>>
>> Whenever an interface is "broken", the following conditions are true:
>>  - forcing it to 10mbit/s and disabling autoneg will make it work
>>  - when it's not working ethtool -S reports rx_errors and align_errors
>> increasing
>>  - when autoneg is on, ethtool says that "Link Detected: no"
> 
> This (your last point) is about what we were experiencing at work using
> PCI-based Gigabit Realtek NICs. Our solution to the problem was to
> switch to a different NIC (Intel e1000), which obviously solves any
> problems. ;)
> 
> But I've done some tests before, mainly being inspired by these mails:
> http://permalink.gmane.org/gmane.linux.network/160136
> http://permalink.gmane.org/gmane.linux.network/160280
> and after some feedback from the mainboard manufacturer I've tested the
> out-of-tree driver Realtek provides (version 6.013.00), which seems to
> not have this issue. Very interesting results show up when comparing
> 6.013 with 6.012 (citing myself):
> 
> Comparing r8169-6.013 with it's predecessor 6.012, you'll find a newly
> enabled function rtl8169_phy_power_up() as well as some more invocations
> of rtl8169_phy_power_down().
> 
> This is probably the solution to these (at least in our case) very
> sporadic, but highly annoying, problems. In fact, when our NIC didn't
> detect any link, it needed a full power-cycle (no success with
> reset-button), so almost not workaroundable.

Sounds very similar to the problem I have. Thanks for the pointers!

It looks like the r8169 driver does have phy power up code in it, but
it's only executed for specific versions of the chip. Realtek driver
seems to do it unconditionally.

The check seems to be:
                if ((tp->mac_version == RTL_GIGA_MAC_VER_11) ||
                    (tp->mac_version == RTL_GIGA_MAC_VER_12) ||
                    (tp->mac_version >= RTL_GIGA_MAC_VER_17)) {

I wonder if I should just add my mac version there (_VER_05) and test if
it'll make it better.

- Timo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: still having r8169 woes with XID 18000000
  2010-06-04 13:02   ` Timo Teräs
@ 2010-06-04 13:43     ` Phil Sutter
  2010-06-04 17:31       ` Timo Teräs
  0 siblings, 1 reply; 21+ messages in thread
From: Phil Sutter @ 2010-06-04 13:43 UTC (permalink / raw)
  To: Timo Teräs; +Cc: netdev, françois romieu

Hi,

On Fri, Jun 04, 2010 at 04:02:11PM +0300, Timo Teräs wrote:
> > Comparing r8169-6.013 with it's predecessor 6.012, you'll find a newly
> > enabled function rtl8169_phy_power_up() as well as some more invocations
> > of rtl8169_phy_power_down().
> > 
> > This is probably the solution to these (at least in our case) very
> > sporadic, but highly annoying, problems. In fact, when our NIC didn't
> > detect any link, it needed a full power-cycle (no success with
> > reset-button), so almost not workaroundable.
> 
> Sounds very similar to the problem I have. Thanks for the pointers!
> 
> It looks like the r8169 driver does have phy power up code in it, but
> it's only executed for specific versions of the chip. Realtek driver
> seems to do it unconditionally.

Hmm. I actually never looked at the corresponding parts of the
in-tree-driver, but that would have definitely been the next step in
order to fix it.

> The check seems to be:
>                 if ((tp->mac_version == RTL_GIGA_MAC_VER_11) ||
>                     (tp->mac_version == RTL_GIGA_MAC_VER_12) ||
>                     (tp->mac_version >= RTL_GIGA_MAC_VER_17)) {
> 
> I wonder if I should just add my mac version there (_VER_05) and test if
> it'll make it better.

Surely worth a try. On the other hand, looking at the sheer mass of
problem reports regarding this driver, making it worse is rather hard to
do I guess. :)

(Good night and) good luck, Phil


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: still having r8169 woes with XID 18000000
  2010-06-04 13:43     ` Phil Sutter
@ 2010-06-04 17:31       ` Timo Teräs
  2010-06-04 20:24         ` Timo Teräs
  0 siblings, 1 reply; 21+ messages in thread
From: Timo Teräs @ 2010-06-04 17:31 UTC (permalink / raw)
  To: Phil Sutter; +Cc: françois romieu, netdev

On 06/04/2010 04:43 PM, Phil Sutter wrote:
> On Fri, Jun 04, 2010 at 04:02:11PM +0300, Timo Teräs wrote:
>>> Comparing r8169-6.013 with it's predecessor 6.012, you'll find a newly
>>> enabled function rtl8169_phy_power_up() as well as some more invocations
>>> of rtl8169_phy_power_down().
>>>
>>> This is probably the solution to these (at least in our case) very
>>> sporadic, but highly annoying, problems. In fact, when our NIC didn't
>>> detect any link, it needed a full power-cycle (no success with
>>> reset-button), so almost not workaroundable.
>>
>> Sounds very similar to the problem I have. Thanks for the pointers!
>>
>> It looks like the r8169 driver does have phy power up code in it, but
>> it's only executed for specific versions of the chip. Realtek driver
>> seems to do it unconditionally.
> 
> Hmm. I actually never looked at the corresponding parts of the
> in-tree-driver, but that would have definitely been the next step in
> order to fix it.
> 
>> The check seems to be:
>>                 if ((tp->mac_version == RTL_GIGA_MAC_VER_11) ||
>>                     (tp->mac_version == RTL_GIGA_MAC_VER_12) ||
>>                     (tp->mac_version >= RTL_GIGA_MAC_VER_17)) {
>>
>> I wonder if I should just add my mac version there (_VER_05) and test if
>> it'll make it better.
> 
> Surely worth a try. On the other hand, looking at the sheer mass of
> problem reports regarding this driver, making it worse is rather hard to
> do I guess. :)

Ok. The issue is semi-reliably producible with just removing the
kernel driver and reloading it. The problem occurs maybe with 10-20% chance.

So far, it looks like the phy wakeup does not really help. Adding the
_VER_05 check did not help.

However, removing the specific phy config code
(rtl8169scd_hw_phy_config) which was introduced by commit 2e955856ff
seems to solve it. At least I was not able to reproduce the failure with
20-30 module reloads.

One more curiosity: if i do a hard power reset, the NIC has green link
indicator led after power up. When loading the kernel module it goes to
orange/red. I wonder why the difference.

- Timo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: still having r8169 woes with XID 18000000
  2010-06-04 17:31       ` Timo Teräs
@ 2010-06-04 20:24         ` Timo Teräs
  2010-06-05  7:39           ` [PATCH] " Timo Teräs
  0 siblings, 1 reply; 21+ messages in thread
From: Timo Teräs @ 2010-06-04 20:24 UTC (permalink / raw)
  To: Phil Sutter; +Cc: françois romieu, netdev

On 06/04/2010 08:31 PM, Timo Teräs wrote:
> On 06/04/2010 04:43 PM, Phil Sutter wrote:
>> On Fri, Jun 04, 2010 at 04:02:11PM +0300, Timo Teräs wrote:
>>>> Comparing r8169-6.013 with it's predecessor 6.012, you'll find a newly
>>>> enabled function rtl8169_phy_power_up() as well as some more invocations
>>>> of rtl8169_phy_power_down().
>>>>
>>>> This is probably the solution to these (at least in our case) very
>>>> sporadic, but highly annoying, problems. In fact, when our NIC didn't
>>>> detect any link, it needed a full power-cycle (no success with
>>>> reset-button), so almost not workaroundable.
>>>
> However, removing the specific phy config code
> (rtl8169scd_hw_phy_config) which was introduced by commit 2e955856ff
> seems to solve it. At least I was not able to reproduce the failure with
> 20-30 module reloads.

Ok, I figured that either the data the phy config writes is bad, or mdio
io is failing, so I added some additional checks to mdio_write and
mdio_read. More loops (upto 2000 iterations) and debug print if it
ultimately failed. And it did!

At bootup I got this:

r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:00:09.0: PCI->APIC IRQ transform: INT A -> IRQ 18
r8169 0000:00:09.0: no PCI Express capability
eth0: RTL8169sc/8110sc at 0xf835c000, 00:30:18:a6:2b:6c, XID 18000000 IRQ 18
r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:00:0b.0: PCI->APIC IRQ transform: INT A -> IRQ 19
r8169 0000:00:0b.0: no PCI Express capability
eth1: RTL8169sc/8110sc at 0xf8360000, 00:30:18:a6:2b:6d, XID 18000000 IRQ 19
r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:00:0c.0: PCI->APIC IRQ transform: INT A -> IRQ 16
r8169 0000:00:0c.0: no PCI Express capability
eth2: RTL8169sc/8110sc at 0xf8364000, 00:30:18:a6:2b:6e, XID 18000000 IRQ 16
r8169: mdio_write(f8364000, 0x00000003, 0000000a1) required 2000 cycles
r8169: mdio_write(f8364000, 0x00000000, 000001000) required 2000 cycles
r8169: mdio_write(f8364000, 0x00000000, 00000a0ff) required 2000 cycles
r8169: mdio_write(f8364000, 0x00000014, 00000fb54) required 2000 cycles

And eth2 was not working. Reloading the module gave a lot of other
mdio_write and mdio_read errors.

It seems to be pretty random when the errors occur, but that's the
reason why the NIC stops working: mdio_write() fails (one or more times)
at some crucial point of the board specific phy config code resulting in
bad state.

Any ideas how to debug this further?

I guess next step is to compile the Realtek driver and see if that works
right.

> One more curiosity: if i do a hard power reset, the NIC has green link
> indicator led after power up. When loading the kernel module it goes to
> orange/red. I wonder why the difference.

Figured this. At startup it goes to 100mbit/s fixed mode. After module
load it gets 1gb/s. Setting it manually to 100mbit/s changes the color
back to green. So it's just a speed indicator.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] Re: still having r8169 woes with XID 18000000
  2010-06-04 20:24         ` Timo Teräs
@ 2010-06-05  7:39           ` Timo Teräs
  2010-06-05  9:02             ` David Miller
  2010-06-05  9:13             ` Eric Dumazet
  0 siblings, 2 replies; 21+ messages in thread
From: Timo Teräs @ 2010-06-05  7:39 UTC (permalink / raw)
  To: Phil Sutter, françois romieu, netdev

On 06/04/2010 11:24 PM, Timo Teräs wrote:
> On 06/04/2010 08:31 PM, Timo Teräs wrote:
> r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> r8169 0000:00:09.0: PCI->APIC IRQ transform: INT A -> IRQ 18
> r8169 0000:00:09.0: no PCI Express capability
> eth0: RTL8169sc/8110sc at 0xf835c000, 00:30:18:a6:2b:6c, XID 18000000 IRQ 18
> r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> r8169 0000:00:0b.0: PCI->APIC IRQ transform: INT A -> IRQ 19
> r8169 0000:00:0b.0: no PCI Express capability
> eth1: RTL8169sc/8110sc at 0xf8360000, 00:30:18:a6:2b:6d, XID 18000000 IRQ 19
> r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> r8169 0000:00:0c.0: PCI->APIC IRQ transform: INT A -> IRQ 16
> r8169 0000:00:0c.0: no PCI Express capability
> eth2: RTL8169sc/8110sc at 0xf8364000, 00:30:18:a6:2b:6e, XID 18000000 IRQ 16
> r8169: mdio_write(f8364000, 0x00000003, 0000000a1) required 2000 cycles
> r8169: mdio_write(f8364000, 0x00000000, 000001000) required 2000 cycles
> r8169: mdio_write(f8364000, 0x00000000, 00000a0ff) required 2000 cycles
> r8169: mdio_write(f8364000, 0x00000014, 00000fb54) required 2000 cycles
> 
> And eth2 was not working. Reloading the module gave a lot of other
> mdio_write and mdio_read errors.
> 
> It seems to be pretty random when the errors occur, but that's the
> reason why the NIC stops working: mdio_write() fails (one or more times)
> at some crucial point of the board specific phy config code resulting in
> bad state.
> 
> Any ideas how to debug this further?

Ok, I compared Realtek's and the in-tree driver. The only essential
difference is that Realtek driver uses udelay(100) in mdio_write()'s
busy polling loop where as the in-tree uses udelay(25). And that seems
to be the magic difference! Using udelay(100) fixes this!

I'm guessing that the phy needs slight delay between consecutive
mdio_write's even if it has advertised that the write has been
completed. And yes, just adding a small delay in the end of mdio_write
does seem to work too.

Francois, you think the below patch is ok? Should I send it as properly
formatted commit?

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 217e709..6db62bf 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -559,6 +559,7 @@ static void mdio_write(void __iomem *ioaddr,
 			break;
 		udelay(25);
 	}
+	udelay(25);
 }

 static int mdio_read(void __iomem *ioaddr, int reg_addr)

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] Re: still having r8169 woes with XID 18000000
  2010-06-05  7:39           ` [PATCH] " Timo Teräs
@ 2010-06-05  9:02             ` David Miller
  2010-06-05  9:13             ` Eric Dumazet
  1 sibling, 0 replies; 21+ messages in thread
From: David Miller @ 2010-06-05  9:02 UTC (permalink / raw)
  To: timo.teras; +Cc: phil, romieu, netdev

From: Timo Teräs <timo.teras@iki.fi>
Date: Sat, 05 Jun 2010 10:39:55 +0300

> Ok, I compared Realtek's and the in-tree driver. The only essential
> difference is that Realtek driver uses udelay(100) in mdio_write()'s
> busy polling loop where as the in-tree uses udelay(25). And that seems
> to be the magic difference! Using udelay(100) fixes this!
> 
> I'm guessing that the phy needs slight delay between consecutive
> mdio_write's even if it has advertised that the write has been
> completed. And yes, just adding a small delay in the end of mdio_write
> does seem to work too.
> 
> Francois, you think the below patch is ok? Should I send it as properly
> formatted commit?

Excellent detective work, Francois please review!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] Re: still having r8169 woes with XID 18000000
  2010-06-05  7:39           ` [PATCH] " Timo Teräs
  2010-06-05  9:02             ` David Miller
@ 2010-06-05  9:13             ` Eric Dumazet
  2010-06-05  9:21               ` Timo Teräs
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2010-06-05  9:13 UTC (permalink / raw)
  To: Timo Teräs; +Cc: Phil Sutter, françois romieu, netdev

Le samedi 05 juin 2010 à 10:39 +0300, Timo Teräs a écrit :

> Ok, I compared Realtek's and the in-tree driver. The only essential
> difference is that Realtek driver uses udelay(100) in mdio_write()'s
> busy polling loop where as the in-tree uses udelay(25). And that seems
> to be the magic difference! Using udelay(100) fixes this!
> 
> I'm guessing that the phy needs slight delay between consecutive
> mdio_write's even if it has advertised that the write has been
> completed. And yes, just adding a small delay in the end of mdio_write
> does seem to work too.
> 
> Francois, you think the below patch is ok? Should I send it as properly
> formatted commit?
> 
> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
> index 217e709..6db62bf 100644
> --- a/drivers/net/r8169.c
> +++ b/drivers/net/r8169.c
> @@ -559,6 +559,7 @@ static void mdio_write(void __iomem *ioaddr,
>  			break;
>  		udelay(25);
>  	}
> +	udelay(25);
>  }
> 
>  static int mdio_read(void __iomem *ioaddr, int reg_addr)
> --

Sure this deserves an official patch with all prereqs, but please add a
comment in mdio_write() why this extra udelay(25) is needed, especially
since you say of udelay(100) 'fixes the bug'.




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] Re: still having r8169 woes with XID 18000000
  2010-06-05  9:13             ` Eric Dumazet
@ 2010-06-05  9:21               ` Timo Teräs
  2010-06-05 10:21                 ` [PATCH] r8169: fix random mdio_write failures Timo Teräs
  0 siblings, 1 reply; 21+ messages in thread
From: Timo Teräs @ 2010-06-05  9:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Phil Sutter, françois romieu, netdev

On 06/05/2010 12:13 PM, Eric Dumazet wrote:
> Le samedi 05 juin 2010 à 10:39 +0300, Timo Teräs a écrit :
>> Ok, I compared Realtek's and the in-tree driver. The only essential
>> difference is that Realtek driver uses udelay(100) in mdio_write()'s
>> busy polling loop where as the in-tree uses udelay(25). And that seems
>> to be the magic difference! Using udelay(100) fixes this!
>>
>> I'm guessing that the phy needs slight delay between consecutive
>> mdio_write's even if it has advertised that the write has been
>> completed. And yes, just adding a small delay in the end of mdio_write
>> does seem to work too.
>>
>> Francois, you think the below patch is ok? Should I send it as properly
>> formatted commit?
>>
>> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
>> index 217e709..6db62bf 100644
>> --- a/drivers/net/r8169.c
>> +++ b/drivers/net/r8169.c
>> @@ -559,6 +559,7 @@ static void mdio_write(void __iomem *ioaddr,
>>  			break;
>>  		udelay(25);
>>  	}
>> +	udelay(25);
>>  }
>>
>>  static int mdio_read(void __iomem *ioaddr, int reg_addr)
>> --
> 
> Sure this deserves an official patch with all prereqs, but please add a
> comment in mdio_write() why this extra udelay(25) is needed, especially
> since you say of udelay(100) 'fixes the bug'.

Uh, yes. The intention was to get feedback if this is the proper place
for the delay. I first thought that maybe we could just add the delay to
the phy config function writing the values in tight loop... but as it
appears even some other mdio_writes seem to fail, this seems to be the
logical and correct place.

Alternatively, I was thinking if someone had specs to check if they say
anything about delay needed between mdio writes.

Reason why adding new delay in the end of the function is better is
because using udelay(100) in the loop means that anything between
0..100us has elapsed after the "write complete" is acked; if very
unlucky the "write complete" happens just after our udelay has ended and
there would be no delay between next mdio write.

Adding the additional udelay(25) in the end guarantees small delay
between "write complete" ack and the next write. And yes, it needs a
comment.

I'll send a new patch shortly.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] r8169: fix random mdio_write failures
  2010-06-05  9:21               ` Timo Teräs
@ 2010-06-05 10:21                 ` Timo Teräs
  2010-06-05 12:41                   ` Francois Romieu
  0 siblings, 1 reply; 21+ messages in thread
From: Timo Teräs @ 2010-06-05 10:21 UTC (permalink / raw)
  To: netdev; +Cc: Timo Teräs, françois romieu, Edward Hsu

Some configurations need delay between the "write completed" indication
and new write to work reliably.

Realtek driver seems to use longer delay when polling the "write complete"
bit, so it waits long enough between writes with high probability (but
could probably break too). This patch adds a new udelay to make sure we
wait unconditionally some time after the write complete indication.

This caused a regression with XID 18000000 boards when the board specific
phy configuration writing many mdio registers was added in commit
2e955856ff (r8169: phy init for the 8169scd). Some of the configration
mdio writes would almost always fail, and depending on failure might leave
the PHY in non-working state.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Cc: françois romieu <romieu@fr.zoreil.com>
Cc: Edward Hsu <edward_hsu@realtek.com.tw>
---
 drivers/net/r8169.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 217e709..03a8318 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -559,6 +559,11 @@ static void mdio_write(void __iomem *ioaddr, int reg_addr, int value)
 			break;
 		udelay(25);
 	}
+	/*
+	 * Some configurations require a small delay even after the write
+	 * completed indication or the next write might fail.
+	 */
+	udelay(25);
 }
 
 static int mdio_read(void __iomem *ioaddr, int reg_addr)
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] r8169: fix random mdio_write failures
  2010-06-05 10:21                 ` [PATCH] r8169: fix random mdio_write failures Timo Teräs
@ 2010-06-05 12:41                   ` Francois Romieu
  2010-06-06 22:39                     ` David Miller
  2010-06-07  9:26                     ` hayeswang
  0 siblings, 2 replies; 21+ messages in thread
From: Francois Romieu @ 2010-06-05 12:41 UTC (permalink / raw)
  To: Timo Teräs; +Cc: netdev, Edward Hsu, Hayes, davem

Timo Teräs <timo.teras@iki.fi> :
[...]
> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
> index 217e709..03a8318 100644
> --- a/drivers/net/r8169.c
> +++ b/drivers/net/r8169.c
> @@ -559,6 +559,11 @@ static void mdio_write(void __iomem *ioaddr, int reg_addr, int value)
>  			break;
>  		udelay(25);
>  	}
> +	/*
> +	 * Some configurations require a small delay even after the write
> +	 * completed indication or the next write might fail.
> +	 */
> +	udelay(25);

Acked-off-by: Francois Romieu <romieu@fr.zoreil.com>

Good work.

I wonder if increasing the in-loop delay as well would help the write
succeed faster (or slower ?).

-- 
Ueimor

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] r8169: fix random mdio_write failures
  2010-06-05 12:41                   ` Francois Romieu
@ 2010-06-06 22:39                     ` David Miller
  2010-06-07  9:26                     ` hayeswang
  1 sibling, 0 replies; 21+ messages in thread
From: David Miller @ 2010-06-06 22:39 UTC (permalink / raw)
  To: romieu; +Cc: timo.teras, netdev, edward_hsu, hayeswang

From: Francois Romieu <romieu@fr.zoreil.com>
Date: Sat, 5 Jun 2010 14:41:03 +0200

> Timo Teräs <timo.teras@iki.fi> :
> [...]
>> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
>> index 217e709..03a8318 100644
>> --- a/drivers/net/r8169.c
>> +++ b/drivers/net/r8169.c
>> @@ -559,6 +559,11 @@ static void mdio_write(void __iomem *ioaddr, int reg_addr, int value)
>>  			break;
>>  		udelay(25);
>>  	}
>> +	/*
>> +	 * Some configurations require a small delay even after the write
>> +	 * completed indication or the next write might fail.
>> +	 */
>> +	udelay(25);
> 
> Acked-off-by: Francois Romieu <romieu@fr.zoreil.com>

Applied, thanks everyone.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH] r8169: fix random mdio_write failures
  2010-06-05 12:41                   ` Francois Romieu
  2010-06-06 22:39                     ` David Miller
@ 2010-06-07  9:26                     ` hayeswang
  2010-06-07 21:51                       ` Francois Romieu
  1 sibling, 1 reply; 21+ messages in thread
From: hayeswang @ 2010-06-07  9:26 UTC (permalink / raw)
  To: 'Francois Romieu', 'Timo Teräs'; +Cc: netdev, davem

Our hardware engineer suggests that check the completed indication
per 100 micro seconds. And it needs 20 micro seconds delay after the 
completed indication for the next command.
 
Best Regards,
Hayes


-----Original Message-----
From: Francois Romieu [mailto:romieu@fr.zoreil.com] 
Sent: Saturday, June 05, 2010 8:41 PM
To: Timo Teräs
Cc: netdev@vger.kernel.org; Edward Hsu; Hayeswang; davem@davemloft.net
Subject: Re: [PATCH] r8169: fix random mdio_write failures

Timo Teräs <timo.teras@iki.fi> :
[...]
> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c index 
> 217e709..03a8318 100644
> --- a/drivers/net/r8169.c
> +++ b/drivers/net/r8169.c
> @@ -559,6 +559,11 @@ static void mdio_write(void __iomem *ioaddr, int
reg_addr, int value)
>  			break;
>  		udelay(25);
>  	}
> +	/*
> +	 * Some configurations require a small delay even after the write
> +	 * completed indication or the next write might fail.
> +	 */
> +	udelay(25);

Acked-off-by: Francois Romieu <romieu@fr.zoreil.com>

Good work.

I wonder if increasing the in-loop delay as well would help the write
succeed faster (or slower ?).

--
Ueimor


------Please consider the environment before printing this e-mail. 




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] r8169: fix random mdio_write failures
  2010-06-07  9:26                     ` hayeswang
@ 2010-06-07 21:51                       ` Francois Romieu
  2010-06-08  6:06                         ` Timo Teräs
  0 siblings, 1 reply; 21+ messages in thread
From: Francois Romieu @ 2010-06-07 21:51 UTC (permalink / raw)
  To: hayeswang; +Cc: 'Timo Teräs', netdev, davem

hayeswang <hayeswang@realtek.com> :
> Our hardware engineer suggests that check the completed indication
> per 100 micro seconds. And it needs 20 micro seconds delay after the 
> completed indication for the next command.

Should we do the same for mdio_read as well (100 us per iteration + 
an extra 20 us) ?

-- 
Ueimor

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] r8169: fix random mdio_write failures
  2010-06-07 21:51                       ` Francois Romieu
@ 2010-06-08  6:06                         ` Timo Teräs
  2010-06-08  6:26                           ` Francois Romieu
  2010-06-09  2:47                           ` hayeswang
  0 siblings, 2 replies; 21+ messages in thread
From: Timo Teräs @ 2010-06-08  6:06 UTC (permalink / raw)
  To: Francois Romieu; +Cc: hayeswang, netdev, davem

On 06/08/2010 12:51 AM, Francois Romieu wrote:
> hayeswang <hayeswang@realtek.com> :
>> Our hardware engineer suggests that check the completed indication
>> per 100 micro seconds. And it needs 20 micro seconds delay after the 
>> completed indication for the next command.
> 
> Should we do the same for mdio_read as well (100 us per iteration + 
> an extra 20 us) ?

Well, doing 100us per iteration will increase the latency that the code
notices "write complete" which slows down things. It'll also slightly
decrease bus traffic which is good. But I'd be just fine with 25us per
iteration. It sounds unlikely that polling the status register would
slow down the actual write operation (if that is the case then 100us
would be desirable).

Changing my 25us to 20us would good. The original 25us was just a guess.
The comment should be probably also updated that those delays are from
realtek hw specs then.

Would you like me to send a patch?

- Timo



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] r8169: fix random mdio_write failures
  2010-06-08  6:06                         ` Timo Teräs
@ 2010-06-08  6:26                           ` Francois Romieu
  2010-06-09  2:47                           ` hayeswang
  1 sibling, 0 replies; 21+ messages in thread
From: Francois Romieu @ 2010-06-08  6:26 UTC (permalink / raw)
  To: Timo Teräs; +Cc: hayeswang, netdev, davem

Timo Teräs <timo.teras@iki.fi> :
[ok]
> iteration. It sounds unlikely that polling the status register would
> slow down the actual write operation (if that is the case then 100us
> would be desirable).

I would not be that surprized.

> Changing my 25us to 20us would good. The original 25us was just a guess.
> The comment should be probably also updated that those delays are from
> realtek hw specs then.

Yes.

> Would you like me to send a patch?

Of course. Some comment from Hayes regarding mdio_read would be
welcome beforehand though.

-- 
Ueimor

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH] r8169: fix random mdio_write failures
  2010-06-08  6:06                         ` Timo Teräs
  2010-06-08  6:26                           ` Francois Romieu
@ 2010-06-09  2:47                           ` hayeswang
  2010-06-09  5:22                             ` [PATCH] r8169: fix mdio_read and update mdio_write according to hw specs Timo Teräs
  1 sibling, 1 reply; 21+ messages in thread
From: hayeswang @ 2010-06-09  2:47 UTC (permalink / raw)
  To: 'Timo Teräs', 'Francois Romieu'; +Cc: netdev, davem

I think the major point is that it needs a delay (20 us) after both read and
write completed indiacation before next command.
It needs almost 100 us for the indication to be completed, so our engineer
suggests us to check the indication per 100 us.
However, it is  ok to reduce the timer from 100 us to 25 us for checking .
 
Best Regards,
Hayes


-----Original Message-----
From: Timo Teräs [mailto:timo.teras@gmail.com] On Behalf Of Timo Teras
Sent: Tuesday, June 08, 2010 2:06 PM
To: Francois Romieu
Cc: Hayeswang; netdev@vger.kernel.org; davem@davemloft.net
Subject: Re: [PATCH] r8169: fix random mdio_write failures

On 06/08/2010 12:51 AM, Francois Romieu wrote:
> hayeswang <hayeswang@realtek.com> :
>> Our hardware engineer suggests that check the completed indication 
>> per 100 micro seconds. And it needs 20 micro seconds delay after the 
>> completed indication for the next command.
> 
> Should we do the same for mdio_read as well (100 us per iteration + an 
> extra 20 us) ?

Well, doing 100us per iteration will increase the latency that the code
notices "write complete" which slows down things. It'll also slightly
decrease bus traffic which is good. But I'd be just fine with 25us per
iteration. It sounds unlikely that polling the status register would slow
down the actual write operation (if that is the case then 100us would be
desirable).

Changing my 25us to 20us would good. The original 25us was just a guess.
The comment should be probably also updated that those delays are from
realtek hw specs then.

Would you like me to send a patch?

- Timo




------Please consider the environment before printing this e-mail. 




^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] r8169: fix mdio_read and update mdio_write according to hw specs
  2010-06-09  2:47                           ` hayeswang
@ 2010-06-09  5:22                             ` Timo Teräs
  2010-06-09  6:18                               ` Francois Romieu
  0 siblings, 1 reply; 21+ messages in thread
From: Timo Teräs @ 2010-06-09  5:22 UTC (permalink / raw)
  To: netdev; +Cc: Timo Teräs, Francois Romieu, Hayeswang

Realtek confirmed that a 20us delay is needed after mdio_read and
mdio_write operations. Reduce the delay in mdio_write, and add it
to mdio_read too. Also add a comment that the 20us is from hw specs.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Cc: Hayeswang <hayeswang@realtek.com>
---
 drivers/net/r8169.c |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 03a8318..96b6cfb 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -560,10 +560,10 @@ static void mdio_write(void __iomem *ioaddr, int reg_addr, int value)
 		udelay(25);
 	}
 	/*
-	 * Some configurations require a small delay even after the write
-	 * completed indication or the next write might fail.
+	 * According to hardware specs a 20us delay is required after write
+	 * complete indication, but before sending next command.
 	 */
-	udelay(25);
+	udelay(20);
 }
 
 static int mdio_read(void __iomem *ioaddr, int reg_addr)
@@ -583,6 +583,12 @@ static int mdio_read(void __iomem *ioaddr, int reg_addr)
 		}
 		udelay(25);
 	}
+	/*
+	 * According to hardware specs a 20us delay is required after read
+	 * complete indication, but before sending next command.
+	 */
+	udelay(20);
+
 	return value;
 }
 
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] r8169: fix mdio_read and update mdio_write according to hw specs
  2010-06-09  5:22                             ` [PATCH] r8169: fix mdio_read and update mdio_write according to hw specs Timo Teräs
@ 2010-06-09  6:18                               ` Francois Romieu
  2010-06-10  0:32                                 ` David Miller
  0 siblings, 1 reply; 21+ messages in thread
From: Francois Romieu @ 2010-06-09  6:18 UTC (permalink / raw)
  To: Timo Teräs; +Cc: netdev, Hayeswang

Timo Teräs <timo.teras@iki.fi> :
> Realtek confirmed that a 20us delay is needed after mdio_read and
> mdio_write operations. Reduce the delay in mdio_write, and add it
> to mdio_read too. Also add a comment that the 20us is from hw specs.
> 
> Signed-off-by: Timo Teräs <timo.teras@iki.fi>

Acked-off-by: Francois Romieu <romieu@fr.zoreil.com>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] r8169: fix mdio_read and update mdio_write according to hw specs
  2010-06-09  6:18                               ` Francois Romieu
@ 2010-06-10  0:32                                 ` David Miller
  0 siblings, 0 replies; 21+ messages in thread
From: David Miller @ 2010-06-10  0:32 UTC (permalink / raw)
  To: romieu; +Cc: timo.teras, netdev, hayeswang

From: Francois Romieu <romieu@fr.zoreil.com>
Date: Wed, 9 Jun 2010 08:18:25 +0200

> Timo Teräs <timo.teras@iki.fi> :
>> Realtek confirmed that a 20us delay is needed after mdio_read and
>> mdio_write operations. Reduce the delay in mdio_write, and add it
>> to mdio_read too. Also add a comment that the 20us is from hw specs.
>> 
>> Signed-off-by: Timo Teräs <timo.teras@iki.fi>
> 
> Acked-off-by: Francois Romieu <romieu@fr.zoreil.com>

I think you meant "Acked-by: " :-)  I fixed this and committed
Timo's patch, thanks everyone!

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2010-06-10  0:32 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-06-04 12:10 still having r8169 woes with XID 18000000 Timo Teräs
2010-06-04 12:36 ` Phil Sutter
2010-06-04 13:02   ` Timo Teräs
2010-06-04 13:43     ` Phil Sutter
2010-06-04 17:31       ` Timo Teräs
2010-06-04 20:24         ` Timo Teräs
2010-06-05  7:39           ` [PATCH] " Timo Teräs
2010-06-05  9:02             ` David Miller
2010-06-05  9:13             ` Eric Dumazet
2010-06-05  9:21               ` Timo Teräs
2010-06-05 10:21                 ` [PATCH] r8169: fix random mdio_write failures Timo Teräs
2010-06-05 12:41                   ` Francois Romieu
2010-06-06 22:39                     ` David Miller
2010-06-07  9:26                     ` hayeswang
2010-06-07 21:51                       ` Francois Romieu
2010-06-08  6:06                         ` Timo Teräs
2010-06-08  6:26                           ` Francois Romieu
2010-06-09  2:47                           ` hayeswang
2010-06-09  5:22                             ` [PATCH] r8169: fix mdio_read and update mdio_write according to hw specs Timo Teräs
2010-06-09  6:18                               ` Francois Romieu
2010-06-10  0:32                                 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).