netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* r8169 tx problem (1s pause with ping)
@ 2007-06-13  0:41 Benjamin LaHaise
  2007-06-13 21:18 ` Francois Romieu
  0 siblings, 1 reply; 9+ messages in thread
From: Benjamin LaHaise @ 2007-06-13  0:41 UTC (permalink / raw)
  To: netdev

Hello folks,

I'm seeing something odd with r8169 on FC7: doing a ping -s 1600 alternates 
between a 1s latency and sub 1ms.  Has anyone else seen anything like this?  
The system in question is an Asus M2A-VM with an onboard RTL8111 (I think).  
NAPI doesn't seem to make a difference.  The kernel in question is currently 
a vanilla 2.6.21.5.  Sub-mtu sized packets behave normally.

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)

PING 1.2.3.4 (1.2.3.4) 1600(1628) bytes of data.
1608 bytes from 1.2.3.4: icmp_seq=1 ttl=64 time=1000 ms
1608 bytes from 1.2.3.4: icmp_seq=2 ttl=64 time=0.816 ms
1608 bytes from 1.2.3.4: icmp_seq=3 ttl=64 time=1000 ms
1608 bytes from 1.2.3.4: icmp_seq=4 ttl=64 time=0.661 ms

		-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <zyntrop@kvack.org>.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: r8169 tx problem (1s pause with ping)
  2007-06-13  0:41 r8169 tx problem (1s pause with ping) Benjamin LaHaise
@ 2007-06-13 21:18 ` Francois Romieu
  2007-06-14 15:33   ` David Gundersen
  0 siblings, 1 reply; 9+ messages in thread
From: Francois Romieu @ 2007-06-13 21:18 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: netdev

Benjamin LaHaise <bcrl@kvack.org> :
[...]
> I'm seeing something odd with r8169 on FC7: doing a ping -s 1600 alternates 
> between a 1s latency and sub 1ms.  Has anyone else seen anything like this?  
> The system in question is an Asus M2A-VM with an onboard RTL8111 (I think).  
> NAPI doesn't seem to make a difference.  The kernel in question is currently 
> a vanilla 2.6.21.5.  Sub-mtu sized packets behave normally.

Same thing here for my 8168 rev 01 (asrock 945G dvi LOM) with 2.6.22-rc4
and 2.6.22-rc3 + r816x patchkit.

Wonderful.

-- 
Ueimor

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: r8169 tx problem (1s pause with ping)
  2007-06-13 21:18 ` Francois Romieu
@ 2007-06-14 15:33   ` David Gundersen
  2007-06-14 22:00     ` Francois Romieu
  2007-06-18 15:14     ` Benjamin LaHaise
  0 siblings, 2 replies; 9+ messages in thread
From: David Gundersen @ 2007-06-14 15:33 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 1717 bytes --]

Francois Romieu wrote:
> Benjamin LaHaise <bcrl@kvack.org> :
> [...]
>> I'm seeing something odd with r8169 on FC7: doing a ping -s 1600 alternates 
>> between a 1s latency and sub 1ms.  Has anyone else seen anything like this?  
>> The system in question is an Asus M2A-VM with an onboard RTL8111 (I think).  
>> NAPI doesn't seem to make a difference.  The kernel in question is currently 
>> a vanilla 2.6.21.5.  Sub-mtu sized packets behave normally.
> 
> Same thing here for my 8168 rev 01 (asrock 945G dvi LOM) with 2.6.22-rc4
> and 2.6.22-rc3 + r816x patchkit.
> 
> Wonderful.
> 

I've got a modified version of the latest realtek (r8168) driver running 
here that doesn't seem to exhibit those symptoms.

The trouble I have is that I've been playing with multiple sections of 
the code and I'm not 100% sure what part(s) might impact that particular 
test.  The bits I know I've messed with are the bits that set the 
First/Last fragment flags and the NPQ flagging section (as described in 
my previous emails).

I know it's not particularly scientific of me to be changing multiple 
sections of the driver at once and I'm sorry about that but it's fairly 
late here and I really should get some sleep :).  I might do some more 
thorough testing on the weekend to find out what the minimal changes 
required are to get things working.

In the mean-time I'll attach my patch for the r8168-8.001.00 realtek 
driver here in case anybody else wants to have a play with it and see if 
it helps them out.

Also, It might be a silly question, but have you tried taking packet 
captures from the perspective of the box with the realtek chipset & 
another box during the pinging and comparing the two?

Regards,
Dave.

[-- Attachment #2: r8168-8.001.00-dg.patch.gz --]
[-- Type: application/x-gzip, Size: 2237 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: r8169 tx problem (1s pause with ping)
  2007-06-14 15:33   ` David Gundersen
@ 2007-06-14 22:00     ` Francois Romieu
  2007-06-19 23:06       ` Francois Romieu
  2007-06-18 15:14     ` Benjamin LaHaise
  1 sibling, 1 reply; 9+ messages in thread
From: Francois Romieu @ 2007-06-14 22:00 UTC (permalink / raw)
  To: David Gundersen; +Cc: netdev, Benjamin LaHaise

David Gundersen <gundy@iinet.net.au> :
[...]
> I might do some more thorough testing on the weekend to find out what
> the minimal changes required are to get things working.

In your patch, the first chunk of data (which is handed back to the
nic outside of rtl8169_xmit_frags) will not have is First fragment
bit set when nr_frags != 0. It makes me a bit nervous.

The NPQ part of your patch actually forces the start_xmit handler to
wait for all previously queued packets to be sent before telling the
nic that a packet may be available. I can understand that it will fix
the symptoms but it will implicitly shorten the TX queue a lot.

A pesky fever apart, I should be able to come with a patch before
the week end.

-- 
Ueimor

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: r8169 tx problem (1s pause with ping)
  2007-06-14 15:33   ` David Gundersen
  2007-06-14 22:00     ` Francois Romieu
@ 2007-06-18 15:14     ` Benjamin LaHaise
  2007-06-19 10:45       ` David Gundersen
  1 sibling, 1 reply; 9+ messages in thread
From: Benjamin LaHaise @ 2007-06-18 15:14 UTC (permalink / raw)
  To: David Gundersen; +Cc: netdev

On Fri, Jun 15, 2007 at 01:33:14AM +1000, David Gundersen wrote:
> In the mean-time I'll attach my patch for the r8168-8.001.00 realtek 
> driver here in case anybody else wants to have a play with it and see if 
> it helps them out.

Out of curiousity, does it work if you just do a single read (ie 
RTL_R8(TxPoll);) of the register before writing to it?  That would clear 
things up if it is a PCI posting problem.

		-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <zyntrop@kvack.org>.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: r8169 tx problem (1s pause with ping)
  2007-06-18 15:14     ` Benjamin LaHaise
@ 2007-06-19 10:45       ` David Gundersen
  0 siblings, 0 replies; 9+ messages in thread
From: David Gundersen @ 2007-06-19 10:45 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: netdev


> Out of curiousity, does it work if you just do a single read (ie 
> RTL_R8(TxPoll);) of the register before writing to it?  That would clear 
> things up if it is a PCI posting problem.

Hi Ben,

I tried your suggestion but it didn't seem to make any difference :(

I tried the following combinations:

  - realtek original                          [broken]
  - realtek original with the RTL_R8(TxPoll)
    before RTL_W8(TxPoll, NPQ);               [broken]
  - my patched version without the ndelay
    loop but including the RTL_R8(TxPoll)
    (to see if my messing with the frag logic
     was having any impact)                   [broken]
  - my patched version including the
    ndelay loop                               [full speed transfers]


Also, I'm not sure if I made it clear in my first post, but I'm testing 
these changes on a 8168B (it's built in to my GA-945G-S3 motherboard).

I'm not sure if we can assume that the same change applied to the 8169 
driver would have the same effect on the 8169 too?   (is the 8168 just a 
PCI express version of the [pci] 8169?)


Dave.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: r8169 tx problem (1s pause with ping)
  2007-06-14 22:00     ` Francois Romieu
@ 2007-06-19 23:06       ` Francois Romieu
  2007-06-20  8:57         ` David Gundersen
  0 siblings, 1 reply; 9+ messages in thread
From: Francois Romieu @ 2007-06-19 23:06 UTC (permalink / raw)
  To: David Gundersen; +Cc: netdev, Benjamin LaHaise

Francois Romieu <romieu@fr.zoreil.com> :
[...]
> A pesky fever apart, I should be able to come with a patch before
> the week end.

(oops, late again)

You can try the hack below as a temporary fix.

It makes things way better on top of 2.6.22-rc5 + current r8169 patchkit at
http://www.fr.zoreil.com/people/francois/misc/20070619-2.6.22-rc5-r8169-test.patch

It applies against 2.6.22-rc5 too.

I should come with something else soon.

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 8f3e0da..8c0851f 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -2682,6 +2688,8 @@ static void rtl8169_tx_interrupt(struct net_device *dev,
 		    (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)) {
 			netif_wake_queue(dev);
 		}
+		if (tp->dirty_tx != tp->cur_rx)
+			RTL_W8(TxPoll, NPQ);
 	}
 }
 


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: r8169 tx problem (1s pause with ping)
  2007-06-19 23:06       ` Francois Romieu
@ 2007-06-20  8:57         ` David Gundersen
  2007-06-20 21:15           ` Francois Romieu
  0 siblings, 1 reply; 9+ messages in thread
From: David Gundersen @ 2007-06-20  8:57 UTC (permalink / raw)
  To: netdev; +Cc: Francois Romieu, Benjamin LaHaise


> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
> index 8f3e0da..8c0851f 100644
> --- a/drivers/net/r8169.c
> +++ b/drivers/net/r8169.c
> @@ -2682,6 +2688,8 @@ static void rtl8169_tx_interrupt(struct net_device *dev,
>  		    (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)) {
>  			netif_wake_queue(dev);
>  		}
> +		if (tp->dirty_tx != tp->cur_rx)
> +			RTL_W8(TxPoll, NPQ);
>  	}
>  }
>  

Hi Francois,

A few things:

- Why are you checking dirty_tx against cur_rx (shouldn't it be cur_tx?)?

- Is there a possibility that the driver could be triggering the card to 
send invalid packets with that code?

I'm thinking in _start_xmit, the cur_tx pointer (assuming that's what 
you meant to include above) gets incremented when the packet is sent to 
the card (the RTL_W8(TxPoll,NPQ)) to indicate that the card _should_ be 
able to process packets up to that point in the queue.

The interrupt routine comes along later to clean up the buffers between 
dirty_tx (the last packet that the driver knows the card has sent) and 
cur_tx (the point that the card could potentially be up to).   What 
seems could happen with the code above is that the card gets told that a 
packet is ready to be sent when really it's not.  I'm not sure how it 
would deal with such a situation.

Regards,
Dave.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: r8169 tx problem (1s pause with ping)
  2007-06-20  8:57         ` David Gundersen
@ 2007-06-20 21:15           ` Francois Romieu
  0 siblings, 0 replies; 9+ messages in thread
From: Francois Romieu @ 2007-06-20 21:15 UTC (permalink / raw)
  To: David Gundersen; +Cc: netdev, Benjamin LaHaise

David Gundersen <gundy@iinet.net.au> :
[...]
> - Why are you checking dirty_tx against cur_rx (shouldn't it be cur_tx?)?

Usual "try something, send something else" bug.

> - Is there a possibility that the driver could be triggering the card to 
> send invalid packets with that code?
[snip]

I do not know but I doubt that it matters.

That being said, the patch below should make things better (though still
not perfect):


diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 8f3e0da..35054e8 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -2514,7 +2514,7 @@ static inline u32 rtl8169_tso_csum(struct sk_buff *skb, struct net_device *dev)
 static int rtl8169_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct rtl8169_private *tp = netdev_priv(dev);
-	unsigned int frags, entry = tp->cur_tx % NUM_TX_DESC;
+	unsigned int frags, cur_tx, entry = tp->cur_tx % NUM_TX_DESC;
 	struct TxDesc *txd = tp->TxDescArray + entry;
 	void __iomem *ioaddr = tp->mmio_addr;
 	dma_addr_t mapping;
@@ -2567,13 +2567,17 @@ static int rtl8169_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	dev->trans_start = jiffies;
 
+	cur_tx = tp->cur_tx;
+
 	tp->cur_tx += frags + 1;
 
-	smp_wmb();
+	mmiowb();
 
-	RTL_W8(TxPoll, NPQ);	/* set polling bit */
+	smp_mb();
 
-	if (TX_BUFFS_AVAIL(tp) < MAX_SKB_FRAGS) {
+	if (cur_tx == tp->dirty_tx)
+		RTL_W8(TxPoll, NPQ);
+	else if (TX_BUFFS_AVAIL(tp) < MAX_SKB_FRAGS) {
 		netif_stop_queue(dev);
 		smp_rmb();
 		if (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)
@@ -2677,10 +2681,18 @@ static void rtl8169_tx_interrupt(struct net_device *dev,
 
 	if (tp->dirty_tx != dirty_tx) {
 		tp->dirty_tx = dirty_tx;
-		smp_wmb();
-		if (netif_queue_stopped(dev) &&
-		    (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)) {
-			netif_wake_queue(dev);
+		smp_mb();
+		if (unlikely(netif_queue_stopped(dev))) {
+			netif_tx_lock(dev);
+		    	if (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)
+				netif_wake_queue(dev);
+			if (dirty_tx != tp->cur_tx)
+				RTL_W8(TxPoll, NPQ);
+			netif_tx_unlock(dev);
+		} else if (dirty_tx != tp->cur_tx) {
+			netif_tx_lock(dev);
+			RTL_W8(TxPoll, NPQ);
+			netif_tx_unlock(dev);
 		}
 	}
 }

^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2007-06-20 21:17 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-06-13  0:41 r8169 tx problem (1s pause with ping) Benjamin LaHaise
2007-06-13 21:18 ` Francois Romieu
2007-06-14 15:33   ` David Gundersen
2007-06-14 22:00     ` Francois Romieu
2007-06-19 23:06       ` Francois Romieu
2007-06-20  8:57         ` David Gundersen
2007-06-20 21:15           ` Francois Romieu
2007-06-18 15:14     ` Benjamin LaHaise
2007-06-19 10:45       ` David Gundersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).