* r8169 tx problem (1s pause with ping) @ 2007-06-13 0:41 Benjamin LaHaise 2007-06-13 21:18 ` Francois Romieu 0 siblings, 1 reply; 9+ messages in thread From: Benjamin LaHaise @ 2007-06-13 0:41 UTC (permalink / raw) To: netdev Hello folks, I'm seeing something odd with r8169 on FC7: doing a ping -s 1600 alternates between a 1s latency and sub 1ms. Has anyone else seen anything like this? The system in question is an Asus M2A-VM with an onboard RTL8111 (I think). NAPI doesn't seem to make a difference. The kernel in question is currently a vanilla 2.6.21.5. Sub-mtu sized packets behave normally. 02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01) PING 1.2.3.4 (1.2.3.4) 1600(1628) bytes of data. 1608 bytes from 1.2.3.4: icmp_seq=1 ttl=64 time=1000 ms 1608 bytes from 1.2.3.4: icmp_seq=2 ttl=64 time=0.816 ms 1608 bytes from 1.2.3.4: icmp_seq=3 ttl=64 time=1000 ms 1608 bytes from 1.2.3.4: icmp_seq=4 ttl=64 time=0.661 ms -ben -- "Time is of no importance, Mr. President, only life is important." Don't Email: <zyntrop@kvack.org>. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: r8169 tx problem (1s pause with ping) 2007-06-13 0:41 r8169 tx problem (1s pause with ping) Benjamin LaHaise @ 2007-06-13 21:18 ` Francois Romieu 2007-06-14 15:33 ` David Gundersen 0 siblings, 1 reply; 9+ messages in thread From: Francois Romieu @ 2007-06-13 21:18 UTC (permalink / raw) To: Benjamin LaHaise; +Cc: netdev Benjamin LaHaise <bcrl@kvack.org> : [...] > I'm seeing something odd with r8169 on FC7: doing a ping -s 1600 alternates > between a 1s latency and sub 1ms. Has anyone else seen anything like this? > The system in question is an Asus M2A-VM with an onboard RTL8111 (I think). > NAPI doesn't seem to make a difference. The kernel in question is currently > a vanilla 2.6.21.5. Sub-mtu sized packets behave normally. Same thing here for my 8168 rev 01 (asrock 945G dvi LOM) with 2.6.22-rc4 and 2.6.22-rc3 + r816x patchkit. Wonderful. -- Ueimor ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: r8169 tx problem (1s pause with ping) 2007-06-13 21:18 ` Francois Romieu @ 2007-06-14 15:33 ` David Gundersen 2007-06-14 22:00 ` Francois Romieu 2007-06-18 15:14 ` Benjamin LaHaise 0 siblings, 2 replies; 9+ messages in thread From: David Gundersen @ 2007-06-14 15:33 UTC (permalink / raw) To: netdev [-- Attachment #1: Type: text/plain, Size: 1717 bytes --] Francois Romieu wrote: > Benjamin LaHaise <bcrl@kvack.org> : > [...] >> I'm seeing something odd with r8169 on FC7: doing a ping -s 1600 alternates >> between a 1s latency and sub 1ms. Has anyone else seen anything like this? >> The system in question is an Asus M2A-VM with an onboard RTL8111 (I think). >> NAPI doesn't seem to make a difference. The kernel in question is currently >> a vanilla 2.6.21.5. Sub-mtu sized packets behave normally. > > Same thing here for my 8168 rev 01 (asrock 945G dvi LOM) with 2.6.22-rc4 > and 2.6.22-rc3 + r816x patchkit. > > Wonderful. > I've got a modified version of the latest realtek (r8168) driver running here that doesn't seem to exhibit those symptoms. The trouble I have is that I've been playing with multiple sections of the code and I'm not 100% sure what part(s) might impact that particular test. The bits I know I've messed with are the bits that set the First/Last fragment flags and the NPQ flagging section (as described in my previous emails). I know it's not particularly scientific of me to be changing multiple sections of the driver at once and I'm sorry about that but it's fairly late here and I really should get some sleep :). I might do some more thorough testing on the weekend to find out what the minimal changes required are to get things working. In the mean-time I'll attach my patch for the r8168-8.001.00 realtek driver here in case anybody else wants to have a play with it and see if it helps them out. Also, It might be a silly question, but have you tried taking packet captures from the perspective of the box with the realtek chipset & another box during the pinging and comparing the two? Regards, Dave. [-- Attachment #2: r8168-8.001.00-dg.patch.gz --] [-- Type: application/x-gzip, Size: 2237 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: r8169 tx problem (1s pause with ping) 2007-06-14 15:33 ` David Gundersen @ 2007-06-14 22:00 ` Francois Romieu 2007-06-19 23:06 ` Francois Romieu 2007-06-18 15:14 ` Benjamin LaHaise 1 sibling, 1 reply; 9+ messages in thread From: Francois Romieu @ 2007-06-14 22:00 UTC (permalink / raw) To: David Gundersen; +Cc: netdev, Benjamin LaHaise David Gundersen <gundy@iinet.net.au> : [...] > I might do some more thorough testing on the weekend to find out what > the minimal changes required are to get things working. In your patch, the first chunk of data (which is handed back to the nic outside of rtl8169_xmit_frags) will not have is First fragment bit set when nr_frags != 0. It makes me a bit nervous. The NPQ part of your patch actually forces the start_xmit handler to wait for all previously queued packets to be sent before telling the nic that a packet may be available. I can understand that it will fix the symptoms but it will implicitly shorten the TX queue a lot. A pesky fever apart, I should be able to come with a patch before the week end. -- Ueimor ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: r8169 tx problem (1s pause with ping) 2007-06-14 22:00 ` Francois Romieu @ 2007-06-19 23:06 ` Francois Romieu 2007-06-20 8:57 ` David Gundersen 0 siblings, 1 reply; 9+ messages in thread From: Francois Romieu @ 2007-06-19 23:06 UTC (permalink / raw) To: David Gundersen; +Cc: netdev, Benjamin LaHaise Francois Romieu <romieu@fr.zoreil.com> : [...] > A pesky fever apart, I should be able to come with a patch before > the week end. (oops, late again) You can try the hack below as a temporary fix. It makes things way better on top of 2.6.22-rc5 + current r8169 patchkit at http://www.fr.zoreil.com/people/francois/misc/20070619-2.6.22-rc5-r8169-test.patch It applies against 2.6.22-rc5 too. I should come with something else soon. diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c index 8f3e0da..8c0851f 100644 --- a/drivers/net/r8169.c +++ b/drivers/net/r8169.c @@ -2682,6 +2688,8 @@ static void rtl8169_tx_interrupt(struct net_device *dev, (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)) { netif_wake_queue(dev); } + if (tp->dirty_tx != tp->cur_rx) + RTL_W8(TxPoll, NPQ); } } ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: r8169 tx problem (1s pause with ping) 2007-06-19 23:06 ` Francois Romieu @ 2007-06-20 8:57 ` David Gundersen 2007-06-20 21:15 ` Francois Romieu 0 siblings, 1 reply; 9+ messages in thread From: David Gundersen @ 2007-06-20 8:57 UTC (permalink / raw) To: netdev; +Cc: Francois Romieu, Benjamin LaHaise > diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c > index 8f3e0da..8c0851f 100644 > --- a/drivers/net/r8169.c > +++ b/drivers/net/r8169.c > @@ -2682,6 +2688,8 @@ static void rtl8169_tx_interrupt(struct net_device *dev, > (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)) { > netif_wake_queue(dev); > } > + if (tp->dirty_tx != tp->cur_rx) > + RTL_W8(TxPoll, NPQ); > } > } > Hi Francois, A few things: - Why are you checking dirty_tx against cur_rx (shouldn't it be cur_tx?)? - Is there a possibility that the driver could be triggering the card to send invalid packets with that code? I'm thinking in _start_xmit, the cur_tx pointer (assuming that's what you meant to include above) gets incremented when the packet is sent to the card (the RTL_W8(TxPoll,NPQ)) to indicate that the card _should_ be able to process packets up to that point in the queue. The interrupt routine comes along later to clean up the buffers between dirty_tx (the last packet that the driver knows the card has sent) and cur_tx (the point that the card could potentially be up to). What seems could happen with the code above is that the card gets told that a packet is ready to be sent when really it's not. I'm not sure how it would deal with such a situation. Regards, Dave. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: r8169 tx problem (1s pause with ping) 2007-06-20 8:57 ` David Gundersen @ 2007-06-20 21:15 ` Francois Romieu 0 siblings, 0 replies; 9+ messages in thread From: Francois Romieu @ 2007-06-20 21:15 UTC (permalink / raw) To: David Gundersen; +Cc: netdev, Benjamin LaHaise David Gundersen <gundy@iinet.net.au> : [...] > - Why are you checking dirty_tx against cur_rx (shouldn't it be cur_tx?)? Usual "try something, send something else" bug. > - Is there a possibility that the driver could be triggering the card to > send invalid packets with that code? [snip] I do not know but I doubt that it matters. That being said, the patch below should make things better (though still not perfect): diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c index 8f3e0da..35054e8 100644 --- a/drivers/net/r8169.c +++ b/drivers/net/r8169.c @@ -2514,7 +2514,7 @@ static inline u32 rtl8169_tso_csum(struct sk_buff *skb, struct net_device *dev) static int rtl8169_start_xmit(struct sk_buff *skb, struct net_device *dev) { struct rtl8169_private *tp = netdev_priv(dev); - unsigned int frags, entry = tp->cur_tx % NUM_TX_DESC; + unsigned int frags, cur_tx, entry = tp->cur_tx % NUM_TX_DESC; struct TxDesc *txd = tp->TxDescArray + entry; void __iomem *ioaddr = tp->mmio_addr; dma_addr_t mapping; @@ -2567,13 +2567,17 @@ static int rtl8169_start_xmit(struct sk_buff *skb, struct net_device *dev) dev->trans_start = jiffies; + cur_tx = tp->cur_tx; + tp->cur_tx += frags + 1; - smp_wmb(); + mmiowb(); - RTL_W8(TxPoll, NPQ); /* set polling bit */ + smp_mb(); - if (TX_BUFFS_AVAIL(tp) < MAX_SKB_FRAGS) { + if (cur_tx == tp->dirty_tx) + RTL_W8(TxPoll, NPQ); + else if (TX_BUFFS_AVAIL(tp) < MAX_SKB_FRAGS) { netif_stop_queue(dev); smp_rmb(); if (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS) @@ -2677,10 +2681,18 @@ static void rtl8169_tx_interrupt(struct net_device *dev, if (tp->dirty_tx != dirty_tx) { tp->dirty_tx = dirty_tx; - smp_wmb(); - if (netif_queue_stopped(dev) && - (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS)) { - netif_wake_queue(dev); + smp_mb(); + if (unlikely(netif_queue_stopped(dev))) { + netif_tx_lock(dev); + if (TX_BUFFS_AVAIL(tp) >= MAX_SKB_FRAGS) + netif_wake_queue(dev); + if (dirty_tx != tp->cur_tx) + RTL_W8(TxPoll, NPQ); + netif_tx_unlock(dev); + } else if (dirty_tx != tp->cur_tx) { + netif_tx_lock(dev); + RTL_W8(TxPoll, NPQ); + netif_tx_unlock(dev); } } } ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: r8169 tx problem (1s pause with ping) 2007-06-14 15:33 ` David Gundersen 2007-06-14 22:00 ` Francois Romieu @ 2007-06-18 15:14 ` Benjamin LaHaise 2007-06-19 10:45 ` David Gundersen 1 sibling, 1 reply; 9+ messages in thread From: Benjamin LaHaise @ 2007-06-18 15:14 UTC (permalink / raw) To: David Gundersen; +Cc: netdev On Fri, Jun 15, 2007 at 01:33:14AM +1000, David Gundersen wrote: > In the mean-time I'll attach my patch for the r8168-8.001.00 realtek > driver here in case anybody else wants to have a play with it and see if > it helps them out. Out of curiousity, does it work if you just do a single read (ie RTL_R8(TxPoll);) of the register before writing to it? That would clear things up if it is a PCI posting problem. -ben -- "Time is of no importance, Mr. President, only life is important." Don't Email: <zyntrop@kvack.org>. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: r8169 tx problem (1s pause with ping) 2007-06-18 15:14 ` Benjamin LaHaise @ 2007-06-19 10:45 ` David Gundersen 0 siblings, 0 replies; 9+ messages in thread From: David Gundersen @ 2007-06-19 10:45 UTC (permalink / raw) To: Benjamin LaHaise; +Cc: netdev > Out of curiousity, does it work if you just do a single read (ie > RTL_R8(TxPoll);) of the register before writing to it? That would clear > things up if it is a PCI posting problem. Hi Ben, I tried your suggestion but it didn't seem to make any difference :( I tried the following combinations: - realtek original [broken] - realtek original with the RTL_R8(TxPoll) before RTL_W8(TxPoll, NPQ); [broken] - my patched version without the ndelay loop but including the RTL_R8(TxPoll) (to see if my messing with the frag logic was having any impact) [broken] - my patched version including the ndelay loop [full speed transfers] Also, I'm not sure if I made it clear in my first post, but I'm testing these changes on a 8168B (it's built in to my GA-945G-S3 motherboard). I'm not sure if we can assume that the same change applied to the 8169 driver would have the same effect on the 8169 too? (is the 8168 just a PCI express version of the [pci] 8169?) Dave. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2007-06-20 21:17 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-06-13 0:41 r8169 tx problem (1s pause with ping) Benjamin LaHaise 2007-06-13 21:18 ` Francois Romieu 2007-06-14 15:33 ` David Gundersen 2007-06-14 22:00 ` Francois Romieu 2007-06-19 23:06 ` Francois Romieu 2007-06-20 8:57 ` David Gundersen 2007-06-20 21:15 ` Francois Romieu 2007-06-18 15:14 ` Benjamin LaHaise 2007-06-19 10:45 ` David Gundersen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).