* atl1c issues on 3.8.2
@ 2013-03-12 15:17 Michael Büsch
2013-03-12 15:45 ` Eric Dumazet
0 siblings, 1 reply; 9+ messages in thread
From: Michael Büsch @ 2013-03-12 15:17 UTC (permalink / raw)
To: Eric Dumazet; +Cc: linux-netdev
Hi,
Starting with 3.8.x scp stalls the atl1c based interface on my Asus Eeepc 1011px.
iperf (for example) does not do that. But after scp stalled the interface,
iperf transfers fail, too.
0mb@milhouse:~$ iperf -c 192.168.4.2 -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 192.168.4.2, TCP port 5001
TCP window size: 96.0 KByte (default)
------------------------------------------------------------
[ 5] local 192.168.4.1 port 41558 connected with 192.168.4.2 port 5001
[ 4] local 192.168.4.1 port 5001 connected with 192.168.4.2 port 58296
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-10.0 sec 111 MBytes 93.0 Mbits/sec
[ 4] 0.0-10.1 sec 105 MBytes 87.2 Mbits/sec
0mb@milhouse:~$ scp testfile mb.marge:
Enter passphrase for key '/home/mb/.ssh/key':
testfile 12% 6912KB 1.8MB/s - stalled -^testfile 12% 6912KB 1.6MB/s - stalled -1mb@milhouse:~$ ^C
130mb@milhouse:~$ iperf -c 192.168.4.2 -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
connect failed: No route to host
dmesg is spammed with these messages:
> [51069.954315] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
> [51069.954409] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> [51155.933162] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Down
> [51157.441946] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
> [51276.049211] atl1c 0000:01:00.0: irq 46 for MSI/MSI-X
> [51276.049371] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
> [51290.233447] atl1c 0000:01:00.0: irq 46 for MSI/MSI-X
> [51290.233641] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
> [51305.025257] atl1c 0000:01:00.0: irq 46 for MSI/MSI-X
> [51305.025419] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
> [51323.305245] atl1c 0000:01:00.0: irq 46 for MSI/MSI-X
> [51323.305405] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
> [51338.393216] atl1c 0000:01:00.0: irq 46 for MSI/MSI-X
> [51338.393375] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
> [51350.739196] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Down
> [51353.810485] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
> [51376.817238] atl1c 0000:01:00.0: irq 46 for MSI/MSI-X
> [51376.817399] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
> [51391.425209] atl1c 0000:01:00.0: irq 46 for MSI/MSI-X
> [51391.425371] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
This did not happen with earlier kernels. (But 3.7 has other issues as well. See my other mail)
Any ideas what's so special about scp?
--
Michael
* Re: atl1c issues on 3.8.2
2013-03-12 15:17 atl1c issues on 3.8.2 Michael Büsch
@ 2013-03-12 15:45 ` Eric Dumazet
[not found] ` <20130312180942.4198e88e@milhouse>
0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2013-03-12 15:45 UTC (permalink / raw)
To: Michael Büsch; +Cc: Eric Dumazet, linux-netdev

On Tue, 2013-03-12 at 16:17 +0100, Michael Büsch wrote:
> Hi,
>
> Starting with 3.8.x scp stalls the atl1c based interface on my Asus Eeepc 1011px.
> iperf (for example) does not do that. But after scp stalled the interface,
> iperf transfers fail, too.

I am pretty sure David stable list contains the needed fix

http://patchwork.ozlabs.org/bundle/davem/stable/?state=*

Should be included in 3.8.3

Detail : http://patchwork.ozlabs.org/patch/221737/
[parent not found: <20130312180942.4198e88e@milhouse>]
* Re: atl1c issues on 3.8.2
[not found] ` <20130312180942.4198e88e@milhouse>
@ 2013-03-13 5:57 ` Eric Dumazet
2013-03-14 14:31 ` Eric Dumazet
0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2013-03-13 5:57 UTC (permalink / raw)
To: Michael Büsch; +Cc: Eric Dumazet, linux-netdev, David S.Miller

On Tue, 2013-03-12 at 18:09 +0100, Michael Büsch wrote:
> On Tue, 12 Mar 2013 16:45:44 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> > On Tue, 2013-03-12 at 16:17 +0100, Michael Büsch wrote:
> > > Hi,
> > >
> > > Starting with 3.8.x scp stalls the atl1c based interface on my Asus Eeepc 1011px.
> > > iperf (for example) does not do that. But after scp stalled the interface,
> > > iperf transfers fail, too.
> >
> > I am pretty sure David stable list contains the needed fix
> >
> > http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
>
> No this didn't fix it.
>
> However, I tried to revert 69b08f62e17439ee3d436faf0b9a7ca6fffb78db again,
> which already caused trouble for me in 3.7
> and this fixed the issue.
>
> So it seems that this still is the same or a related issue that I reported
> for 3.7. I just wrongly stated that the problem was fixed in 3.8, because my
> simple ping test doesn't catch it on 3.8.
>

kmalloc(2000) never had the guarantee that the result would not span
two 4K pages.

Apparently the NIC doesn't allow an rx descriptor spanning two 4K pages,
or it has a particular hardware bug that I cannot possibly find myself.
(I don't have an atl1c nor any documentation.)

atl1c driver authors will need to find the bug and fix the driver.

Drivers that deal with this kind of hardware limitation allocate pages
themselves and provide skbs with a fragment to the upper stack, or use
build_skb() once the frame is received.

drivers/net/ethernet/intel/igb/igb_main.c is an example.

Could you try (on net-next tree) different values for the
NETDEV_FRAG_PAGE_MAX_ORDER constant, as it might give Atheros some
hints?

(8192 & 16384)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 821c7f4..769fdac 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1844,7 +1844,7 @@ static inline void __skb_queue_purge(struct sk_buff_head *list)
 		kfree_skb(skb);
 }

-#define NETDEV_FRAG_PAGE_MAX_ORDER get_order(32768)
+#define NETDEV_FRAG_PAGE_MAX_ORDER get_order(8192)
 #define NETDEV_FRAG_PAGE_MAX_SIZE (PAGE_SIZE << NETDEV_FRAG_PAGE_MAX_ORDER)
 #define NETDEV_PAGECNT_MAX_BIAS NETDEV_FRAG_PAGE_MAX_SIZE
* Re: atl1c issues on 3.8.2
2013-03-13 5:57 ` Eric Dumazet
@ 2013-03-14 14:31 ` Eric Dumazet
2013-03-14 22:17 ` Michael Büsch
0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2013-03-14 14:31 UTC (permalink / raw)
To: Michael Büsch, Pavel Emelyanov
Cc: Eric Dumazet, linux-netdev, David S.Miller, Mel Gorman

On Wed, 2013-03-13 at 06:57 +0100, Eric Dumazet wrote:
> On Tue, 2013-03-12 at 18:09 +0100, Michael Büsch wrote:
> > On Tue, 12 Mar 2013 16:45:44 +0100
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > > On Tue, 2013-03-12 at 16:17 +0100, Michael Büsch wrote:
> > > > Hi,
> > > >
> > > > Starting with 3.8.x scp stalls the atl1c based interface on my Asus Eeepc 1011px.
> > > > iperf (for example) does not do that. But after scp stalled the interface,
> > > > iperf transfers fail, too.
> > >
> > > I am pretty sure David stable list contains the needed fix
> > >
> > > http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
> >
> > No this didn't fix it.
> >
> > However, I tried to revert 69b08f62e17439ee3d436faf0b9a7ca6fffb78db again,
> > which already caused trouble for me in 3.7
> > and this fixed the issue.
> >
> > So it seems that this still is the same or a related issue that I reported
> > for 3.7. I just wrongly stated that the problem was fixed in 3.8, because my
> > simple ping test doesn't catch it on 3.8.
> >

And it seems the possible fix is here :

http://patchwork.ozlabs.org/patch/227666/
* Re: atl1c issues on 3.8.2
2013-03-14 14:31 ` Eric Dumazet
@ 2013-03-14 22:17 ` Michael Büsch
2013-03-14 23:06 ` Eric Dumazet
0 siblings, 1 reply; 9+ messages in thread
From: Michael Büsch @ 2013-03-14 22:17 UTC (permalink / raw)
To: Eric Dumazet
Cc: Pavel Emelyanov, Eric Dumazet, linux-netdev, David S.Miller, Mel Gorman

On Thu, 14 Mar 2013 15:31:00 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> And it seems the possible fix is here :
>
> http://patchwork.ozlabs.org/patch/227666/

I can still reproduce with this fix applied.

However, I noticed that I cannot reproduce, if the wireless interface (ath9k) of
the netbook is down while testing the ethernet. The wireless does not carry any
test traffic. It's just idle.
I do not know if this always had been the case, because wireless was always up (and mostly
idle) in my previous ethernet tests.

--
Michael
* Re: atl1c issues on 3.8.2
2013-03-14 22:17 ` Michael Büsch
@ 2013-03-14 23:06 ` Eric Dumazet
2013-03-15 19:44 ` Michael Büsch
0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2013-03-14 23:06 UTC (permalink / raw)
To: Michael Büsch
Cc: Pavel Emelyanov, Eric Dumazet, linux-netdev, David S.Miller, Mel Gorman

On Thu, 2013-03-14 at 23:17 +0100, Michael Büsch wrote:

> I can still reproduce with this fix applied.
>
> However, I noticed that I cannot reproduce, if the wireless interface (ath9k) of
> the netbook is down while testing the ethernet. The wireless does not carry any
> test traffic. It's just idle.
> I do not know if this always had been the case, because wireless was always up (and mostly
> idle) in my previous ethernet tests.
>

OK, then it must be kind of corruption issue in ath9k, or whatever ?

You could try various DEBUGing stuff, like CONFIG_DEBUG_PAGEALLOC and
CONFIG_SLUB_DEBUG_ON
* Re: atl1c issues on 3.8.2
2013-03-14 23:06 ` Eric Dumazet
@ 2013-03-15 19:44 ` Michael Büsch
2013-03-22 11:28 ` Michael Büsch
0 siblings, 1 reply; 9+ messages in thread
From: Michael Büsch @ 2013-03-15 19:44 UTC (permalink / raw)
To: Eric Dumazet
Cc: Pavel Emelyanov, Eric Dumazet, linux-netdev, David S.Miller, Mel Gorman

On Fri, 15 Mar 2013 00:06:02 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> You could try various DEBUGing stuff, like CONFIG_DEBUG_PAGEALLOC and
> CONFIG_SLUB_DEBUG_ON

This bug is so weird, so I did some double-checking. Just to minimize
the mistakes on my side.

I compiled a kernel without the revert of the original commit and
without the skb fix you suggested.
It turns out that I am only able to reproduce the issue, if the ath9k
interface is up while testing the atl1c ethernet.

And I also double-checked that reverting the original commit fixes the
issue. No stalls with up or down ath9k then.

So that confirms my previous results.

I tried to enable pagealloc debug and slub debug on a kernel with the
suggested skb fix, but without the revert of the commit.
Nothing special appeared in the logs.

I'm currently building a kernel with almost all debugging options
turned on. I will test that tomorrow.

Thanks for your help.

--
Michael
* Re: atl1c issues on 3.8.2
2013-03-15 19:44 ` Michael Büsch
@ 2013-03-22 11:28 ` Michael Büsch
2013-05-27 16:43 ` Michael Büsch
0 siblings, 1 reply; 9+ messages in thread
From: Michael Büsch @ 2013-03-22 11:28 UTC (permalink / raw)
To: Michael Büsch
Cc: Eric Dumazet, Pavel Emelyanov, Eric Dumazet, linux-netdev, David S.Miller, Mel Gorman

On Fri, 15 Mar 2013 20:44:57 +0100
Michael Büsch <m@bues.ch> wrote:

> I'm currently building a kernel with almost all debugging options
> turned on. I will test that tomorrow.

It took me a little bit longer than expected, but running the tests
on a kernel with almost all debugging options enabled shows no
additional kernel messages. :/

--
Michael
* Re: atl1c issues on 3.8.2
2013-03-22 11:28 ` Michael Büsch
@ 2013-05-27 16:43 ` Michael Büsch
0 siblings, 0 replies; 9+ messages in thread
From: Michael Büsch @ 2013-05-27 16:43 UTC (permalink / raw)
To: Eric Dumazet
Cc: Pavel Emelyanov, Eric Dumazet, linux-netdev, David S.Miller, Mel Gorman

Any news on this?
Am I still the only one with this issue?

It's still 100% reproducible and I can workaround it by reverting
69b08f62e17439ee3d436faf0b9a7ca6fffb78db

It can't possibly be that I'm the only one on this planet seeing this...

--
Michael