* 2.6.29 & network stack strangeness @ 2009-06-05 15:15 Matthew Lear 2009-06-05 15:49 ` Finn Thain 0 siblings, 1 reply; 8+ messages in thread From: Matthew Lear @ 2009-06-05 15:15 UTC (permalink / raw) To: linux-m68k Hello all, I'm running a 2.6.29 kernel on an MMU enabled m68k coldfire mcf54455 platform and I'm having some throughput problems when running network tests. The kernel boots and mounts its rootfs from flash (jffs2). udhcpc runs, obtains a lease from the dhcp server and configures eth0. Network connectivity is ok. I can ping the target from the host and vice versa. 1/ If I run ping -s 1500 -i 0.0001 <target ip address> on the host pc, after several mins, the kernel reports 'unexpected interrupt from 24' which is the vector for a spurious interrupt. This message will repeat randomly (from what I saw it appeared ~ 20 times when running the ping test above for 40 mins). The mcf54455 reference manual describes a possible cause for spurious interrupts. However, this test very rarely reports any packet loss, although the max time to receive a packet can be very large indeed. 2/ If I reboot, start again and run a ping flood test (ping -f) from host pc -> target, all icmp requests are acknowledged - for a while. Before the target begins to fail to respond to the icmp requests, running top shows that the ksoftirq daemon is running at ~ 5% cpu load. This is normal as it is involved in processing the deferred tasks of processing data fired up to the network stack. So when the target beings to stop responding to icmp, if I then stop the ping flood and try to ping the host from the target, there is no reply indicated by ping. However, if you do this with a packet sniffer running (eg wireshark) you can see that data is still being transmitted from the target -> host and you can see the icmp reply, only the reply from the host appears to be received ok by the fec driver but is processed by the network stack target. When in this state, a proc entry that I added to the fec driver shows that the last return value from netif_rx() (called in the fec rx interrupt handling routine) is 1, indicating that the last packet was dropped by the network stack, e.g. ~ # cat /proc/driver/fec total interrupts: 1421619 last interrupt type: 2 [1=tx, 2=rx, 3=mii] total tx interrupts: 709148 total rx interrupts: 712472 total mii interrupts: 1 last interrupt event: 0x2000000 total eberr interrupts: 0 total hberr interrupts: 0 tx loop current count: 0 tx loop last count: 1 rx loop current count: 0 rx loop last count: 1 rx last cbd ctrl/status: 0x800 rx last cbd len: 346 rx last cbd buff addr: 0x40410000 rx last netif_rx status: 1 Strangely, wireshark still shows data being transmitted from the target -> host. I can see ARP requests and I can also see DHCP discovery packets being sent by the target when its DHCP lease expires. This all looks ok, only the reply from host -> target is never processed by the target as the network stack is in a state where it is dropping all incoming data provided to it by the driver. I believe udhcpc utilises the network device directly, ie it does not require an intermediate network protocol being implemented in the kernel (tcpdump is similar). The fec driver still seems to be running ok because I can see the ring buffer address changing when data is received. Everything seems to be ok apart from the network stack. Very strange indeed. Running network throughput tests between host and target with netcat or netperf only run for a few seconds before activity ceases. Has anybody experienced anything similar? Why does the network stack appear to be stuck and constantly dropping packets? Any feedback appreciated. Rgds, -- Matt ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.29 & network stack strangeness 2009-06-05 15:15 2.6.29 & network stack strangeness Matthew Lear @ 2009-06-05 15:49 ` Finn Thain 2009-06-05 16:17 ` Matthew Lear 0 siblings, 1 reply; 8+ messages in thread From: Finn Thain @ 2009-06-05 15:49 UTC (permalink / raw) To: Matthew Lear; +Cc: linux-m68k Does the problem manifest only when the DHCP lease expires? Can you reproduce the problem with a static IP? Finn On Fri, 5 Jun 2009, Matthew Lear wrote: > Hello all, > > I'm running a 2.6.29 kernel on an MMU enabled m68k coldfire mcf54455 platform > and I'm having some throughput problems when running network tests. > > The kernel boots and mounts its rootfs from flash (jffs2). udhcpc runs, obtains > a lease from the dhcp server and configures eth0. Network connectivity is ok. I > can ping the target from the host and vice versa. > > 1/ > If I run ping -s 1500 -i 0.0001 <target ip address> on the host pc, after > several mins, the kernel reports 'unexpected interrupt from 24' which is the > vector for a spurious interrupt. This message will repeat randomly (from what I > saw it appeared ~ 20 times when running the ping test above for 40 mins). The > mcf54455 reference manual describes a possible cause for spurious interrupts. > However, this test very rarely reports any packet loss, although the max time to > receive a packet can be very large indeed. > > 2/ > If I reboot, start again and run a ping flood test (ping -f) from host pc -> > target, all icmp requests are acknowledged - for a while. Before the target > begins to fail to respond to the icmp requests, running top shows that the > ksoftirq daemon is running at ~ 5% cpu load. This is normal as it is involved in > processing the deferred tasks of processing data fired up to the network stack. > So when the target beings to stop responding to icmp, if I then stop the ping > flood and try to ping the host from the target, there is no reply indicated by > ping. However, if you do this with a packet sniffer running (eg wireshark) you > can see that data is still being transmitted from the target -> host and you can > see the icmp reply, only the reply from the host appears to be received ok by > the fec driver but is processed by the network stack target. > > When in this state, a proc entry that I added to the fec driver shows that the > last return value from netif_rx() (called in the fec rx interrupt handling > routine) is 1, indicating that the last packet was dropped by the network stack, > e.g. > > ~ # cat /proc/driver/fec > total interrupts: 1421619 > last interrupt type: 2 [1=tx, 2=rx, 3=mii] > total tx interrupts: 709148 > total rx interrupts: 712472 > total mii interrupts: 1 > last interrupt event: 0x2000000 > total eberr interrupts: 0 > total hberr interrupts: 0 > tx loop current count: 0 > tx loop last count: 1 > rx loop current count: 0 > rx loop last count: 1 > rx last cbd ctrl/status: 0x800 > rx last cbd len: 346 > rx last cbd buff addr: 0x40410000 > rx last netif_rx status: 1 > > Strangely, wireshark still shows data being transmitted from the target > -> host. I can see ARP requests and I can also see DHCP discovery packets being > sent by the target when its DHCP lease expires. This all looks ok, only the > reply from host -> target is never processed by the target as the network stack > is in a state where it is dropping all incoming data provided to it by the driver. > > I believe udhcpc utilises the network device directly, ie it does not require an > intermediate network protocol being implemented in the kernel (tcpdump is > similar). > > The fec driver still seems to be running ok because I can see the ring buffer > address changing when data is received. Everything seems to be ok apart from the > network stack. Very strange indeed. > > Running network throughput tests between host and target with netcat or netperf > only run for a few seconds before activity ceases. > > Has anybody experienced anything similar? Why does the network stack appear to > be stuck and constantly dropping packets? > > Any feedback appreciated. > > Rgds, > -- Matt > -- > To unsubscribe from this list: send the line "unsubscribe linux-m68k" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.29 & network stack strangeness 2009-06-05 15:49 ` Finn Thain @ 2009-06-05 16:17 ` Matthew Lear 2009-06-05 16:37 ` Finn Thain 0 siblings, 1 reply; 8+ messages in thread From: Matthew Lear @ 2009-06-05 16:17 UTC (permalink / raw) To: Finn Thain; +Cc: linux-m68k Hi - thanks for your reply. The problem doesn't manifest only when the DHCP lease expires and I can still reproduce the problem with a static IP. With or without DHCP makes no difference. It seems to effect socket comms quite seriously (and quickly). If I run a simple server program on the host that listens on a socket and writes a response string to the socket when it receives data, and on the target I run a simple client program which writes a string to the socket, reads and prints the response sent the server, I only have to send data from client to server with a delay of 1ms between transmissions for a few seconds and the client program hangs on calling read() on the socket fd. If I run a simple netcat test, eg on target: nc -l -p 3333 > /dev/null on host: dd if=/dev/zero | nc <target-ip> 3333 ...strangely, once activity on the ethernet link as a result of the netcat test ceases, running netstat -a on the target hangs for several seconds, eg: ~ # nc -l -p 3333 > /dev/null & ~ # netstat -a Active Internet connections (servers and established) Proto Recv-Q Send-Q Local Address Foreign Address State tcp 0 0 *:login *:* LISTEN tcp 0 0 *:shell *:* LISTEN tcp 0 0 *:sunrpc *:* LISTEN tcp 0 0 *:finger *:* LISTEN tcp 0 0 *:auth *:* LISTEN tcp 0 0 *:ftp *:* LISTEN tcp 0 0 *:telnet *:* LISTEN <system hangs for several seconds here> tcp 0 0 192.168.0.11:3333 gateway0:45645 ESTABLISHED udp 0 0 *:ntalk *:* udp 0 0 *:sunrpc *:* Active UNIX domain sockets (servers and established) Proto RefCnt Flags Type State I-Node Path unix 4 [ ] DGRAM 111 /dev/log unix 3 [ ] STREAM CONNECTED 123 unix 3 [ ] STREAM CONNECTED 122 unix 2 [ ] DGRAM 120 unix 2 [ ] DGRAM 114 ~ # I thought this was interesting. Also, after this, I have trouble entering characters over the serial port / console. It seems like interrupts may having trouble getting serviced but this may be a side-effect... If you run the same netstat command with strace, you can see that the delay is caused by polling the socket following calling send: ... ... gettimeofday({366, 470000}, NULL) = 0 poll([{fd=4, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1 send(4, "lJ\1\0\0\1\0\0\0\0\0\0\00211\0010\003168\003192\7in-ad"..., 43, 0x4000) = 43 poll( <delay is here> [{fd=4, events=POLLIN}], 1, 5000) = 0 ... ... -- Matt Finn Thain wrote: > Does the problem manifest only when the DHCP lease expires? > Can you reproduce the problem with a static IP? > > Finn > > > On Fri, 5 Jun 2009, Matthew Lear wrote: > >> Hello all, >> >> I'm running a 2.6.29 kernel on an MMU enabled m68k coldfire mcf54455 platform >> and I'm having some throughput problems when running network tests. >> >> The kernel boots and mounts its rootfs from flash (jffs2). udhcpc runs, obtains >> a lease from the dhcp server and configures eth0. Network connectivity is ok. I >> can ping the target from the host and vice versa. >> >> 1/ >> If I run ping -s 1500 -i 0.0001 <target ip address> on the host pc, after >> several mins, the kernel reports 'unexpected interrupt from 24' which is the >> vector for a spurious interrupt. This message will repeat randomly (from what I >> saw it appeared ~ 20 times when running the ping test above for 40 mins). The >> mcf54455 reference manual describes a possible cause for spurious interrupts. >> However, this test very rarely reports any packet loss, although the max time to >> receive a packet can be very large indeed. >> >> 2/ >> If I reboot, start again and run a ping flood test (ping -f) from host pc -> >> target, all icmp requests are acknowledged - for a while. Before the target >> begins to fail to respond to the icmp requests, running top shows that the >> ksoftirq daemon is running at ~ 5% cpu load. This is normal as it is involved in >> processing the deferred tasks of processing data fired up to the network stack. >> So when the target beings to stop responding to icmp, if I then stop the ping >> flood and try to ping the host from the target, there is no reply indicated by >> ping. However, if you do this with a packet sniffer running (eg wireshark) you >> can see that data is still being transmitted from the target -> host and you can >> see the icmp reply, only the reply from the host appears to be received ok by >> the fec driver but is processed by the network stack target. >> >> When in this state, a proc entry that I added to the fec driver shows that the >> last return value from netif_rx() (called in the fec rx interrupt handling >> routine) is 1, indicating that the last packet was dropped by the network stack, >> e.g. >> >> ~ # cat /proc/driver/fec >> total interrupts: 1421619 >> last interrupt type: 2 [1=tx, 2=rx, 3=mii] >> total tx interrupts: 709148 >> total rx interrupts: 712472 >> total mii interrupts: 1 >> last interrupt event: 0x2000000 >> total eberr interrupts: 0 >> total hberr interrupts: 0 >> tx loop current count: 0 >> tx loop last count: 1 >> rx loop current count: 0 >> rx loop last count: 1 >> rx last cbd ctrl/status: 0x800 >> rx last cbd len: 346 >> rx last cbd buff addr: 0x40410000 >> rx last netif_rx status: 1 >> >> Strangely, wireshark still shows data being transmitted from the target >> -> host. I can see ARP requests and I can also see DHCP discovery packets being >> sent by the target when its DHCP lease expires. This all looks ok, only the >> reply from host -> target is never processed by the target as the network stack >> is in a state where it is dropping all incoming data provided to it by the driver. >> >> I believe udhcpc utilises the network device directly, ie it does not require an >> intermediate network protocol being implemented in the kernel (tcpdump is >> similar). >> >> The fec driver still seems to be running ok because I can see the ring buffer >> address changing when data is received. Everything seems to be ok apart from the >> network stack. Very strange indeed. >> >> Running network throughput tests between host and target with netcat or netperf >> only run for a few seconds before activity ceases. >> >> Has anybody experienced anything similar? Why does the network stack appear to >> be stuck and constantly dropping packets? >> >> Any feedback appreciated. >> >> Rgds, >> -- Matt >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-m68k" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.29 & network stack strangeness 2009-06-05 16:17 ` Matthew Lear @ 2009-06-05 16:37 ` Finn Thain 2009-06-05 16:44 ` Matthew Lear 0 siblings, 1 reply; 8+ messages in thread From: Finn Thain @ 2009-06-05 16:37 UTC (permalink / raw) To: Matthew Lear; +Cc: linux-m68k My only guess would be that the network stack delayed work queues depend upon working timer interrupts... But since I have no knowledge of your hardware, I don't think I'll be a lot of help with that. Finn On Fri, 5 Jun 2009, Matthew Lear wrote: > Hi - thanks for your reply. > > The problem doesn't manifest only when the DHCP lease expires and I can still > reproduce the problem with a static IP. With or without DHCP makes no difference. > > It seems to effect socket comms quite seriously (and quickly). If I run a simple > server program on the host that listens on a socket and writes a response string > to the socket when it receives data, and on the target I run a simple client > program which writes a string to the socket, reads and prints the response sent > the server, I only have to send data from client to server with a delay of 1ms > between transmissions for a few seconds and the client program hangs on calling > read() on the socket fd. > > If I run a simple netcat test, eg > > on target: nc -l -p 3333 > /dev/null > on host: dd if=/dev/zero | nc <target-ip> 3333 > > ...strangely, once activity on the ethernet link as a result of the netcat test > ceases, running netstat -a on the target hangs for several seconds, eg: > > > ~ # nc -l -p 3333 > /dev/null & > ~ # netstat -a > Active Internet connections (servers and established) > Proto Recv-Q Send-Q Local Address Foreign Address State > tcp 0 0 *:login *:* LISTEN > tcp 0 0 *:shell *:* LISTEN > tcp 0 0 *:sunrpc *:* LISTEN > tcp 0 0 *:finger *:* LISTEN > tcp 0 0 *:auth *:* LISTEN > tcp 0 0 *:ftp *:* LISTEN > tcp 0 0 *:telnet *:* LISTEN > > <system hangs for several seconds here> > > tcp 0 0 192.168.0.11:3333 gateway0:45645 > ESTABLISHED > udp 0 0 *:ntalk *:* > udp 0 0 *:sunrpc *:* > Active UNIX domain sockets (servers and established) > Proto RefCnt Flags Type State I-Node Path > unix 4 [ ] DGRAM 111 /dev/log > unix 3 [ ] STREAM CONNECTED 123 > unix 3 [ ] STREAM CONNECTED 122 > unix 2 [ ] DGRAM 120 > unix 2 [ ] DGRAM 114 > ~ # > > I thought this was interesting. Also, after this, I have trouble entering > characters over the serial port / console. It seems like interrupts may having > trouble getting serviced but this may be a side-effect... > > If you run the same netstat command with strace, you can see that the delay is > caused by polling the socket following calling send: > > ... > ... > gettimeofday({366, 470000}, NULL) = 0 > poll([{fd=4, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1 > send(4, "lJ\1\0\0\1\0\0\0\0\0\0\00211\0010\003168\003192\7in-ad"..., 43, > 0x4000) = 43 > poll( > > > <delay is here> > > > [{fd=4, events=POLLIN}], 1, 5000) = 0 > ... > ... > > -- Matt > > > Finn Thain wrote: > > Does the problem manifest only when the DHCP lease expires? > > Can you reproduce the problem with a static IP? > > > > Finn > > > > > > On Fri, 5 Jun 2009, Matthew Lear wrote: > > > >> Hello all, > >> > >> I'm running a 2.6.29 kernel on an MMU enabled m68k coldfire mcf54455 platform > >> and I'm having some throughput problems when running network tests. > >> > >> The kernel boots and mounts its rootfs from flash (jffs2). udhcpc runs, obtains > >> a lease from the dhcp server and configures eth0. Network connectivity is ok. I > >> can ping the target from the host and vice versa. > >> > >> 1/ > >> If I run ping -s 1500 -i 0.0001 <target ip address> on the host pc, after > >> several mins, the kernel reports 'unexpected interrupt from 24' which is the > >> vector for a spurious interrupt. This message will repeat randomly (from what I > >> saw it appeared ~ 20 times when running the ping test above for 40 mins). The > >> mcf54455 reference manual describes a possible cause for spurious interrupts. > >> However, this test very rarely reports any packet loss, although the max time to > >> receive a packet can be very large indeed. > >> > >> 2/ > >> If I reboot, start again and run a ping flood test (ping -f) from host pc -> > >> target, all icmp requests are acknowledged - for a while. Before the target > >> begins to fail to respond to the icmp requests, running top shows that the > >> ksoftirq daemon is running at ~ 5% cpu load. This is normal as it is involved in > >> processing the deferred tasks of processing data fired up to the network stack. > >> So when the target beings to stop responding to icmp, if I then stop the ping > >> flood and try to ping the host from the target, there is no reply indicated by > >> ping. However, if you do this with a packet sniffer running (eg wireshark) you > >> can see that data is still being transmitted from the target -> host and you can > >> see the icmp reply, only the reply from the host appears to be received ok by > >> the fec driver but is processed by the network stack target. > >> > >> When in this state, a proc entry that I added to the fec driver shows that the > >> last return value from netif_rx() (called in the fec rx interrupt handling > >> routine) is 1, indicating that the last packet was dropped by the network stack, > >> e.g. > >> > >> ~ # cat /proc/driver/fec > >> total interrupts: 1421619 > >> last interrupt type: 2 [1=tx, 2=rx, 3=mii] > >> total tx interrupts: 709148 > >> total rx interrupts: 712472 > >> total mii interrupts: 1 > >> last interrupt event: 0x2000000 > >> total eberr interrupts: 0 > >> total hberr interrupts: 0 > >> tx loop current count: 0 > >> tx loop last count: 1 > >> rx loop current count: 0 > >> rx loop last count: 1 > >> rx last cbd ctrl/status: 0x800 > >> rx last cbd len: 346 > >> rx last cbd buff addr: 0x40410000 > >> rx last netif_rx status: 1 > >> > >> Strangely, wireshark still shows data being transmitted from the target > >> -> host. I can see ARP requests and I can also see DHCP discovery packets being > >> sent by the target when its DHCP lease expires. This all looks ok, only the > >> reply from host -> target is never processed by the target as the network stack > >> is in a state where it is dropping all incoming data provided to it by the driver. > >> > >> I believe udhcpc utilises the network device directly, ie it does not require an > >> intermediate network protocol being implemented in the kernel (tcpdump is > >> similar). > >> > >> The fec driver still seems to be running ok because I can see the ring buffer > >> address changing when data is received. Everything seems to be ok apart from the > >> network stack. Very strange indeed. > >> > >> Running network throughput tests between host and target with netcat or netperf > >> only run for a few seconds before activity ceases. > >> > >> Has anybody experienced anything similar? Why does the network stack appear to > >> be stuck and constantly dropping packets? > >> > >> Any feedback appreciated. > >> > >> Rgds, > >> -- Matt > >> -- > >> To unsubscribe from this list: send the line "unsubscribe linux-m68k" in > >> the body of a message to majordomo@vger.kernel.org > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > >> > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-m68k" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.6.29 & network stack strangeness 2009-06-05 16:37 ` Finn Thain @ 2009-06-05 16:44 ` Matthew Lear [not found] ` <4A2DC70F.7080401@freescale.com> 0 siblings, 1 reply; 8+ messages in thread From: Matthew Lear @ 2009-06-05 16:44 UTC (permalink / raw) To: Finn Thain; +Cc: linux-m68k Yes. I was suspecting that all may not be well in that area... Current set up is a 10ms tick with CONFIG_HZ set to 100. Further investigation is required I think. -- Matt Finn Thain wrote: > My only guess would be that the network stack delayed work queues depend > upon working timer interrupts... > > But since I have no knowledge of your hardware, I don't think I'll be a > lot of help with that. > > Finn > > > On Fri, 5 Jun 2009, Matthew Lear wrote: > >> Hi - thanks for your reply. >> >> The problem doesn't manifest only when the DHCP lease expires and I can still >> reproduce the problem with a static IP. With or without DHCP makes no difference. >> >> It seems to effect socket comms quite seriously (and quickly). If I run a simple >> server program on the host that listens on a socket and writes a response string >> to the socket when it receives data, and on the target I run a simple client >> program which writes a string to the socket, reads and prints the response sent >> the server, I only have to send data from client to server with a delay of 1ms >> between transmissions for a few seconds and the client program hangs on calling >> read() on the socket fd. >> >> If I run a simple netcat test, eg >> >> on target: nc -l -p 3333 > /dev/null >> on host: dd if=/dev/zero | nc <target-ip> 3333 >> >> ...strangely, once activity on the ethernet link as a result of the netcat test >> ceases, running netstat -a on the target hangs for several seconds, eg: >> >> >> ~ # nc -l -p 3333 > /dev/null & >> ~ # netstat -a >> Active Internet connections (servers and established) >> Proto Recv-Q Send-Q Local Address Foreign Address State >> tcp 0 0 *:login *:* LISTEN >> tcp 0 0 *:shell *:* LISTEN >> tcp 0 0 *:sunrpc *:* LISTEN >> tcp 0 0 *:finger *:* LISTEN >> tcp 0 0 *:auth *:* LISTEN >> tcp 0 0 *:ftp *:* LISTEN >> tcp 0 0 *:telnet *:* LISTEN >> >> <system hangs for several seconds here> >> >> tcp 0 0 192.168.0.11:3333 gateway0:45645 >> ESTABLISHED >> udp 0 0 *:ntalk *:* >> udp 0 0 *:sunrpc *:* >> Active UNIX domain sockets (servers and established) >> Proto RefCnt Flags Type State I-Node Path >> unix 4 [ ] DGRAM 111 /dev/log >> unix 3 [ ] STREAM CONNECTED 123 >> unix 3 [ ] STREAM CONNECTED 122 >> unix 2 [ ] DGRAM 120 >> unix 2 [ ] DGRAM 114 >> ~ # >> >> I thought this was interesting. Also, after this, I have trouble entering >> characters over the serial port / console. It seems like interrupts may having >> trouble getting serviced but this may be a side-effect... >> >> If you run the same netstat command with strace, you can see that the delay is >> caused by polling the socket following calling send: >> >> ... >> ... >> gettimeofday({366, 470000}, NULL) = 0 >> poll([{fd=4, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1 >> send(4, "lJ\1\0\0\1\0\0\0\0\0\0\00211\0010\003168\003192\7in-ad"..., 43, >> 0x4000) = 43 >> poll( >> >> >> <delay is here> >> >> >> [{fd=4, events=POLLIN}], 1, 5000) = 0 >> ... >> ... >> >> -- Matt >> >> >> Finn Thain wrote: >>> Does the problem manifest only when the DHCP lease expires? >>> Can you reproduce the problem with a static IP? >>> >>> Finn >>> >>> >>> On Fri, 5 Jun 2009, Matthew Lear wrote: >>> >>>> Hello all, >>>> >>>> I'm running a 2.6.29 kernel on an MMU enabled m68k coldfire mcf54455 platform >>>> and I'm having some throughput problems when running network tests. >>>> >>>> The kernel boots and mounts its rootfs from flash (jffs2). udhcpc runs, obtains >>>> a lease from the dhcp server and configures eth0. Network connectivity is ok. I >>>> can ping the target from the host and vice versa. >>>> >>>> 1/ >>>> If I run ping -s 1500 -i 0.0001 <target ip address> on the host pc, after >>>> several mins, the kernel reports 'unexpected interrupt from 24' which is the >>>> vector for a spurious interrupt. This message will repeat randomly (from what I >>>> saw it appeared ~ 20 times when running the ping test above for 40 mins). The >>>> mcf54455 reference manual describes a possible cause for spurious interrupts. >>>> However, this test very rarely reports any packet loss, although the max time to >>>> receive a packet can be very large indeed. >>>> >>>> 2/ >>>> If I reboot, start again and run a ping flood test (ping -f) from host pc -> >>>> target, all icmp requests are acknowledged - for a while. Before the target >>>> begins to fail to respond to the icmp requests, running top shows that the >>>> ksoftirq daemon is running at ~ 5% cpu load. This is normal as it is involved in >>>> processing the deferred tasks of processing data fired up to the network stack. >>>> So when the target beings to stop responding to icmp, if I then stop the ping >>>> flood and try to ping the host from the target, there is no reply indicated by >>>> ping. However, if you do this with a packet sniffer running (eg wireshark) you >>>> can see that data is still being transmitted from the target -> host and you can >>>> see the icmp reply, only the reply from the host appears to be received ok by >>>> the fec driver but is processed by the network stack target. >>>> >>>> When in this state, a proc entry that I added to the fec driver shows that the >>>> last return value from netif_rx() (called in the fec rx interrupt handling >>>> routine) is 1, indicating that the last packet was dropped by the network stack, >>>> e.g. >>>> >>>> ~ # cat /proc/driver/fec >>>> total interrupts: 1421619 >>>> last interrupt type: 2 [1=tx, 2=rx, 3=mii] >>>> total tx interrupts: 709148 >>>> total rx interrupts: 712472 >>>> total mii interrupts: 1 >>>> last interrupt event: 0x2000000 >>>> total eberr interrupts: 0 >>>> total hberr interrupts: 0 >>>> tx loop current count: 0 >>>> tx loop last count: 1 >>>> rx loop current count: 0 >>>> rx loop last count: 1 >>>> rx last cbd ctrl/status: 0x800 >>>> rx last cbd len: 346 >>>> rx last cbd buff addr: 0x40410000 >>>> rx last netif_rx status: 1 >>>> >>>> Strangely, wireshark still shows data being transmitted from the target >>>> -> host. I can see ARP requests and I can also see DHCP discovery packets being >>>> sent by the target when its DHCP lease expires. This all looks ok, only the >>>> reply from host -> target is never processed by the target as the network stack >>>> is in a state where it is dropping all incoming data provided to it by the driver. >>>> >>>> I believe udhcpc utilises the network device directly, ie it does not require an >>>> intermediate network protocol being implemented in the kernel (tcpdump is >>>> similar). >>>> >>>> The fec driver still seems to be running ok because I can see the ring buffer >>>> address changing when data is received. Everything seems to be ok apart from the >>>> network stack. Very strange indeed. >>>> >>>> Running network throughput tests between host and target with netcat or netperf >>>> only run for a few seconds before activity ceases. >>>> >>>> Has anybody experienced anything similar? Why does the network stack appear to >>>> be stuck and constantly dropping packets? >>>> >>>> Any feedback appreciated. >>>> >>>> Rgds, >>>> -- Matt >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-m68k" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-m68k" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <4A2DC70F.7080401@freescale.com>]
* Re: 2.6.29 & network stack strangeness [not found] ` <4A2DC70F.7080401@freescale.com> @ 2009-06-09 10:04 ` Matthew Lear [not found] ` <4A2F1C74.1010800@freescale.com> 0 siblings, 1 reply; 8+ messages in thread From: Matthew Lear @ 2009-06-09 10:04 UTC (permalink / raw) To: Lanttor; +Cc: linux-m68k > Could you run "top" command and print out its information? > "cat /proc/stat" && "cat /proc/interrupts" information are also > appreciated. Sure. Info below. Before running netcat test: top - 00:02:37 up 2 min, 1 user, load average: 0.07, 0.07, 0.02 Tasks: 24 total, 1 running, 23 sleeping, 0 stopped, 0 zombie Cpu(s): 0.7%us, 0.0%sy, 0.0%ni, 98.7%id, 0.0%wa, 0.7%hi, 0.0%si, 0.0%st Mem: 254912k total, 6680k used, 248232k free, 0k buffers Swap: 0k total, 0k used, 0k free, 3008k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 359 root 20 0 2856 1160 944 R 1.0 0.5 0:01.49 top 3 root 15 -5 0 0 0 S 0.3 0.0 0:00.11 ksoftirqd/0 1 root 20 0 3280 776 656 S 0.0 0.3 0:01.46 init 2 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kthreadd 4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0 5 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 events/0 6 root 15 -5 0 0 0 S 0.0 0.0 0:00.06 khelper 56 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kblockd/0 62 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kseriod 80 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pdflush 81 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pdflush 82 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kswapd0 83 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 aio/0 84 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 nfsiod 215 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 mtdblockd 227 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 hid_compat 230 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 rpciod/0 ~ # cat /proc/stat cpu 209 0 0 16892 0 757 0 0 0 cpu0 209 0 0 16892 0 757 0 0 0 intr 79221 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 61353 0 0 0 0 0 17858 0 0 5 2 0 0 0 2 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ctxt 72511 btime 0 processes 360 procs_running 1 procs_blocked 0 ~ # ~ # cat /proc/interrupts M5445X 90: 62109 UART M5445X 96: 19793 timer M5445X 99: 6 timer M5445X 100: 2 fec(TXF) M5445X 101: 0 fec(TXB) M5445X 102: 0 fec(TXFIFO) M5445X 103: 0 fec(TXCR) M5445X 104: 2 fec(RXF) M5445X 105: 0 fec(RXB) M5445X 106: 1 fec(MII) M5445X 107: 0 fec(LC) M5445X 108: 0 fec(HBERR) M5445X 109: 0 fec(GRA) M5445X 110: 0 fec(EBERR) M5445X 111: 0 fec(BABT) M5445X 112: 0 fec(BABR) ~ # Run netcat test: target: nc -l -p 3333 > /dev/null & host: dd if=/dev/zero | nc <target-ip> 3333 Netcat test activity on ethernet ceases after only a few seconds. Wireshark concurs. top - 00:05:04 up 5 min, 1 user, load average: 0.00, 0.04, 0.01 Tasks: 25 total, 1 running, 24 sleeping, 0 stopped, 0 zombie Cpu(s): 0.7%us, 0.0%sy, 0.0%ni, 98.7%id, 0.0%wa, 0.7%hi, 0.0%si, 0.0%st Mem: 254912k total, 6904k used, 248008k free, 0k buffers Swap: 0k total, 0k used, 0k free, 3008k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 363 root 20 0 2856 1160 944 R 1.0 0.5 0:00.27 top 362 root 20 0 3280 688 576 S 0.3 0.3 0:00.05 nc 1 root 20 0 3280 776 656 S 0.0 0.3 0:01.46 init 2 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kthreadd 3 root 15 -5 0 0 0 S 0.0 0.0 0:00.12 ksoftirqd/0 4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0 5 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 events/0 6 root 15 -5 0 0 0 S 0.0 0.0 0:00.06 khelper 56 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kblockd/0 62 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kseriod 80 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pdflush 81 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pdflush 82 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kswapd0 83 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 aio/0 84 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 nfsiod 215 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 mtdblockd 227 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 hid_compat ~ # cat /proc/stat cpu 228 0 0 31924 0 778 0 0 0 cpu0 228 0 0 31924 0 778 0 0 0 intr 106918 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 73732 0 0 0 0 0 32930 0 0 10 96 0 0 0 149 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ctxt 110292 btime 0 processes 364 procs_running 1 procs_blocked 0 ~ # ~ # cat /proc/interrupts M5445X 90: 74523 UART M5445X 96: 34522 timer M5445X 99: 10 timer M5445X 100: 100 fec(TXF) M5445X 101: 0 fec(TXB) M5445X 102: 0 fec(TXFIFO) M5445X 103: 0 fec(TXCR) M5445X 104: 155 fec(RXF) M5445X 105: 0 fec(RXB) M5445X 106: 1 fec(MII) M5445X 107: 0 fec(LC) M5445X 108: 0 fec(HBERR) M5445X 109: 0 fec(GRA) M5445X 110: 0 fec(EBERR) M5445X 111: 0 fec(BABT) M5445X 112: 0 fec(BABR) ~ # At this point, my proc entry in the fec driver does not indicate any problem: ~ # cat /proc/driver/fec total interrupts: 284 last interrupt type: 2 [1=tx, 2=rx, 3=mii] total tx interrupts: 111 total rx interrupts: 172 total mii interrupts: 1 last interrupt event: 0x2000000 total eberr interrupts: 0 total hberr interrupts: 0 tx loop current count: 0 tx loop last count: 1 rx loop current count: 0 rx loop last count: 1 rx last cbd ctrl/status: 0x800 rx last cbd len: 64 rx last cbd buff addr: 0x40407800 rx last netif_rx status: 0 ~ # When starting the netcat test, activity on the ethernet ceases after only a few seconds. After stopping the dd/nc on the host, I cannot ping the host from the target or vice versa and the rs-232 console running on the target does not respond to every key press (usually every second key press). I instrumented some of the raw packet handling in the network stack to attempt to glean a little more info about ping. Running ping from target to host shows: ~ # ping 192.168.0.2 PING 192.168.0.2 (192.168.0.2): 56 data bytes [ 490.489407] ping:367 /mnt/nht/linux-2.6.29/net/core/datagram.c:99 wait_for_packet - interrupted [ 490.500812] ping:367 /mnt/nht/linux-2.6.29/net/core/datagram.c:110 wait_for_packet - timeo=2147483647, err=-512 [ 490.513637] ping:367 /mnt/nht/linux-2.6.29/net/ipv4/raw.c:668 raw_recvmsg - skb_recv_datagram [ 490.524790] ping:367 /mnt/nht/linux-2.6.29/net/ipv4/raw.c:701 raw_recvmsg - returning err -512 [ 490.536035] ping:367 /mnt/nht/linux-2.6.29/net/core/sock.c:1869 sock_common_recvmsg - recvmsg returned -512 [ 490.548497] ping:367 /mnt/nht/linux-2.6.29/arch/m68k/coldfire/signal.c:804 ERESTART Other tests which induce similar problems in the network stack include using wget to pull a large file onto the target. I can also force netif_rx() to eventually constantly return 1 (pkt dropped) by using a flood ping at high frequency (see previous post). This is a different scenario that netcat/wget but possible related to same cause. -- Matt ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <4A2F1C74.1010800@freescale.com>]
* Re: 2.6.29 & network stack strangeness [not found] ` <4A2F1C74.1010800@freescale.com> @ 2009-06-10 9:36 ` Matthew Lear [not found] ` <4A2F8318.5080206@freescale.com> 0 siblings, 1 reply; 8+ messages in thread From: Matthew Lear @ 2009-06-10 9:36 UTC (permalink / raw) To: Lanttor; +Cc: linux-m68k Lanttor wrote: > I met the interrupt issue before on mcf5445x platform, following is my fixed patch, but I am not sure it could help resolve your problem, just have a try :-) Thanks but this makes no difference. From what I can tell, the patch to entry.S is essentially benign because ret_from_interrupt() is not invoked anywhere. Also, the patch to ints.c adds invocations of irq_enter() and irq_exit() to process_int(). These macros really just force entry/exit to/from hard interrupt context and increment/decrement a count (used for tracking nested interrupts). Given that process_int() is called by inthandler() in entry.S, I think this is always going to be in interrupt context anyway, so I can't see the advantage of this. Maybe I'm missing something? > > For your mcf54455 platform, if you use one ether net port, try to disable second ether net port (FEC2) on the menuconfig to see whether it would affect the net working. Yes. I've already got the second net port disabled. I have had some level of success by calling flush_cache_all() immediately after calling netif_rx() in the rx part of the fec interrupt handler - fec_enet_rx() in fec.c. This plus a hack to netif_rx() seems to vastly improve things in that I can wget large files, run netcat tests although in the cast of netcat there is still a high number of rx overruns in the fec driver. Hack: --- a/net/core/dev.c 2009-05-12 07:02:24.000000000 +0100 +++ b/net/core/dev.c 2009-06-10 10:26:47.000000000 +0100 @@ -1969,16 +1969,18 @@ __skb_queue_tail(&queue->input_pkt_queue, skb); local_irq_restore(flags); return NET_RX_SUCCESS; } napi_schedule(&queue->backlog); goto enqueue; } + else + napi_schedule(&queue->backlog); __get_cpu_var(netdev_rx_stat).dropped++; local_irq_restore(flags); kfree_skb(skb); return NET_RX_DROP; } This effectively ignores the backlog check and schedules the receive function to run. I think all this points to is that there is a timing problem in the system. That said, I can believe the cache coherency issue as I've read some postings on the net about similar issues with ethernet tx on ppc... -- Matt ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <4A2F8318.5080206@freescale.com>]
* Re: 2.6.29 & network stack strangeness [not found] ` <4A2F8318.5080206@freescale.com> @ 2009-06-10 10:59 ` Matthew Lear 0 siblings, 0 replies; 8+ messages in thread From: Matthew Lear @ 2009-06-10 10:59 UTC (permalink / raw) To: Lanttor; +Cc: linux-m68k Lanttor wrote: >> Thanks but this makes no difference. From what I can tell, the patch to entry.S >> is essentially benign because ret_from_interrupt() is not invoked anywhere. > > ret_from_interrupt() is the part of inthandler(), and it's called by inthandler() at end. > Indeed it is. My apologies. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2009-06-10 10:59 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-06-05 15:15 2.6.29 & network stack strangeness Matthew Lear
2009-06-05 15:49 ` Finn Thain
2009-06-05 16:17 ` Matthew Lear
2009-06-05 16:37 ` Finn Thain
2009-06-05 16:44 ` Matthew Lear
[not found] ` <4A2DC70F.7080401@freescale.com>
2009-06-09 10:04 ` Matthew Lear
[not found] ` <4A2F1C74.1010800@freescale.com>
2009-06-10 9:36 ` Matthew Lear
[not found] ` <4A2F8318.5080206@freescale.com>
2009-06-10 10:59 ` Matthew Lear
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox