From mboxrd@z Thu Jan 1 00:00:00 1970
From: Matthew Lear
Subject: Re: 2.6.29 & network stack strangeness
Date: Fri, 05 Jun 2009 17:44:19 +0100
Message-ID: <4A294B63.7010404@bubblegen.co.uk>
References: <4A2936AF.4080601@bubblegen.co.uk> <4A294532.7030904@bubblegen.co.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Return-path:
Received: from relay.ptn-ipout01.plus.net ([212.159.7.35]:37882 "EHLO relay.ptn-ipout01.plus.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751079AbZFEQoU (ORCPT ); Fri, 5 Jun 2009 12:44:20 -0400
In-Reply-To:
Sender: linux-m68k-owner@vger.kernel.org
List-Id: linux-m68k@vger.kernel.org
To: Finn Thain
Cc: linux-m68k@vger.kernel.org

Yes. I was suspecting that all may not be well in that area...
Current set up is a 10ms tick with CONFIG_HZ set to 100.
Further investigation is required I think.

-- Matt

Finn Thain wrote:
> My only guess would be that the network stack delayed work queues depend
> upon working timer interrupts...
>
> But since I have no knowledge of your hardware, I don't think I'll be a
> lot of help with that.
>
> Finn
>
>
> On Fri, 5 Jun 2009, Matthew Lear wrote:
>
>> Hi - thanks for your reply.
>>
>> The problem doesn't manifest only when the DHCP lease expires and I can still
>> reproduce the problem with a static IP. With or without DHCP makes no difference.
>>
>> It seems to affect socket comms quite seriously (and quickly). If I run a simple
>> server program on the host that listens on a socket and writes a response string
>> to the socket when it receives data, and on the target I run a simple client
>> program which writes a string to the socket, reads and prints the response sent
>> by the server, I only have to send data from client to server with a delay of 1ms
>> between transmissions for a few seconds and the client program hangs on calling
>> read() on the socket fd.
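
For anyone wanting to try the client/server test described above, here is a minimal sketch in Python. It is not the program used in the report: the names `serve_once` and `run_client`, the message strings and the 5 second read timeout are all assumptions made for illustration. The timeout stands in for the indefinite read() hang, so a stalled stack shows up as a short round-trip count instead of a stuck process.

```python
import socket
import threading
import time


def serve_once(listener, reply=b"ack\n"):
    """Accept one connection and answer every received chunk with `reply`."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:          # client closed the connection
                break
            conn.sendall(reply)


def run_client(host, port, count=1000, delay=0.001, timeout=5.0):
    """Send `count` strings `delay` seconds apart; return completed round trips.

    The read timeout (an assumption, not part of the original test) turns a
    hang in the blocking read into an early return with a short count.
    """
    completed = 0
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for i in range(count):
            sock.sendall(b"hello %d\n" % i)
            try:
                if not sock.recv(1024):
                    break         # server went away
            except socket.timeout:
                break             # no response within `timeout` seconds
            completed += 1
            time.sleep(delay)     # 1 ms between transmissions by default
    return completed
```

To mirror the setup in the report, `serve_once` would run on the host with a listening socket (for example one returned by `socket.create_server`) and `run_client` on the target. A return value lower than `count` marks the point where responses stopped arriving.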
>>
>> If I run a simple netcat test, eg
>>
>> on target: nc -l -p 3333 > /dev/null
>> on host: dd if=/dev/zero | nc 3333
>>
>> ...strangely, once activity on the ethernet link as a result of the netcat test
>> ceases, running netstat -a on the target hangs for several seconds, eg:
>>
>>
>> ~ # nc -l -p 3333 > /dev/null &
>> ~ # netstat -a
>> Active Internet connections (servers and established)
>> Proto Recv-Q Send-Q Local Address           Foreign Address         State
>> tcp        0      0 *:login                 *:*                     LISTEN
>> tcp        0      0 *:shell                 *:*                     LISTEN
>> tcp        0      0 *:sunrpc                *:*                     LISTEN
>> tcp        0      0 *:finger                *:*                     LISTEN
>> tcp        0      0 *:auth                  *:*                     LISTEN
>> tcp        0      0 *:ftp                   *:*                     LISTEN
>> tcp        0      0 *:telnet                *:*                     LISTEN
>>
>>
>>
>> tcp        0      0 192.168.0.11:3333       gateway0:45645          ESTABLISHED
>> udp        0      0 *:ntalk                 *:*
>> udp        0      0 *:sunrpc                *:*
>> Active UNIX domain sockets (servers and established)
>> Proto RefCnt Flags       Type       State         I-Node Path
>> unix  4      [ ]         DGRAM                    111    /dev/log
>> unix  3      [ ]         STREAM     CONNECTED     123
>> unix  3      [ ]         STREAM     CONNECTED     122
>> unix  2      [ ]         DGRAM                    120
>> unix  2      [ ]         DGRAM                    114
>> ~ #
>>
>> I thought this was interesting. Also, after this, I have trouble entering
>> characters over the serial port / console. It seems like interrupts may be having
>> trouble getting serviced but this may be a side-effect...
>>
>> If you run the same netstat command with strace, you can see that the delay is
>> caused by polling the socket following calling send:
>>
>> ...
>> ...
>> gettimeofday({366, 470000}, NULL) = 0
>> poll([{fd=4, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
>> send(4, "lJ\1\0\0\1\0\0\0\0\0\0\00211\0010\003168\003192\7in-ad"..., 43, 0x4000) = 43
>> poll(
>>
>>
>>
>>
>>
>> [{fd=4, events=POLLIN}], 1, 5000) = 0
>> ...
>> ...
>>
>> -- Matt
>>
>>
>> Finn Thain wrote:
>>> Does the problem manifest only when the DHCP lease expires?
>>> Can you reproduce the problem with a static IP?
>>>
>>> Finn
>>>
>>>
>>> On Fri, 5 Jun 2009, Matthew Lear wrote:
>>>
>>>> Hello all,
>>>>
>>>> I'm running a 2.6.29 kernel on an MMU enabled m68k coldfire mcf54455 platform
>>>> and I'm having some throughput problems when running network tests.
>>>>
>>>> The kernel boots and mounts its rootfs from flash (jffs2). udhcpc runs, obtains
>>>> a lease from the dhcp server and configures eth0. Network connectivity is ok. I
>>>> can ping the target from the host and vice versa.
>>>>
>>>> 1/
>>>> If I run ping -s 1500 -i 0.0001 on the host pc, after
>>>> several mins, the kernel reports 'unexpected interrupt from 24' which is the
>>>> vector for a spurious interrupt. This message will repeat randomly (from what I
>>>> saw it appeared ~ 20 times when running the ping test above for 40 mins). The
>>>> mcf54455 reference manual describes a possible cause for spurious interrupts.
>>>> However, this test very rarely reports any packet loss, although the max time to
>>>> receive a packet can be very large indeed.
>>>>
>>>> 2/
>>>> If I reboot, start again and run a ping flood test (ping -f) from host pc ->
>>>> target, all icmp requests are acknowledged - for a while. Before the target
>>>> begins to fail to respond to the icmp requests, running top shows that the
>>>> ksoftirq daemon is running at ~ 5% cpu load. This is normal as it is involved in
>>>> processing the deferred tasks of processing data fired up to the network stack.
>>>> So when the target begins to stop responding to icmp, if I then stop the ping
>>>> flood and try to ping the host from the target, there is no reply indicated by
>>>> ping. However, if you do this with a packet sniffer running (eg wireshark) you
>>>> can see that data is still being transmitted from the target -> host and you can
>>>> see the icmp reply, only the reply from the host appears to be received ok by
>>>> the fec driver but is not processed by the network stack on the target.
>>>>
>>>> When in this state, a proc entry that I added to the fec driver shows that the
>>>> last return value from netif_rx() (called in the fec rx interrupt handling
>>>> routine) is 1, indicating that the last packet was dropped by the network stack,
>>>> e.g.
>>>>
>>>> ~ # cat /proc/driver/fec
>>>> total interrupts: 1421619
>>>> last interrupt type: 2 [1=tx, 2=rx, 3=mii]
>>>> total tx interrupts: 709148
>>>> total rx interrupts: 712472
>>>> total mii interrupts: 1
>>>> last interrupt event: 0x2000000
>>>> total eberr interrupts: 0
>>>> total hberr interrupts: 0
>>>> tx loop current count: 0
>>>> tx loop last count: 1
>>>> rx loop current count: 0
>>>> rx loop last count: 1
>>>> rx last cbd ctrl/status: 0x800
>>>> rx last cbd len: 346
>>>> rx last cbd buff addr: 0x40410000
>>>> rx last netif_rx status: 1
>>>>
>>>> Strangely, wireshark still shows data being transmitted from the target
>>>> -> host. I can see ARP requests and I can also see DHCP discovery packets being
>>>> sent by the target when its DHCP lease expires. This all looks ok, only the
>>>> reply from host -> target is never processed by the target as the network stack
>>>> is in a state where it is dropping all incoming data provided to it by the driver.
>>>>
>>>> I believe udhcpc utilises the network device directly, ie it does not require an
>>>> intermediate network protocol being implemented in the kernel (tcpdump is
>>>> similar).
>>>>
>>>> The fec driver still seems to be running ok because I can see the ring buffer
>>>> address changing when data is received. Everything seems to be ok apart from the
>>>> network stack. Very strange indeed.
>>>>
>>>> Running network throughput tests between host and target with netcat or netperf
>>>> only run for a few seconds before activity ceases.
>>>>
>>>> Has anybody experienced anything similar? Why does the network stack appear to
>>>> be stuck and constantly dropping packets?
>>>>
>>>> Any feedback appreciated.
>>>>
>>>> Rgds,
>>>> -- Matt
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-m68k" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
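
The /proc/driver/fec dump quoted in the original report can also be checked mechanically while a test is running. The sketch below is only an illustration. That proc entry is a custom addition to the poster's fec driver, so the field names are copied from the quoted output and are not a standard interface. The helper names `parse_fec_stats` and `last_packet_dropped` are hypothetical. The value 1 for a dropped packet matches both the report and the kernel's NET_RX_DROP constant.

```python
NET_RX_DROP = 1  # netif_rx() return value for a dropped packet


def parse_fec_stats(text):
    """Parse 'name: value' lines into a dict; values are kept as strings."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue              # skip blank lines or anything without a field
        name, _, value = line.partition(":")
        stats[name.strip()] = value.strip()
    return stats


def last_packet_dropped(stats):
    """True if the last recorded netif_rx() status equals NET_RX_DROP."""
    # Base 0 accepts both decimal ("1") and hex ("0x1") strings.
    return int(stats["rx last netif_rx status"], 0) == NET_RX_DROP
```

Polling the file at intervals during a ping flood and comparing `total rx interrupts` between samples would show whether the driver is still receiving frames while the stack keeps reporting drops.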