From: Matthew Lear <matt@bubblegen.co.uk>
To: Finn Thain <fthain@telegraphics.com.au>
Cc: linux-m68k@vger.kernel.org
Subject: Re: 2.6.29 & network stack strangeness
Date: Fri, 05 Jun 2009 17:17:54 +0100
Message-ID: <4A294532.7030904@bubblegen.co.uk>
In-Reply-To: <Pine.LNX.4.64.0906060149130.16687@loopy.telegraphics.com.au>
Hi - thanks for your reply.
No, the problem doesn't manifest only when the DHCP lease expires; I can still
reproduce it with a static IP. With or without DHCP makes no difference.
It seems to affect socket comms quite seriously (and quickly). If I run a simple
server program on the host that listens on a socket and writes a response string
to the socket when it receives data, and on the target I run a simple client
program which writes a string to the socket, then reads and prints the response
sent by the server, I only have to send data from client to server with a delay
of 1ms between transmissions for a few seconds and the client program hangs in
the read() call on the socket fd.
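For anyone wanting to try something similar, here is a minimal sketch of that kind of client/server pair. It is not the actual test program: the function names, the "ping"/"pong" strings and the 5 second read timeout are assumptions made for illustration. The timeout is there so that a stalled read() shows up as a short reply count rather than an indefinite hang.

```python
import socket
import threading
import time

RESPONSE = b"pong\n"  # hypothetical reply string, not from the original test


def serve(listener):
    """Accept one client and answer every chunk of received data."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:              # client closed the connection
                break
            conn.sendall(RESPONSE)


def run_client(address, count=1000, delay=0.001, timeout=5.0):
    """Send `count` requests `delay` seconds apart; return replies received."""
    replies = 0
    with socket.create_connection(address, timeout=timeout) as sock:
        for _ in range(count):
            sock.sendall(b"ping\n")
            try:
                if not sock.recv(1024):
                    break             # server closed the connection
            except socket.timeout:
                break                 # no reply in time: this is the stall
            replies += 1
            time.sleep(delay)         # 1 ms gap between transmissions
    return replies


def loopback_demo(count=100):
    """Smoke test: run server and client in one process over 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
    listener.listen(1)
    thread = threading.Thread(target=serve, args=(listener,), daemon=True)
    thread.start()
    try:
        return run_client(listener.getsockname(), count=count)
    finally:
        thread.join(timeout=5.0)
        listener.close()
```

On real hardware, serve() would run on the host and run_client() on the target with the host's address; a return value lower than `count` marks the point where replies stopped arriving.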
If I run a simple netcat test, eg
on target: nc -l -p 3333 > /dev/null
on host: dd if=/dev/zero | nc <target-ip> 3333
...strangely, once activity on the ethernet link as a result of the netcat test
ceases, running netstat -a on the target hangs for several seconds, eg:
~ # nc -l -p 3333 > /dev/null &
~ # netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 *:login *:* LISTEN
tcp 0 0 *:shell *:* LISTEN
tcp 0 0 *:sunrpc *:* LISTEN
tcp 0 0 *:finger *:* LISTEN
tcp 0 0 *:auth *:* LISTEN
tcp 0 0 *:ftp *:* LISTEN
tcp 0 0 *:telnet *:* LISTEN
<system hangs for several seconds here>
tcp 0 0 192.168.0.11:3333 gateway0:45645 ESTABLISHED
udp 0 0 *:ntalk *:*
udp 0 0 *:sunrpc *:*
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node Path
unix 4 [ ] DGRAM 111 /dev/log
unix 3 [ ] STREAM CONNECTED 123
unix 3 [ ] STREAM CONNECTED 122
unix 2 [ ] DGRAM 120
unix 2 [ ] DGRAM 114
~ #
I thought this was interesting. Also, after this, I have trouble entering
characters over the serial port / console. It seems like interrupts may be having
trouble getting serviced but this may be a side-effect...
If you run the same netstat command under strace, you can see that the delay
occurs in the poll() on the socket that follows the send() call:
...
...
gettimeofday({366, 470000}, NULL) = 0
poll([{fd=4, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
send(4, "lJ\1\0\0\1\0\0\0\0\0\0\00211\0010\003168\003192\7in-ad"..., 43, 0x4000) = 43
poll(
<delay is here>
[{fd=4, events=POLLIN}], 1, 5000) = 0
...
...
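The send() payload in the trace above has the layout of a DNS query: a 12-byte header followed by length-prefixed name labels. As a minimal sketch, assuming that reading is correct, the visible bytes can be decoded as below. strace truncates the string, so the last label is incomplete, and `decode_qname_labels` is a hypothetical helper written for this illustration only.

```python
DNS_HEADER_LEN = 12  # fixed-size DNS header that precedes the question name


def decode_qname_labels(payload):
    """Return the name labels that follow the DNS header in `payload`.

    A label cut short by truncation is returned with a trailing "...".
    """
    labels = []
    pos = DNS_HEADER_LEN
    while pos < len(payload):
        length = payload[pos]
        if length == 0:               # zero-length label ends the name
            break
        label = payload[pos + 1:pos + 1 + length]
        text = label.decode("ascii")
        if len(label) < length:       # ran off the end of the captured bytes
            labels.append(text + "...")
            break
        labels.append(text)
        pos += 1 + length
    return labels


# Bytes exactly as printed by strace (octal escapes, truncated by strace).
captured = b"lJ\1\0\0\1\0\0\0\0\0\0\00211\0010\003168\003192\7in-ad"

labels = decode_qname_labels(captured)
print(labels)
print(".".join(reversed(labels[:4])))
```

The first four labels, reversed, spell out 192.168.0.11, the target's own address in the netstat output. That would be consistent with netstat waiting on a reverse name lookup until the 5000 ms poll() timeout expires. Running netstat -an, which skips name resolution, would be a quick way to check.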
-- Matt
Finn Thain wrote:
> Does the problem manifest only when the DHCP lease expires?
> Can you reproduce the problem with a static IP?
>
> Finn
>
>
> On Fri, 5 Jun 2009, Matthew Lear wrote:
>
>> Hello all,
>>
>> I'm running a 2.6.29 kernel on an MMU enabled m68k coldfire mcf54455 platform
>> and I'm having some throughput problems when running network tests.
>>
>> The kernel boots and mounts its rootfs from flash (jffs2). udhcpc runs, obtains
>> a lease from the dhcp server and configures eth0. Network connectivity is ok. I
>> can ping the target from the host and vice versa.
>>
>> 1/
>> If I run ping -s 1500 -i 0.0001 <target ip address> on the host pc, after
>> several mins, the kernel reports 'unexpected interrupt from 24' which is the
>> vector for a spurious interrupt. This message will repeat randomly (from what I
>> saw it appeared ~ 20 times when running the ping test above for 40 mins). The
>> mcf54455 reference manual describes a possible cause for spurious interrupts.
>> However, this test very rarely reports any packet loss, although the max time to
>> receive a packet can be very large indeed.
>>
>> 2/
>> If I reboot, start again and run a ping flood test (ping -f) from host pc ->
>> target, all icmp requests are acknowledged - for a while. Before the target
>> begins to fail to respond to the icmp requests, running top shows that the
>> ksoftirqd daemon is running at ~ 5% cpu load. This is normal as it is involved in
>> processing the deferred tasks of processing data fired up to the network stack.
>> So when the target begins to stop responding to icmp, if I then stop the ping
>> flood and try to ping the host from the target, there is no reply indicated by
>> ping. However, if you do this with a packet sniffer running (eg wireshark) you
>> can see that data is still being transmitted from the target -> host and you can
>> see the icmp reply, only the reply from the host appears to be received ok by
>> the fec driver but is not processed by the network stack on the target.
>>
>> When in this state, a proc entry that I added to the fec driver shows that the
>> last return value from netif_rx() (called in the fec rx interrupt handling
>> routine) is 1, indicating that the last packet was dropped by the network stack,
>> e.g.
>>
>> ~ # cat /proc/driver/fec
>> total interrupts: 1421619
>> last interrupt type: 2 [1=tx, 2=rx, 3=mii]
>> total tx interrupts: 709148
>> total rx interrupts: 712472
>> total mii interrupts: 1
>> last interrupt event: 0x2000000
>> total eberr interrupts: 0
>> total hberr interrupts: 0
>> tx loop current count: 0
>> tx loop last count: 1
>> rx loop current count: 0
>> rx loop last count: 1
>> rx last cbd ctrl/status: 0x800
>> rx last cbd len: 346
>> rx last cbd buff addr: 0x40410000
>> rx last netif_rx status: 1
>>
>> Strangely, wireshark still shows data being transmitted from the target
>> -> host. I can see ARP requests and I can also see DHCP discovery packets being
>> sent by the target when its DHCP lease expires. This all looks ok, only the
>> reply from host -> target is never processed by the target as the network stack
>> is in a state where it is dropping all incoming data provided to it by the driver.
>>
>> I believe udhcpc utilises the network device directly, ie it does not require an
>> intermediate network protocol being implemented in the kernel (tcpdump is
>> similar).
>>
>> The fec driver still seems to be running ok because I can see the ring buffer
>> address changing when data is received. Everything seems to be ok apart from the
>> network stack. Very strange indeed.
>>
>> Running network throughput tests between host and target with netcat or netperf
>> only run for a few seconds before activity ceases.
>>
>> Has anybody experienced anything similar? Why does the network stack appear to
>> be stuck and constantly dropping packets?
>>
>> Any feedback appreciated.
>>
>> Rgds,
>> -- Matt
>