Re: 2.6.29 & network stack strangeness

From: Matthew Lear <matt@bubblegen.co.uk>
To: Finn Thain <fthain@telegraphics.com.au>
Cc: linux-m68k@vger.kernel.org
Subject: Re: 2.6.29 & network stack strangeness
Date: Fri, 05 Jun 2009 17:44:19 +0100
Message-ID: <4A294B63.7010404@bubblegen.co.uk>
In-Reply-To: <Pine.LNX.4.64.0906060233071.16687@loopy.telegraphics.com.au>

Yes. I was suspecting that all may not be well in that area... The current setup is
a 10ms tick with CONFIG_HZ set to 100. Further investigation is required, I think.
--  Matt

Finn Thain wrote:
> My only guess would be that the network stack delayed work queues depend 
> upon working timer interrupts...
> 
> But since I have no knowledge of your hardware, I don't think I'll be a 
> lot of help with that.
> 
> Finn
> 
> 
> On Fri, 5 Jun 2009, Matthew Lear wrote:
> 
>> Hi - thanks for your reply.
>>
>> The problem doesn't manifest only when the DHCP lease expires and I can still
>> reproduce the problem with a static IP. With or without DHCP makes no difference.
>>
>> It seems to affect socket comms quite seriously (and quickly). If I run a simple
>> server program on the host that listens on a socket and writes a response string
>> to the socket when it receives data, and on the target I run a simple client
>> program which writes a string to the socket, then reads and prints the response
>> sent by the server, I only have to send data from client to server with a delay
>> of 1ms between transmissions for a few seconds before the client program hangs
>> in read() on the socket fd.
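
For anyone wanting to reproduce the client/server test described above, here is a
minimal Python sketch. It is not the original test program: the message strings,
the function names and the 5 second receive timeout are all assumptions made for
illustration. The timeout is there so that a stalled read() shows up as a short
reply count rather than a hung process.

```python
import socket
import threading
import time

REQUEST = b"hello\n"   # hypothetical request string
RESPONSE = b"ack\n"    # hypothetical response string


def serve_one_client(listener):
    """Accept one connection and answer every chunk received with RESPONSE."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:          # peer closed the connection
                break
            conn.sendall(RESPONSE)


def run_client(host, port, count=1000, delay=0.001, timeout=5.0):
    """Send `count` requests `delay` seconds apart; return replies received."""
    replies = 0
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for _ in range(count):
            sock.sendall(REQUEST)
            try:
                reply = sock.recv(1024)
            except socket.timeout:
                break             # this is where the reported hang would show up
            if not reply:
                break
            replies += 1
            time.sleep(delay)
    return replies


def demo(count=100):
    """Run server and client on loopback; on real hardware use two machines."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    threading.Thread(target=serve_one_client, args=(listener,), daemon=True).start()
    try:
        return run_client("127.0.0.1", port, count=count)
    finally:
        listener.close()


if __name__ == "__main__":
    print("replies received:", demo())
```

On a healthy link the reply count should equal the request count. To mirror the
setup in the report, the server half would run on the host and run_client() on
the target.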
>>
>> If I run a simple netcat test, eg
>>
>> on target: nc -l -p 3333 > /dev/null
>> on host: dd if=/dev/zero | nc <target-ip> 3333
>>
>> ...strangely, once activity on the ethernet link as a result of the netcat test
>> ceases, running netstat -a on the target hangs for several seconds, eg:
>>
>>
>> ~ # nc -l -p 3333 > /dev/null &
>> ~ # netstat -a
>> Active Internet connections (servers and established)
>> Proto Recv-Q Send-Q Local Address           Foreign Address         State
>> tcp        0      0 *:login                 *:*                     LISTEN
>> tcp        0      0 *:shell                 *:*                     LISTEN
>> tcp        0      0 *:sunrpc                *:*                     LISTEN
>> tcp        0      0 *:finger                *:*                     LISTEN
>> tcp        0      0 *:auth                  *:*                     LISTEN
>> tcp        0      0 *:ftp                   *:*                     LISTEN
>> tcp        0      0 *:telnet                *:*                     LISTEN
>>
>> <system hangs for several seconds here>
>>
>> tcp        0      0 192.168.0.11:3333       gateway0:45645          ESTABLISHED
>> udp        0      0 *:ntalk                 *:*
>> udp        0      0 *:sunrpc                *:*
>> Active UNIX domain sockets (servers and established)
>> Proto RefCnt Flags       Type       State         I-Node Path
>> unix  4      [ ]         DGRAM                    111    /dev/log
>> unix  3      [ ]         STREAM     CONNECTED     123
>> unix  3      [ ]         STREAM     CONNECTED     122
>> unix  2      [ ]         DGRAM                    120
>> unix  2      [ ]         DGRAM                    114
>> ~ #
>>
>> I thought this was interesting. Also, after this, I have trouble entering
>> characters over the serial port / console. It seems like interrupts may be having
>> trouble getting serviced but this may be a side-effect...
>>
>> If you run the same netstat command with strace, you can see that the delay is
>> caused by polling the socket following calling send:
>>
>> ...
>> ...
>> gettimeofday({366, 470000}, NULL)       = 0
>> poll([{fd=4, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
>> send(4, "lJ\1\0\0\1\0\0\0\0\0\0\00211\0010\003168\003192\7in-ad"..., 43, 0x4000) = 43
>> poll(
>>
>>
>> <delay is here>
>>
>>
>> [{fd=4, events=POLLIN}], 1, 5000)  = 0
>> ...
>> ...
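
A note on reading the strace excerpt above: the second poll() returns 0, i.e. it
ran to its 5000 ms timeout with nothing to read on fd 4, which would account for
a pause of several seconds. The 43-byte buffer passed to send() looks like a DNS
query. That is an assumption based on the byte layout, not something stated in
the thread. A minimal Python sketch that decodes only the bytes strace actually
printed (the remainder is elided by strace and is left that way); the function
name is made up for illustration:

```python
import struct

# Visible prefix of the 43-byte send() buffer, copied from the strace output.
# strace truncated the rest ("..."), so the final label is incomplete.
PAYLOAD = b"lJ\1\0\0\1\0\0\0\0\0\0\00211\0010\003168\003192\7in-ad"


def parse_dns_prefix(buf):
    """Decode a DNS header plus as many complete QNAME labels as are present.

    Assumes `buf` starts with a standard 12-byte DNS header.
    """
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", buf[:12])
    labels = []
    partial = None
    pos = 12
    while pos < len(buf):
        length = buf[pos]
        if length == 0:                      # root label: end of the name
            break
        label = buf[pos + 1:pos + 1 + length]
        if len(label) < length:              # label cut off by strace
            partial = label.decode("ascii")
            break
        labels.append(label.decode("ascii"))
        pos += 1 + length
    return {
        "id": ident,
        "flags": flags,
        "qdcount": qdcount,
        "labels": labels,
        "partial_label": partial,
    }


if __name__ == "__main__":
    info = parse_dns_prefix(PAYLOAD)
    print(info)
    # The complete labels, reversed, read as a dotted-quad address.
    print(".".join(reversed(info["labels"])))
```

If the DNS assumption holds, the complete labels spell out the target's own
address in reverse, which would suggest netstat is waiting on a name lookup
whose answer never reaches the socket. Running netstat -an (numeric output)
would be one way to check whether the pause goes away without the lookup.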
>>
>> --  Matt
>>
>>
>> Finn Thain wrote:
>>> Does the problem manifest only when the DHCP lease expires?
>>> Can you reproduce the problem with a static IP?
>>>
>>> Finn
>>>
>>>
>>> On Fri, 5 Jun 2009, Matthew Lear wrote:
>>>
>>>> Hello all,
>>>>
>>>> I'm running a 2.6.29 kernel on an MMU enabled m68k coldfire mcf54455 platform
>>>> and I'm having some throughput problems when running network tests.
>>>>
>>>> The kernel boots and mounts its rootfs from flash (jffs2). udhcpc runs, obtains
>>>> a lease from the dhcp server and configures eth0. Network connectivity is ok. I
>>>> can ping the target from the host and vice versa.
>>>>
>>>> 1/
>>>> If I run ping -s 1500 -i 0.0001 <target ip address> on the host pc, after
>>>> several mins, the kernel reports 'unexpected interrupt from 24' which is the
>>>> vector for a spurious interrupt. This message will repeat randomly (from what I
>>>> saw it appeared ~ 20 times when running the ping test above for 40 mins). The
>>>> mcf54455 reference manual describes a possible cause for spurious interrupts.
>>>> However, this test very rarely reports any packet loss, although the max time to
>>>> receive a packet can be very large indeed.
>>>>
>>>> 2/
>>>> If I reboot, start again and run a ping flood test (ping -f) from host pc ->
>>>> target, all icmp requests are acknowledged - for a while. Before the target
>>>> begins to fail to respond to the icmp requests, running top shows that the
>>>> ksoftirqd daemon is running at ~ 5% cpu load. This is normal, as it handles the
>>>> deferred processing of data passed up to the network stack.
>>>> When the target begins to stop responding to icmp, if I then stop the ping
>>>> flood and try to ping the host from the target, ping reports no reply.
>>>> However, if you do this with a packet sniffer running (eg wireshark) you
>>>> can see that data is still being transmitted from the target -> host and you can
>>>> see the icmp reply. The reply from the host appears to be received ok by
>>>> the fec driver but is not processed by the network stack on the target.
>>>>
>>>> When in this state, a proc entry that I added to the fec driver shows that the
>>>> last return value from netif_rx() (called in the fec rx interrupt handling
>>>> routine) is 1, indicating that the last packet was dropped by the network stack,
>>>> e.g.
>>>>
>>>> ~ # cat /proc/driver/fec
>>>> total interrupts: 1421619
>>>> last interrupt type: 2 [1=tx, 2=rx, 3=mii]
>>>> total tx interrupts: 709148
>>>> total rx interrupts: 712472
>>>> total mii interrupts: 1
>>>> last interrupt event: 0x2000000
>>>> total eberr interrupts: 0
>>>> total hberr interrupts: 0
>>>> tx loop current count: 0
>>>> tx loop last count: 1
>>>> rx loop current count: 0
>>>> rx loop last count: 1
>>>> rx last cbd ctrl/status: 0x800
>>>> rx last cbd len: 346
>>>> rx last cbd buff addr: 0x40410000
>>>> rx last netif_rx status: 1
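
One possible cross-check for the netif_rx() status of 1 shown above is the
kernel's own per-CPU counters in /proc/net/softnet_stat, where the first three
hexadecimal columns are packets processed, packets dropped and time squeezes.
Below is a minimal Python sketch of a parser for that file. The sample line is
hypothetical and not taken from the system in this report, and the function name
is made up for illustration.

```python
def parse_softnet_stat(text):
    """Return one dict per line (one line per CPU) of softnet_stat-style text."""
    stats = []
    for cpu, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) < 3:
            continue                      # skip blank or malformed lines
        processed, dropped, time_squeeze = (int(f, 16) for f in fields[:3])
        stats.append({
            "cpu": cpu,
            "processed": processed,       # packets handed to the stack
            "dropped": dropped,           # packets dropped on a full backlog queue
            "time_squeeze": time_squeeze, # softirq ran out of budget or time
        })
    return stats


# Hypothetical sample, for illustration only.
SAMPLE = "0015b0d3 00000a2c 00000007 00000000 00000000 00000000 00000000 00000000 00000000\n"

if __name__ == "__main__":
    for entry in parse_softnet_stat(SAMPLE):
        print(entry)
```

On the target, the text would come from open("/proc/net/softnet_stat").read().
A dropped count that keeps climbing while the rx interrupt count also climbs
would be consistent with the backlog queue staying full, i.e. the receive
softirq not draining it.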
>>>>
>>>> Strangely, wireshark still shows data being transmitted from the target
>>>> -> host. I can see ARP requests and I can also see DHCP discovery packets being
>>>> sent by the target when its DHCP lease expires. This all looks ok, only the
>>>> reply from host -> target is never processed by the target as the network stack
>>>> is in a state where it is dropping all incoming data provided to it by the driver.
>>>>
>>>> I believe udhcpc uses the network device directly, ie it does not rely on an
>>>> intermediate network protocol implemented in the kernel (tcpdump is
>>>> similar).
>>>>
>>>> The fec driver still seems to be running ok because I can see the ring buffer
>>>> address changing when data is received. Everything seems to be ok apart from the
>>>> network stack. Very strange indeed.
>>>>
>>>> Network throughput tests between host and target with netcat or netperf
>>>> only run for a few seconds before activity ceases.
>>>>
>>>> Has anybody experienced anything similar? Why does the network stack appear to
>>>> be stuck and constantly dropping packets?
>>>>
>>>> Any feedback appreciated.
>>>>
>>>> Rgds,
>>>> --  Matt
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-m68k" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>
>
