2.6.29 & network stack strangeness

public inbox for linux-m68k@lists.linux-m68k.org

* 2.6.29 & network stack strangeness
@ 2009-06-05 15:15 Matthew Lear
  2009-06-05 15:49 ` Finn Thain
  0 siblings, 1 reply; 8+ messages in thread
From: Matthew Lear @ 2009-06-05 15:15 UTC (permalink / raw)
  To: linux-m68k

Hello all,

I'm running a 2.6.29 kernel on an MMU-enabled m68k ColdFire MCF54455 platform
and I'm having some throughput problems when running network tests.

The kernel boots and mounts its rootfs from flash (jffs2). udhcpc runs, obtains
a lease from the dhcp server and configures eth0. Network connectivity is ok. I
can ping the target from the host and vice versa.

1/
If I run ping -s 1500 -i 0.0001 <target ip address> on the host pc, after
several mins, the kernel reports 'unexpected interrupt from 24' which is the
vector for a spurious interrupt. This message will repeat randomly (from what I
saw it appeared ~ 20 times when running the ping test above for 40 mins). The
mcf54455 reference manual describes a possible cause for spurious interrupts.
However, this test very rarely reports any packet loss, although the max time to
receive a packet can be very large indeed.

2/
If I reboot, start again and run a ping flood test (ping -f) from host pc ->
target, all icmp requests are acknowledged - for a while. Before the target
begins to fail to respond to the icmp requests, running top shows that
ksoftirqd is running at ~ 5% cpu load. This is normal, as it handles the
deferred processing of received data passed up to the network stack.
When the target begins to stop responding to icmp, if I then stop the ping
flood and try to ping the host from the target, ping reports no reply.
However, if you do this with a packet sniffer running (eg wireshark) you
can see that data is still being transmitted from the target -> host and you can
see the icmp reply. The reply from the host appears to be received ok by
the fec driver but is not processed by the network stack on the target.

When in this state, a proc entry that I added to the fec driver shows that the
last return value from netif_rx() (called in the fec rx interrupt handling
routine) is 1 (NET_RX_DROP), indicating that the last packet was dropped by the
network stack, e.g.

~ # cat /proc/driver/fec
total interrupts: 1421619
last interrupt type: 2 [1=tx, 2=rx, 3=mii]
total tx interrupts: 709148
total rx interrupts: 712472
total mii interrupts: 1
last interrupt event: 0x2000000
total eberr interrupts: 0
total hberr interrupts: 0
tx loop current count: 0
tx loop last count: 1
rx loop current count: 0
rx loop last count: 1
rx last cbd ctrl/status: 0x800
rx last cbd len: 346
rx last cbd buff addr: 0x40410000
rx last netif_rx status: 1

Strangely, wireshark still shows data being transmitted from the target
-> host. I can see ARP requests and I can also see DHCP discovery packets being
sent by the target when its DHCP lease expires. This all looks ok, only the
reply from host -> target is never processed by the target as the network stack
is in a state where it is dropping all incoming data provided to it by the driver.
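
One way to check whether frames really are being dropped at the netif_rx()
backlog would be to look at /proc/net/softnet_stat on the target. Below is a
minimal Python sketch for reading a captured copy of that file. It assumes the
usual layout, where each line is one CPU and the first three hexadecimal
columns are packets processed, packets dropped and time_squeeze. The function
name and the sample line are made up for illustration and are not output from
this board.

```python
def parse_softnet_stat(text):
    """Return one dict per CPU line of a /proc/net/softnet_stat dump."""
    stats = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or unexpected lines
        stats.append({
            "cpu": len(stats),                    # lines appear in CPU order
            "processed": int(fields[0], 16),      # frames handled by the stack
            "dropped": int(fields[1], 16),        # frames refused at the backlog
            "time_squeeze": int(fields[2], 16),   # softirq ran out of budget
        })
    return stats


# Hypothetical sample line, NOT taken from the mcf54455 target.
sample = "000adf58 00000123 00000004 00000000 00000000 00000000 00000000 00000000 00000000\n"
for entry in parse_softnet_stat(sample):
    print(entry)
```

If the dropped column keeps climbing while the fec rx interrupt count rises,
that would be consistent with the input backlog queue being full (its limit is
net.core.netdev_max_backlog) and not being drained by the NET_RX softirq.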

I believe udhcpc utilises the network device directly, ie it does not require an
intermediate network protocol being implemented in the kernel (tcpdump is
similar).

The fec driver still seems to be running ok because I can see the ring buffer
address changing when data is received. Everything seems to be ok apart from the
network stack. Very strange indeed.

Network throughput tests between host and target with netcat or netperf
only run for a few seconds before activity ceases.

Has anybody experienced anything similar? Why does the network stack appear to
be stuck and constantly dropping packets?

Any feedback appreciated.

Rgds,
--  Matt


* Re: 2.6.29 & network stack strangeness
  2009-06-05 15:15 2.6.29 & network stack strangeness Matthew Lear
@ 2009-06-05 15:49 ` Finn Thain
  2009-06-05 16:17   ` Matthew Lear
  0 siblings, 1 reply; 8+ messages in thread
From: Finn Thain @ 2009-06-05 15:49 UTC (permalink / raw)
  To: Matthew Lear; +Cc: linux-m68k


Does the problem manifest only when the DHCP lease expires?
Can you reproduce the problem with a static IP?

Finn



* Re: 2.6.29 & network stack strangeness
  2009-06-05 15:49 ` Finn Thain
@ 2009-06-05 16:17   ` Matthew Lear
  2009-06-05 16:37     ` Finn Thain
  0 siblings, 1 reply; 8+ messages in thread
From: Matthew Lear @ 2009-06-05 16:17 UTC (permalink / raw)
  To: Finn Thain; +Cc: linux-m68k

Hi - thanks for your reply.

The problem doesn't manifest only when the DHCP lease expires and I can still
reproduce the problem with a static IP. With or without DHCP makes no difference.

It seems to affect socket comms quite seriously (and quickly). If I run a simple
server program on the host that listens on a socket and writes a response string
to the socket when it receives data, and on the target I run a simple client
program which writes a string to the socket, then reads and prints the response
sent by the server, I only have to send data from client to server with a delay
of 1ms between transmissions for a few seconds before the client program hangs
in read() on the socket fd.
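
A client of the kind described above could look roughly like the Python sketch
below. The actual test programs are not shown in this message, so the payload,
the defaults and the 5 second timeout are all assumptions for illustration. The
timeout is only there so that a stalled read is reported instead of hanging
forever.

```python
import socket
import time


def run_client(host, port, count=1000, delay=0.001, timeout=5.0):
    """Send `count` strings, reading a reply after each one.

    Returns the number of completed request/reply exchanges. Stops early
    if a read stalls for longer than `timeout` seconds or the peer closes.
    """
    done = 0
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for i in range(count):
            sock.sendall(b"hello %d\n" % i)
            try:
                reply = sock.recv(4096)
            except socket.timeout:
                print("read stalled after %d exchanges" % done)
                break
            if not reply:
                print("server closed the connection")
                break
            done += 1
            time.sleep(delay)  # 1 ms between transmissions by default
    return done


# Example call (address and port are hypothetical):
#   run_client("192.168.0.1", 5000)
```

Comparing the returned count across runs would give a rough measure of how
quickly the receive path stalls.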

If I run a simple netcat test, eg

on target: nc -l -p 3333 > /dev/null
on host: dd if=/dev/zero | nc <target-ip> 3333

...strangely, once activity on the ethernet link as a result of the netcat test
ceases, running netstat -a on the target hangs for several seconds, eg:


~ # nc -l -p 3333 > /dev/null &
~ # netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 *:login                 *:*                     LISTEN
tcp        0      0 *:shell                 *:*                     LISTEN
tcp        0      0 *:sunrpc                *:*                     LISTEN
tcp        0      0 *:finger                *:*                     LISTEN
tcp        0      0 *:auth                  *:*                     LISTEN
tcp        0      0 *:ftp                   *:*                     LISTEN
tcp        0      0 *:telnet                *:*                     LISTEN

<system hangs for several seconds here>

tcp        0      0 192.168.0.11:3333       gateway0:45645          ESTABLISHED
udp        0      0 *:ntalk                 *:*
udp        0      0 *:sunrpc                *:*
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node Path
unix  4      [ ]         DGRAM                    111    /dev/log
unix  3      [ ]         STREAM     CONNECTED     123
unix  3      [ ]         STREAM     CONNECTED     122
unix  2      [ ]         DGRAM                    120
unix  2      [ ]         DGRAM                    114
~ #

I thought this was interesting. Also, after this, I have trouble entering
characters over the serial port / console. It seems like interrupts may be having
trouble getting serviced but this may be a side-effect...

If you run the same netstat command with strace, you can see that the delay is
caused by polling the socket following calling send:

...
...
gettimeofday({366, 470000}, NULL)       = 0
poll([{fd=4, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
send(4, "lJ\1\0\0\1\0\0\0\0\0\0\00211\0010\003168\003192\7in-ad"..., 43, 0x4000) = 43
poll(


<delay is here>


[{fd=4, events=POLLIN}], 1, 5000)  = 0
...
...
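
The payload of that send() looks like a DNS query, which would suggest the
stall is netstat waiting on name resolution rather than on the socket table
itself. Below is a rough Python sketch that decodes only the bytes strace
printed. The string is truncated by strace, so the last label is incomplete and
is reported as such. The function name is made up, and compression pointers are
not handled.

```python
import struct

# Bytes exactly as far as strace printed them; the rest of the 43-byte
# message was elided by strace and is not reconstructed here.
CAPTURED = (b"lJ\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00"
            b"\x0211\x010\x03168\x03192\x07in-ad")


def decode_dns_prefix(msg):
    """Decode the 12-byte DNS header and whatever question labels are visible."""
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack(
        "!6H", msg[:12])
    labels = []
    pos = 12
    while pos < len(msg) and msg[pos] != 0:
        length = msg[pos]                      # length byte precedes each label
        label = msg[pos + 1:pos + 1 + length]
        complete = len(label) == length        # False if strace cut it short
        labels.append((label.decode("ascii"), complete))
        if not complete:
            break
        pos += 1 + length
    return {
        "is_query": (flags & 0x8000) == 0,     # QR bit clear means a query
        "recursion_desired": bool(flags & 0x0100),
        "questions": qdcount,
        "labels": labels,
    }


print(decode_dns_prefix(CAPTURED))
```

The visible labels (11, 0, 168, 192, then a label starting "in-ad") are what a
reverse lookup for 192.168.0.11 would begin with, and the poll() that follows
has a 5000 ms timeout and returns 0. If that reading is right, netstat -an
(numeric output, no lookups) should list the sockets without the pause, which
would help separate resolver timeouts from a kernel-side stall.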

--  Matt



* Re: 2.6.29 & network stack strangeness
  2009-06-05 16:17   ` Matthew Lear
@ 2009-06-05 16:37     ` Finn Thain
  2009-06-05 16:44       ` Matthew Lear
  0 siblings, 1 reply; 8+ messages in thread
From: Finn Thain @ 2009-06-05 16:37 UTC (permalink / raw)
  To: Matthew Lear; +Cc: linux-m68k


My only guess would be that the network stack delayed work queues depend 
upon working timer interrupts...

But since I have no knowledge of your hardware, I don't think I'll be a 
lot of help with that.

Finn



* Re: 2.6.29 & network stack strangeness
  2009-06-05 16:37     ` Finn Thain
@ 2009-06-05 16:44       ` Matthew Lear
       [not found]         ` <4A2DC70F.7080401@freescale.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Matthew Lear @ 2009-06-05 16:44 UTC (permalink / raw)
  To: Finn Thain; +Cc: linux-m68k

Yes. I was suspecting that all may not be well in that area... Current set up is
a 10ms tick with CONFIG_HZ set to 100. Further investigation is required I think.
--  Matt

Finn Thain wrote:
> My only guess would be that the network stack delayed work queues depend 
> upon working timer interrupts...
> 
> But since I have no knowledge of your hardware, I don't think I'll be a 
> lot of help with that.
> 
> Finn
> 
> 
> On Fri, 5 Jun 2009, Matthew Lear wrote:
> 
>> Hi - thanks for your reply.
>>
>> The problem doesn't manifest only when the DHCP lease expires and I can still
>> reproduce the problem with a static IP. With or without DHCP makes no difference.
>>
>> It seems to effect socket comms quite seriously (and quickly). If I run a simple
>> server program on the host that listens on a socket and writes a response string
>> to the socket when it receives data, and on the target I run a simple client
>> program which writes a string to the socket, reads and prints the response sent
>> the server, I only have to send data from client to server with a delay of 1ms
>> between transmissions for a few seconds and the client program hangs on calling
>> read() on the socket fd.
>>
>> If I run a simple netcat test, eg
>>
>> on target: nc -l -p 3333 > /dev/null
>> on host: dd if=/dev/zero | nc <target-ip> 3333
>>
>> ...strangely, once activity on the ethernet link as a result of the netcat test
>> ceases, running netstat -a on the target hangs for several seconds, eg:
>>
>>
>> ~ # nc -l -p 3333 > /dev/null &
>> ~ # netstat -a
>> Active Internet connections (servers and established)
>> Proto Recv-Q Send-Q Local Address           Foreign Address         State
>> tcp        0      0 *:login                 *:*                     LISTEN
>> tcp        0      0 *:shell                 *:*                     LISTEN
>> tcp        0      0 *:sunrpc                *:*                     LISTEN
>> tcp        0      0 *:finger                *:*                     LISTEN
>> tcp        0      0 *:auth                  *:*                     LISTEN
>> tcp        0      0 *:ftp                   *:*                     LISTEN
>> tcp        0      0 *:telnet                *:*                     LISTEN
>>
>> <system hangs for several seconds here>
>>
>> tcp        0      0 192.168.0.11:3333       gateway0:45645
>> ESTABLISHED
>> udp        0      0 *:ntalk                 *:*
>> udp        0      0 *:sunrpc                *:*
>> Active UNIX domain sockets (servers and established)
>> Proto RefCnt Flags       Type       State         I-Node Path
>> unix  4      [ ]         DGRAM                    111    /dev/log
>> unix  3      [ ]         STREAM     CONNECTED     123
>> unix  3      [ ]         STREAM     CONNECTED     122
>> unix  2      [ ]         DGRAM                    120
>> unix  2      [ ]         DGRAM                    114
>> ~ #
>>
>> I thought this was interesting. Also, after this, I have trouble entering
>> characters over the serial port / console. It seems like interrupts may having
>> trouble getting serviced but this may be a side-effect...
>>
>> If you run the same netstat command with strace, you can see that the delay is
>> caused by polling the socket following calling send:
>>
>> ...
>> ...
>> gettimeofday({366, 470000}, NULL)       = 0
>> poll([{fd=4, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
>> send(4, "lJ\1\0\0\1\0\0\0\0\0\0\00211\0010\003168\003192\7in-ad"..., 43,
>> 0x4000) = 43
>> poll(
>>
>>
>> <delay is here>
>>
>>
>> [{fd=4, events=POLLIN}], 1, 5000)  = 0
>> ...
>> ...
>>
>> --  Matt
>>
>>
>> Finn Thain wrote:
>>> Does the problem manifest only when the DHCP lease expires?
>>> Can you reproduce the problem with a static IP?
>>>
>>> Finn
>>>
>>>
>>> On Fri, 5 Jun 2009, Matthew Lear wrote:
>>>
>>>> Hello all,
>>>>
>>>> I'm running a 2.6.29 kernel on an MMU enabled m68k coldfire mcf54455 platform
>>>> and I'm having some throughput problems when running network tests.
>>>>
>>>> The kernel boots and mounts its rootfs from flash (jffs2). udhcpc runs, obtains
>>>> a lease from the dhcp server and configures eth0. Network connectivity is ok. I
>>>> can ping the target from the host and vice versa.
>>>>
>>>> 1/
>>>> If I run ping -s 1500 -i 0.0001 <target ip address> on the host pc, after
>>>> several mins, the kernel reports 'unexpected interrupt from 24' which is the
>>>> vector for a spurious interrupt. This message will repeat randomly (from what I
>>>> saw it appeared ~ 20 times when running the ping test above for 40 mins). The
>>>> mcf54455 reference manual describes a possible cause for spurious interrupts.
>>>> However, this test very rarely reports any packet loss, although the max time to
>>>> receive a packet can be very large indeed.
>>>>
>>>> 2/
>>>> If I reboot, start again and run a ping flood test (ping -f) from host pc ->
>>>> target, all icmp requests are acknowledged - for a while. Before the target
>>>> begins to fail to respond to the icmp requests, running top shows that the
>>>> ksoftirq daemon is running at ~ 5% cpu load. This is normal as it is involved in
>>>> processing the deferred tasks of processing data fired up to the network stack.
>>>> So when the target beings to stop responding to icmp, if I then stop the ping
>>>> flood and try to ping the host from the target, there is no reply indicated by
>>>> ping. However, if you do this with a packet sniffer running (eg wireshark) you
>>>> can see that data is still being transmitted from the target -> host and you can
>>>> see the icmp reply, only the reply from the host appears to be received ok by
>>>> the fec driver but is processed by the network stack target.
>>>>
>>>> When in this state, a proc entry that I added to the fec driver shows that the
>>>> last return value from netif_rx() (called in the fec rx interrupt handling
>>>> routine) is 1, indicating that the last packet was dropped by the network stack,
>>>> e.g.
>>>>
>>>> ~ # cat /proc/driver/fec
>>>> total interrupts: 1421619
>>>> last interrupt type: 2 [1=tx, 2=rx, 3=mii]
>>>> total tx interrupts: 709148
>>>> total rx interrupts: 712472
>>>> total mii interrupts: 1
>>>> last interrupt event: 0x2000000
>>>> total eberr interrupts: 0
>>>> total hberr interrupts: 0
>>>> tx loop current count: 0
>>>> tx loop last count: 1
>>>> rx loop current count: 0
>>>> rx loop last count: 1
>>>> rx last cbd ctrl/status: 0x800
>>>> rx last cbd len: 346
>>>> rx last cbd buff addr: 0x40410000
>>>> rx last netif_rx status: 1
>>>>
>>>> Strangely, wireshark still shows data being transmitted from the target
>>>> -> host. I can see ARP requests and I can also see DHCP discovery packets being
>>>> sent by the target when its DHCP lease expires. This all looks ok, only the
>>>> reply from host -> target is never processed by the target as the network stack
>>>> is in a state where it is dropping all incoming data provided to it by the driver.
>>>>
>>>> I believe udhcpc utilises the network device directly, ie it does not require an
>>>> intermediate network protocol being implemented in the kernel (tcpdump is
>>>> similar).
>>>>
>>>> The fec driver still seems to be running ok because I can see the ring buffer
>>>> address changing when data is received. Everything seems to be ok apart from the
>>>> network stack. Very strange indeed.
>>>>
>>>> Running network throughput tests between host and target with netcat or netperf
>>>> only run for a few seconds before activity ceases.
>>>>
>>>> Has anybody experienced anything similar? Why does the network stack appear to
>>>> be stuck and constantly dropping packets?
>>>>
>>>> Any feedback appreciated.
>>>>
>>>> Rgds,
>>>> --  Matt



[parent not found: <4A2DC70F.7080401@freescale.com>]

* Re: 2.6.29 & network stack strangeness
       [not found]         ` <4A2DC70F.7080401@freescale.com>
@ 2009-06-09 10:04           ` Matthew Lear
       [not found]             ` <4A2F1C74.1010800@freescale.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Matthew Lear @ 2009-06-09 10:04 UTC (permalink / raw)
  To: Lanttor; +Cc: linux-m68k


> Could you run "top" command and print out its information?
> "cat /proc/stat" && "cat /proc/interrupts" information are also
> appreciated.

Sure. Info below.

Before running netcat test:

top - 00:02:37 up 2 min,  1 user,  load average: 0.07, 0.07, 0.02
Tasks:  24 total,   1 running,  23 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.7%us,  0.0%sy,  0.0%ni, 98.7%id,  0.0%wa,  0.7%hi,  0.0%si,  0.0%st
Mem:    254912k total,     6680k used,   248232k free,        0k buffers
Swap:        0k total,        0k used,        0k free,     3008k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  359 root      20   0  2856 1160  944 R  1.0  0.5   0:01.49 top
    3 root      15  -5     0    0    0 S  0.3  0.0   0:00.11 ksoftirqd/0
    1 root      20   0  3280  776  656 S  0.0  0.3   0:01.46 init
    2 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
    5 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 events/0
    6 root      15  -5     0    0    0 S  0.0  0.0   0:00.06 khelper
   56 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kblockd/0
   62 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kseriod
   80 root      20   0     0    0    0 S  0.0  0.0   0:00.00 pdflush
   81 root      20   0     0    0    0 S  0.0  0.0   0:00.00 pdflush
   82 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kswapd0
   83 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 aio/0
   84 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 nfsiod
  215 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 mtdblockd
  227 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 hid_compat
  230 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 rpciod/0


~ # cat /proc/stat
cpu  209 0 0 16892 0 757 0 0 0
cpu0 209 0 0 16892 0 757 0 0 0
intr 79221 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 61353 0 0 0 0 0 17858 0 0 5 2 0 0 0 2 0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
ctxt 72511
btime 0
processes 360
procs_running 1
procs_blocked 0
~ #


~ # cat /proc/interrupts
M5445X    90:      62109 UART
M5445X    96:      19793 timer
M5445X    99:          6 timer
M5445X   100:          2 fec(TXF)
M5445X   101:          0 fec(TXB)
M5445X   102:          0 fec(TXFIFO)
M5445X   103:          0 fec(TXCR)
M5445X   104:          2 fec(RXF)
M5445X   105:          0 fec(RXB)
M5445X   106:          1 fec(MII)
M5445X   107:          0 fec(LC)
M5445X   108:          0 fec(HBERR)
M5445X   109:          0 fec(GRA)
M5445X   110:          0 fec(EBERR)
M5445X   111:          0 fec(BABT)
M5445X   112:          0 fec(BABR)
~ #


Run netcat test:

target: nc -l -p 3333 > /dev/null &
host: dd if=/dev/zero | nc <target-ip> 3333

Netcat test activity on ethernet ceases after only a few seconds.
Wireshark concurs.


top - 00:05:04 up 5 min,  1 user,  load average: 0.00, 0.04, 0.01
Tasks:  25 total,   1 running,  24 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.7%us,  0.0%sy,  0.0%ni, 98.7%id,  0.0%wa,  0.7%hi,  0.0%si,  0.0%st
Mem:    254912k total,     6904k used,   248008k free,        0k buffers
Swap:        0k total,        0k used,        0k free,     3008k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  363 root      20   0  2856 1160  944 R  1.0  0.5   0:00.27 top
  362 root      20   0  3280  688  576 S  0.3  0.3   0:00.05 nc
    1 root      20   0  3280  776  656 S  0.0  0.3   0:01.46 init
    2 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      15  -5     0    0    0 S  0.0  0.0   0:00.12 ksoftirqd/0
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
    5 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 events/0
    6 root      15  -5     0    0    0 S  0.0  0.0   0:00.06 khelper
   56 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kblockd/0
   62 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kseriod
   80 root      20   0     0    0    0 S  0.0  0.0   0:00.00 pdflush
   81 root      20   0     0    0    0 S  0.0  0.0   0:00.00 pdflush
   82 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kswapd0
   83 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 aio/0
   84 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 nfsiod
  215 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 mtdblockd
  227 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 hid_compat


~ # cat /proc/stat
cpu  228 0 0 31924 0 778 0 0 0
cpu0 228 0 0 31924 0 778 0 0 0
intr 106918 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 73732 0 0 0 0 0 32930 0 0 10 96 0 0 0 149 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 110292
btime 0
processes 364
procs_running 1
procs_blocked 0
~ #


~ # cat /proc/interrupts
M5445X    90:      74523 UART
M5445X    96:      34522 timer
M5445X    99:         10 timer
M5445X   100:        100 fec(TXF)
M5445X   101:          0 fec(TXB)
M5445X   102:          0 fec(TXFIFO)
M5445X   103:          0 fec(TXCR)
M5445X   104:        155 fec(RXF)
M5445X   105:          0 fec(RXB)
M5445X   106:          1 fec(MII)
M5445X   107:          0 fec(LC)
M5445X   108:          0 fec(HBERR)
M5445X   109:          0 fec(GRA)
M5445X   110:          0 fec(EBERR)
M5445X   111:          0 fec(BABT)
M5445X   112:          0 fec(BABR)
~ #


At this point, my proc entry in the fec driver does not indicate any problem:

~ # cat /proc/driver/fec
total interrupts: 284
last interrupt type: 2 [1=tx, 2=rx, 3=mii]
total tx interrupts: 111
total rx interrupts: 172
total mii interrupts: 1
last interrupt event: 0x2000000
total eberr interrupts: 0
total hberr interrupts: 0
tx loop current count: 0
tx loop last count: 1
rx loop current count: 0
rx loop last count: 1
rx last cbd ctrl/status: 0x800
rx last cbd len: 64
rx last cbd buff addr: 0x40407800
rx last netif_rx status: 0
~ #

When starting the netcat test, activity on the ethernet ceases after only
a few seconds. After stopping the dd/nc on the host, I cannot ping the
host from the target or vice versa, and the RS-232 console on the target
does not respond to every key press (usually only to every second key
press).

I instrumented some of the raw packet handling in the network stack to
attempt to glean a little more info about ping. Running ping from target
to host shows:

~ # ping 192.168.0.2
PING 192.168.0.2 (192.168.0.2): 56 data bytes
[  490.489407] ping:367 /mnt/nht/linux-2.6.29/net/core/datagram.c:99
wait_for_packet - interrupted
[  490.500812] ping:367 /mnt/nht/linux-2.6.29/net/core/datagram.c:110
wait_for_packet - timeo=2147483647, err=-512
[  490.513637] ping:367 /mnt/nht/linux-2.6.29/net/ipv4/raw.c:668
raw_recvmsg - skb_recv_datagram
[  490.524790] ping:367 /mnt/nht/linux-2.6.29/net/ipv4/raw.c:701
raw_recvmsg - returning err -512
[  490.536035] ping:367 /mnt/nht/linux-2.6.29/net/core/sock.c:1869
sock_common_recvmsg - recvmsg returned -512
[  490.548497] ping:367
/mnt/nht/linux-2.6.29/arch/m68k/coldfire/signal.c:804 ERESTART

Other tests which induce similar problems in the network stack include
using wget to pull a large file onto the target.

I can also force netif_rx() to eventually return 1 (pkt dropped) on every
call by using a flood ping at high frequency (see previous post). This
is a different scenario than netcat/wget but possibly related to the same
cause.

--  Matt



[parent not found: <4A2F1C74.1010800@freescale.com>]

* Re: 2.6.29 & network stack strangeness
       [not found]             ` <4A2F1C74.1010800@freescale.com>
@ 2009-06-10  9:36               ` Matthew Lear
       [not found]                 ` <4A2F8318.5080206@freescale.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Matthew Lear @ 2009-06-10  9:36 UTC (permalink / raw)
  To: Lanttor; +Cc: linux-m68k

Lanttor wrote:
> I met the interrupt issue before on mcf5445x platform, following is my fixed patch, but I am not sure it could help resolve your problem, just have a try  :-) 

Thanks but this makes no difference. From what I can tell, the patch to entry.S
is essentially benign because ret_from_interrupt() is not invoked anywhere.

Also, the patch to ints.c adds invocations of irq_enter() and irq_exit()
to process_int(). These macros really just force entry/exit to/from hard
interrupt context and increment/decrement a count (used for tracking
nested interrupts). Given that process_int() is called by inthandler() in
entry.S, I think this is always going to be in interrupt context anyway, so I
can't see the advantage of this. Maybe I'm missing something?

> 
> For your mcf54455 platform, if you use one ethernet port, try disabling the second ethernet port (FEC2) in menuconfig to see whether it affects networking. 

Yes. I've already got the second net port disabled.

I have had some level of success by calling flush_cache_all() immediately after
calling netif_rx() in the rx part of the fec interrupt handler - fec_enet_rx()
in fec.c.

This plus a hack to netif_rx() seems to vastly improve things: I can wget
large files and run netcat tests, although in the case of netcat there is still a
high number of rx overruns in the fec driver.

Hack:

--- a/net/core/dev.c        2009-05-12 07:02:24.000000000 +0100
+++ b/net/core/dev.c       2009-06-10 10:26:47.000000000 +0100
@@ -1969,16 +1969,18 @@
                        __skb_queue_tail(&queue->input_pkt_queue, skb);
                        local_irq_restore(flags);
                        return NET_RX_SUCCESS;
                }

                napi_schedule(&queue->backlog);
                goto enqueue;
        }
+        else
+                napi_schedule(&queue->backlog);

        __get_cpu_var(netdev_rx_stat).dropped++;
        local_irq_restore(flags);

        kfree_skb(skb);
        return NET_RX_DROP;
 }

This effectively ignores the backlog check and schedules the receive function to
run.

I think all this points to a timing problem in the system. That
said, I can believe the cache coherency issue, as I've read some postings on the
net about similar issues with ethernet tx on ppc...

--  Matt


[parent not found: <4A2F8318.5080206@freescale.com>]

* Re: 2.6.29 & network stack strangeness
       [not found]                 ` <4A2F8318.5080206@freescale.com>
@ 2009-06-10 10:59                   ` Matthew Lear
  0 siblings, 0 replies; 8+ messages in thread
From: Matthew Lear @ 2009-06-10 10:59 UTC (permalink / raw)
  To: Lanttor; +Cc: linux-m68k

Lanttor wrote:
>> Thanks but this makes no difference. From what I can tell, the patch to entry.S
>> is essentially benign because ret_from_interrupt() is not invoked anywhere.
> 
> ret_from_interrupt() is the part of inthandler(), and it's called by inthandler() at end.
> 

Indeed it is. My apologies.


Thread overview: 8+ messages
2009-06-05 15:15 2.6.29 & network stack strangeness Matthew Lear
2009-06-05 15:49 ` Finn Thain
2009-06-05 16:17   ` Matthew Lear
2009-06-05 16:37     ` Finn Thain
2009-06-05 16:44       ` Matthew Lear
     [not found]         ` <4A2DC70F.7080401@freescale.com>
2009-06-09 10:04           ` Matthew Lear
     [not found]             ` <4A2F1C74.1010800@freescale.com>
2009-06-10  9:36               ` Matthew Lear
     [not found]                 ` <4A2F8318.5080206@freescale.com>
2009-06-10 10:59                   ` Matthew Lear
