* 2.6.25: Weird IPv4 stack behaviour, IPv6 is fine
@ 2008-04-27 23:14 Russell King
2008-04-27 23:17 ` Russell King
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Russell King @ 2008-04-27 23:14 UTC (permalink / raw)
To: netdev
Hi,
I've upgraded lists.arm.linux.org.uk to 2.6.25, and I'm now seeing some
very weird networking behaviour from the machine which seems to only
affect IPv4 - including ICMP and NFS(tcp).
tcpdump is available (all 4MB worth):
http://www.home.arm.linux.org.uk/~rmk/ping.capture
Machines involved:
dyn-67 - x86 box 2.6.20-1.2320.fc5
(192.168.0.67 / 2002:4e20:1eda:1:201:80ff:fe4b:1778)
n2100 - ARM box 2.6.24
(78.32.30.221, has ipv6 as well)
lists - ARM box 2.6.25
(78.32.30.220 / 2002:4e20:1eda:1:201:3dff:fe00:0156)
The dump shows three 8200 byte pings running - one IPv4 on n2100 against
lists, one IPv4 on dyn-67 against lists, and one IPv6 on dyn-67 against
lists.
The tcpdump was running on lists itself.
Everything looks fine until around packet 1688, where n2100 sends an
echo request to lists, which doesn't get a reply. 300ms later, dyn-67
sends an echo request to lists, which also coincidentally doesn't get
a reply. Note, however, how the IPv6 pings continue.
The stats for the pings upon their termination are:
rmk@dyn-67:[~]:<1005> ping6 -s 8192 lists
PING lists(lists.arm.linux.org.uk) 8192 data bytes
--- lists ping statistics ---
101 packets transmitted, 101 received, 0% packet loss, time 99990ms
rtt min/avg/max/mdev = 4.132/4.488/26.585/2.374 ms, pipe 2
rmk@dyn-67:[~]:<1051> ping -s 8192 lists
PING lists.arm.linux.org.uk (78.32.30.220) 8192(8220) bytes of data.
--- lists.arm.linux.org.uk ping statistics ---
101 packets transmitted, 54 received, 46% packet loss, time 99993ms
rtt min/avg/max/mdev = 4.139/6.027/35.274/6.405 ms
root@n2100:~# ping -s 8192 lists
PING lists.arm.linux.org.uk (78.32.30.220) 8192(8220) bytes of data.
--- lists.arm.linux.org.uk ping statistics ---
101 packets transmitted, 55 received, 45% packet loss, time 100020ms
rtt min/avg/max/mdev = 4.404/4.610/13.235/1.175 ms
Lastly, in /proc/net/snmp on lists, I find:
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates
Ip: 2 64 12771 0 0 0 0 0 5159 6262 9 0 2 8172 1363 2 922 0 5520
Note - InReceives = 12771, but InDelivers = 5159 - so roughly 50% of
IPv4 packets were received but not delivered, which appears to tie up
with the ping statistics.
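The same two counters can be pulled out programmatically rather than by
eye; a minimal user-space sketch (not part of the original report) that
pairs the "Ip:" header line of /proc/net/snmp with its value line and
prints the gap:

/* Minimal sketch: look up InReceives and InDelivers by name in
 * /proc/net/snmp and report how many packets were received but not
 * delivered.  Field lookup is by column name, so extra counters in
 * other kernel versions don't matter. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
	char hdr[1024], val[1024];
	FILE *f = fopen("/proc/net/snmp", "r");

	if (!f) {
		perror("/proc/net/snmp");
		return 1;
	}

	/* Counters come as a header/value line pair per protocol. */
	while (fgets(hdr, sizeof(hdr), f) && fgets(val, sizeof(val), f)) {
		long inrecv = -1, indeliv = -1;
		char *hs, *vs, *h, *v;

		if (strncmp(hdr, "Ip:", 3) != 0)
			continue;

		h = strtok_r(hdr, " \n", &hs);	/* skip the "Ip:" tags */
		v = strtok_r(val, " \n", &vs);
		while ((h = strtok_r(NULL, " \n", &hs)) &&
		       (v = strtok_r(NULL, " \n", &vs))) {
			if (strcmp(h, "InReceives") == 0)
				inrecv = atol(v);
			else if (strcmp(h, "InDelivers") == 0)
				indeliv = atol(v);
		}
		if (inrecv > 0 && indeliv >= 0)
			printf("InReceives=%ld InDelivers=%ld gap=%ld (%.0f%%)\n",
			       inrecv, indeliv, inrecv - indeliv,
			       100.0 * (inrecv - indeliv) / inrecv);
		break;
	}
	fclose(f);
	return 0;
}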
Not sure what to make of this at the moment. Any ideas?
--
Russell King
* Re: 2.6.25: Weird IPv4 stack behaviour, IPv6 is fine
2008-04-27 23:14 2.6.25: Weird IPv4 stack behaviour, IPv6 is fine Russell King
@ 2008-04-27 23:17 ` Russell King
2008-04-27 23:26 ` David Miller
2008-04-28 7:02 ` Pavel Emelyanov
2 siblings, 0 replies; 8+ messages in thread
From: Russell King @ 2008-04-27 23:17 UTC (permalink / raw)
To: netdev
Forgot the config file for the problem kernel...
http://www.home.arm.linux.org.uk/~rmk/bast-config-2.6.25
--
Russell King
* Re: 2.6.25: Weird IPv4 stack behaviour, IPv6 is fine
2008-04-27 23:14 2.6.25: Weird IPv4 stack behaviour, IPv6 is fine Russell King
2008-04-27 23:17 ` Russell King
@ 2008-04-27 23:26 ` David Miller
2008-04-28 7:02 ` Pavel Emelyanov
2 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2008-04-27 23:26 UTC (permalink / raw)
To: rmk; +Cc: netdev, xemul
From: Russell King <rmk@arm.linux.org.uk>
Date: Mon, 28 Apr 2008 00:14:11 +0100
> Lastly, in /proc/net/snmp on lists, I find:
>
> Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates
> Ip: 2 64 12771 0 0 0 0 0 5159 6262 9 0 2 8172 1363 2 922 0 5520
>
> Note - InReceives = 12771, but InDelivers = 5159 - so roughly 50% of
> IPv4 packets were received but not delivered, which appears to tie up
> with the ping statistics.
>
> Not sure what to make of this at the moment. Any ideas?
The ReasmTimeout and ReasmFails look interesting. Maybe it was the
namespace bits?
Pavel, could you take a quick look?
Thanks.
* Re: 2.6.25: Weird IPv4 stack behaviour, IPv6 is fine
2008-04-27 23:14 2.6.25: Weird IPv4 stack behaviour, IPv6 is fine Russell King
2008-04-27 23:17 ` Russell King
2008-04-27 23:26 ` David Miller
@ 2008-04-28 7:02 ` Pavel Emelyanov
2008-04-28 9:31 ` Russell King
2 siblings, 1 reply; 8+ messages in thread
From: Pavel Emelyanov @ 2008-04-28 7:02 UTC (permalink / raw)
To: Russell King; +Cc: netdev
> Lastly, in /proc/net/snmp on lists, I find:
>
> Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates
> Ip: 2 64 12771 0 0 0 0 0 5159 6262 9 0 2 8172 1363 2 922 0 5520
Can you please also show the /proc/net/netstat contents - I'm interested
in IpExt statistics.
Thanks,
Pavel
* Re: 2.6.25: Weird IPv4 stack behaviour, IPv6 is fine
2008-04-28 7:02 ` Pavel Emelyanov
@ 2008-04-28 9:31 ` Russell King
2008-04-28 10:18 ` Russell King
0 siblings, 1 reply; 8+ messages in thread
From: Russell King @ 2008-04-28 9:31 UTC (permalink / raw)
To: Pavel Emelyanov; +Cc: netdev
On Mon, Apr 28, 2008 at 11:02:29AM +0400, Pavel Emelyanov wrote:
> > Lastly, in /proc/net/snmp on lists, I find:
> >
> > Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates
> > Ip: 2 64 12771 0 0 0 0 0 5159 6262 9 0 2 8172 1363 2 922 0 5520
>
> Can you please also show the /proc/net/netstat contents - I'm interested
> in IpExt statistics.
IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts OutBcastPkts
IpExt: 0 0 0 0 0 0
I suspect that you were expecting these to be non-zero.
I've just added some debug printks into ip_input.c, and I don't think the
IP stack itself is at fault (if it were, you'd be flooded with reports.)
int ip_local_deliver(struct sk_buff *skb)
{
	...
	if (ip_hdr(skb)->saddr == htonl(0xc0a80043) &&
	    ip_hdr(skb)->protocol == IPPROTO_ICMP) printk("ping 2\n");
	return NF_HOOK(PF_INET, NF_INET_LOCAL_IN, skb, skb->dev, NULL,
		       ip_local_deliver_finish);
}

static int ip_local_deliver_finish(struct sk_buff *skb)
{
	__skb_pull(skb, ip_hdrlen(skb));

	/* Point into the IP datagram, just past the header. */
	skb_reset_transport_header(skb);

	if (ip_hdr(skb)->saddr == htonl(0xc0a80043) &&
	    ip_hdr(skb)->protocol == IPPROTO_ICMP) printk("ping 3\n");
When the machine stops responding to pings, I see in the kernel message
log 'ping 2' but no 'ping 3' (whereas I get both when it does respond.)
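As an aside not in the original mail, the magic constant in that filter
is just dyn-67's IPv4 address in host order; a trivial user-space check
shows the mapping:

/* Quick check: the constant compared against ip_hdr(skb)->saddr above
 * is simply dyn-67's address. */
#include <stdio.h>
#include <arpa/inet.h>

int main(void)
{
	struct in_addr a;

	a.s_addr = htonl(0xc0a80043);	/* same value the printk filter uses */
	printf("%s\n", inet_ntoa(a));	/* prints 192.168.0.67 */
	return 0;
}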
I don't have the iptables binary installed, so there aren't any rules.
(Also, the iptables_filter module isn't loaded.)
I'll see if I can track the packet's progress through the netfilter code
today.
--
Russell King
* Re: 2.6.25: Weird IPv4 stack behaviour, IPv6 is fine
2008-04-28 9:31 ` Russell King
@ 2008-04-28 10:18 ` Russell King
2008-04-28 10:30 ` David Miller
0 siblings, 1 reply; 8+ messages in thread
From: Russell King @ 2008-04-28 10:18 UTC (permalink / raw)
To: Pavel Emelyanov; +Cc: netdev, netfilter
On Mon, Apr 28, 2008 at 10:31:30AM +0100, Russell King wrote:
> int ip_local_deliver(struct sk_buff *skb)
> {
> 	...
> 	if (ip_hdr(skb)->saddr == htonl(0xc0a80043) &&
> 	    ip_hdr(skb)->protocol == IPPROTO_ICMP) printk("ping 2\n");
> 	return NF_HOOK(PF_INET, NF_INET_LOCAL_IN, skb, skb->dev, NULL,
> 		       ip_local_deliver_finish);
> }
>
> static int ip_local_deliver_finish(struct sk_buff *skb)
> {
> 	__skb_pull(skb, ip_hdrlen(skb));
>
> 	/* Point into the IP datagram, just past the header. */
> 	skb_reset_transport_header(skb);
>
> 	if (ip_hdr(skb)->saddr == htonl(0xc0a80043) &&
> 	    ip_hdr(skb)->protocol == IPPROTO_ICMP) printk("ping 3\n");
>
> When the machine stops responding to pings, I see in the kernel message
> log 'ping 2' but no 'ping 3' (whereas I get both when it does respond.)
>
> I don't have the iptables binary installed, so there aren't any rules.
> (Also, the iptables_filter module isn't loaded.)
(Adding netfilter mailing list. See http://marc.info/?t=120933809600001&r=1&w=2
for the initial problem description.)
Further to this, it looks like there's an nf_conntrack issue. Having
placed similar printks in the netfilter code, I see the ipv4_confirm()
hook normally returning 1 (NF_ACCEPT), but it then decides to return 0
(NF_DROP) and there are no ping replies.
-bash-3.1# cat /proc/net/stat/ip_conntrack
entries searched found new invalid ignore delete delete_list insert insert_failed drop early_drop icmp_error expect_new expect_create expect_delete
00000110 000000e2 000001c6 000003bb 00000140 00000000 000002ab 0000023a 0000034a 0000005f 00000000 00000000 0000000f 00000000 00000000 00000000
insert_failed increments when there aren't any ping replies.
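A rough user-space sketch, not from the thread, for watching that
counter without decoding the hex dump by eye (it assumes the 2.6.25-era
layout of /proc/net/stat/ip_conntrack: one header line, then one row of
hex values per CPU):

/* Find the insert_failed column by name in the header line of
 * /proc/net/stat/ip_conntrack and sum it over the per-CPU rows. */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[1024];
	int col = -1;
	unsigned long total = 0;
	FILE *f = fopen("/proc/net/stat/ip_conntrack", "r");

	if (!f) {
		perror("/proc/net/stat/ip_conntrack");
		return 1;
	}

	/* The header line gives the column order. */
	if (fgets(line, sizeof(line), f)) {
		int i = 0;
		for (char *t = strtok(line, " \n"); t; t = strtok(NULL, " \n"), i++)
			if (strcmp(t, "insert_failed") == 0)
				col = i;
	}

	/* One hex row per CPU; add up the chosen column. */
	while (col >= 0 && fgets(line, sizeof(line), f)) {
		int i = 0;
		for (char *t = strtok(line, " \n"); t; t = strtok(NULL, " \n"), i++)
			if (i == col) {
				unsigned long v;

				if (sscanf(t, "%lx", &v) == 1)
					total += v;
			}
	}

	printf("insert_failed = %lu\n", total);
	fclose(f);
	return 0;
}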
The other interesting thing (though I'm not sure if it's really
related or helps) is:
-bash-3.1# grep 'ipv4.*icmp.*192.168.0.67' /proc/net/nf_conntrack
ipv4 2 icmp 1 29 src=192.168.0.67 dst=78.32.30.220 type=8 code=0 id=53823 packets=19 bytes=156180 [UNREPLIED] src=78.32.30.220 dst=192.168.0.67 type=0 code=0 id=53823 packets=0 bytes=0 mark=0 use=1
-bash-3.1# grep 'ipv4.*icmp.*192.168.0.67' /proc/net/nf_conntrack
ipv4 2 icmp 1 29 src=192.168.0.67 dst=78.32.30.220 type=8 code=0 id=53823 packets=21 bytes=172620 [UNREPLIED] src=78.32.30.220 dst=192.168.0.67 type=0 code=0 id=53823 packets=0 bytes=0 mark=0 use=1
-bash-3.1# grep 'ipv4.*icmp.*192.168.0.67' /proc/net/nf_conntrack
ipv4 2 icmp 1 29 src=192.168.0.67 dst=78.32.30.220 type=8 code=0 id=53823 packets=22 bytes=180840 [UNREPLIED] src=78.32.30.220 dst=192.168.0.67 type=0 code=0 id=53823 packets=0 bytes=0 mark=0 use=1
-bash-3.1# grep 'ipv4.*icmp.*192.168.0.67' /proc/net/nf_conntrack
ipv4 2 icmp 1 29 src=192.168.0.67 dst=78.32.30.220 type=8 code=0 id=53823 packets=23 bytes=189060 [UNREPLIED] src=78.32.30.220 dst=192.168.0.67 type=0 code=0 id=53823 packets=0 bytes=0 mark=0 use=1
-bash-3.1# grep 'ipv4.*icmp.*192.168.0.67' /proc/net/nf_conntrack
ipv4 2 icmp 1 29 src=192.168.0.67 dst=78.32.30.220 type=8 code=0 id=53823 packets=24 bytes=197280 [UNREPLIED] src=78.32.30.220 dst=192.168.0.67 type=0 code=0 id=53823 packets=0 bytes=0 mark=0 use=1
-bash-3.1# grep 'ipv4.*icmp.*192.168.0.67' /proc/net/nf_conntrack
ipv4 2 icmp 1 28 src=192.168.0.67 dst=78.32.30.220 type=8 code=0 id=53823 packets=26 bytes=213720 [UNREPLIED] src=78.32.30.220 dst=192.168.0.67 type=0 code=0 id=53823 packets=0 bytes=0 mark=0 use=1
-bash-3.1# grep 'ipv4.*icmp.*192.168.0.67' /proc/net/nf_conntrack
ipv4 2 icmp 1 25 src=192.168.0.67 dst=78.32.30.220 type=8 code=0 id=53823 packets=26 bytes=213720 [UNREPLIED] src=78.32.30.220 dst=192.168.0.67 type=0 code=0 id=53823 packets=0 bytes=0 mark=0 use=1
-bash-3.1# grep 'ipv4.*icmp.*192.168.0.67' /proc/net/nf_conntrack
ipv4 2 icmp 1 24 src=192.168.0.67 dst=78.32.30.220 type=8 code=0 id=53823 packets=26 bytes=213720 [UNREPLIED] src=78.32.30.220 dst=192.168.0.67 type=0 code=0 id=53823 packets=0 bytes=0 mark=0 use=1
-bash-3.1# grep 'ipv4.*icmp.*192.168.0.67' /proc/net/nf_conntrack
ipv4 2 icmp 1 23 src=192.168.0.67 dst=78.32.30.220 type=8 code=0 id=53823 packets=26 bytes=213720 [UNREPLIED] src=78.32.30.220 dst=192.168.0.67 type=0 code=0 id=53823 packets=0 bytes=0 mark=0 use=1
Note how the conntrack entry stays "unreplied", and how the packet and
byte counters eventually stop incrementing even though ping packets are
still being sent. Maybe something's missing from the local IP output
path to confirm the entry?
--
Russell King
* Re: 2.6.25: Weird IPv4 stack behaviour, IPv6 is fine
2008-04-28 10:18 ` Russell King
@ 2008-04-28 10:30 ` David Miller
2008-04-28 12:00 ` Russell King
0 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2008-04-28 10:30 UTC (permalink / raw)
To: rmk; +Cc: xemul, netdev, netfilter
From: Russell King <rmk@arm.linux.org.uk>
Date: Mon, 28 Apr 2008 11:18:35 +0100
> Further to this, it looks like there's an nf_conntrack issue. Having
> placed similar printks in the netfilter code, I see the ipv4_confirm()
> hook normally returning 1 (NF_ACCEPT), but it then decides to return 0
> (NF_DROP) and there are no ping replies.
There's already been a report about specific hashing problems with
conntrack on ARM. It has something to do with how structures are
padded on ARM, combined with the following patch made by Patrick:
commit 0794935e21a18e7c171b604c31219b60ad9749a9
Author: Patrick McHardy <kaber@trash.net>
Date: Thu Jan 31 04:40:52 2008 -0800
    [NETFILTER]: nf_conntrack: optimize hash_conntrack()

    Avoid calling jhash three times and hash the entire tuple in one go.

    __hash_conntrack | -485 # 760 -> 275, # inlines: 3 -> 1, size inlines: 717 -> 252
     1 function changed, 485 bytes removed

    Signed-off-by: Patrick McHardy <kaber@trash.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index ce4c4ba..4a2cce1 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -73,15 +73,19 @@ static unsigned int nf_conntrack_hash_rnd;
 static u_int32_t __hash_conntrack(const struct nf_conntrack_tuple *tuple,
 				  unsigned int size, unsigned int rnd)
 {
-	unsigned int a, b;
+	unsigned int n;
+	u_int32_t h;
 
-	a = jhash2(tuple->src.u3.all, ARRAY_SIZE(tuple->src.u3.all),
-		   (tuple->src.l3num << 16) | tuple->dst.protonum);
-	b = jhash2(tuple->dst.u3.all, ARRAY_SIZE(tuple->dst.u3.all),
-		   ((__force __u16)tuple->src.u.all << 16) |
-		   (__force __u16)tuple->dst.u.all);
+	/* The direction must be ignored, so we hash everything up to the
+	 * destination ports (which is a multiple of 4) and treat the last
+	 * three bytes manually.
+	 */
+	n = (sizeof(tuple->src) + sizeof(tuple->dst.u3)) / sizeof(u32);
+	h = jhash2((u32 *)tuple, n,
+		   rnd ^ (((__force __u16)tuple->dst.u.all << 16) |
+			  tuple->dst.protonum));
 
-	return ((u64)jhash_2words(a, b, rnd) * size) >> 32;
+	return ((u64)h * size) >> 32;
 }
 
 static inline u_int32_t hash_conntrack(const struct nf_conntrack_tuple *tuple)
* Re: 2.6.25: Weird IPv4 stack behaviour, IPv6 is fine
2008-04-28 10:30 ` David Miller
@ 2008-04-28 12:00 ` Russell King
0 siblings, 0 replies; 8+ messages in thread
From: Russell King @ 2008-04-28 12:00 UTC (permalink / raw)
To: David Miller; +Cc: xemul, netdev, netfilter
On Mon, Apr 28, 2008 at 03:30:22AM -0700, David Miller wrote:
> From: Russell King <rmk@arm.linux.org.uk>
> Date: Mon, 28 Apr 2008 11:18:35 +0100
>
> > Further to this, it looks like there's an nf_conntrack issue. Having
> > placed similar printks in the netfilter code, I see the ipv4_confirm()
> > hook normally returning 1 (NF_ACCEPT), but it then decides to return 0
> > (NF_DROP) and there are no ping replies.
>
> There's already been a report about specific hashing problems with
> conntrack on ARM. It has something to do with how structures are
> padded on ARM, combined with the following patch made by Patrick:
>
> commit 0794935e21a18e7c171b604c31219b60ad9749a9
> Author: Patrick McHardy <kaber@trash.net>
> Date: Thu Jan 31 04:40:52 2008 -0800
Yup, reverting that appears to fix the problem. Looking at the
structure, it will contain two bytes of padding in the 'u' union
and another two bytes in the 'dst' structure.
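To make the failure mode concrete, here is a stand-alone sketch; it is
not kernel code, and the struct and hash below are toy stand-ins for
nf_conntrack_tuple and jhash2. Hashing a tuple's raw 32-bit words sweeps
up any compiler-inserted padding, so two tuples that compare equal field
by field can land in different hash buckets unless the padding bytes are
explicitly zeroed:

/* Toy demonstration of padding leaking into a whole-struct hash. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

struct tuple {
	uint16_t port;		/* 2 bytes, then (typically) 2 bytes of padding */
	uint32_t addr;		/* forces 4-byte alignment of the struct */
	uint8_t  protonum;	/* 1 byte, then 3 bytes of tail padding */
};

/* Toy stand-in for jhash2(): mix whole 32-bit words, padding included. */
static uint32_t hash_words(const void *p, size_t nwords)
{
	uint32_t h = 0, w;

	for (size_t i = 0; i < nwords; i++) {
		memcpy(&w, (const unsigned char *)p + i * sizeof(w), sizeof(w));
		h = h * 2654435761u + w;
	}
	return h;
}

int main(void)
{
	struct tuple a, b;

	/* Same logical contents, different junk in the padding bytes. */
	memset(&a, 0x00, sizeof(a));
	memset(&b, 0xff, sizeof(b));
	a.port = b.port = 0;
	a.addr = b.addr = 0xc0a80043;	/* 192.168.0.67 */
	a.protonum = b.protonum = 1;	/* ICMP */

	printf("sizeof(struct tuple) = %zu\n", sizeof(struct tuple));
	printf("hash(a) = %#x, hash(b) = %#x\n",
	       hash_words(&a, sizeof(a) / sizeof(uint32_t)),
	       hash_words(&b, sizeof(b) / sizeof(uint32_t)));
	return 0;
}

If the real tuple is filled in field by field without being zeroed
first, the original and reply directions can end up hashing differently,
which would fit the unreplied entries and the insert_failed counts seen
earlier in the thread.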
I suspect there'll be objections to packing the structure, in which
case what's the permanent fix?
--
Russell King