Re: [netfilter-core] linux-2.6.0-testX ipchains oops in NAT


* Re: [netfilter-core] linux-2.6.0-testX ipchains oops in NAT
       [not found] <3F964F9D.D5C69498@fy.chalmers.se>
@ 2003-10-23  9:02 ` Harald Welte
  2003-10-23  9:52   ` Andy Polyakov
  0 siblings, 1 reply; 9+ messages in thread
From: Harald Welte @ 2003-10-23  9:02 UTC (permalink / raw)
  To: Andy Polyakov; +Cc: coreteam, Netfilter Development Mailinglist

[-- Attachment #1: Type: text/plain, Size: 1590 bytes --]

On Wed, Oct 22, 2003 at 11:36:29AM +0200, Andy Polyakov wrote:
> Hi,
> 
> This is a preliminary report:-)

ok, thanks.

> - VMware "host-only" network is NAT-ed by ipchains;

Just a quick unrelated question: Why would somebody be using ipchains on
a 2.6 kernel? 

> What the heck with rebooting guest and connecting to same URL? Guest OS
> will use very same port numbers at second boot and NAT layer will use
> same port translations, which appears as the triggering factor itself.
> 
> Reloading ipchains module in between guest OS boots makes it possible to
> avoid lock-ups/oopses.

As I have no idea about vmware: does it destroy the virtual interface on
the host at time of the reboot in your guest os?

What about using iptables?  Does it produce a similar behaviour?  I
think this is the first time within at least a year that we've had any
report of somebody using (or finding bugs in) the ipfwadm/ipchains
compat layer... it wasn't even frequently used with 2.4.x

> Question. Should I pursue the issue further?

yes, please.  Especially a means of reproduction without running
proprietary software (and thus being reproducible for me) would be very
helpful.

> Cheers. A.

-- 
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]


* Re: [netfilter-core] linux-2.6.0-testX ipchains oops in NAT
  2003-10-23  9:02 ` [netfilter-core] linux-2.6.0-testX ipchains oops in NAT Harald Welte
@ 2003-10-23  9:52   ` Andy Polyakov
  2003-10-23 10:57     ` Harald Welte
  2003-10-23 11:16     ` Andy Polyakov
  0 siblings, 2 replies; 9+ messages in thread
From: Andy Polyakov @ 2003-10-23  9:52 UTC (permalink / raw)
  To: Harald Welte; +Cc: coreteam, Netfilter Development Mailinglist

> Just a quick unrelated question: Why would somebody be using ipchains on
> a 2.6 kernel?

Well, I use it because it has been on my computer since eternity and I am
used to it. It's hardly a "crime":-):-):-)

> > What the heck with rebooting guest and connecting to same URL? Guest OS
> > will use very same port numbers at second boot and NAT layer will use
> > same port translations, which appears as the triggering factor itself.
> >
> > Reloading ipchains module in between guest OS boots makes it possible to
> > avoid lock-ups/oopses.
> 
> As I have no idea about vmware: does it destroy the virtual interface on
> the host at time of the reboot in your guest os?

No, vmnetN interfaces are persistent; they are brought up and down at
system boot and shutdown. They are pretty much independent of the VMware
application, which is what arranges communication between the guest OS
and the vmnetN interface.

> What about using iptables?  Does it produce a similar behaviour?

I don't know. I'll check at some point, but not right away...

> I
> think this is the first time within at least a year that we've had any
> report of somebody using (or finding bugs in) the ipfwadm/ipchains
> compat layer... it wasn't even frequently used with 2.4.x

Well, then it probably should have disappeared from 2.6. I mean if the
code is there, then it should work and is actually expected to work.

> > Question. Should I pursue the issue further?
> 
> yes, please.  Especially a means of reproduction without running
> proprietary software (and thus being reproducible for me) would be very
> helpful.

Would eth0:1 be sufficient? A.


* Re: [netfilter-core] linux-2.6.0-testX ipchains oops in NAT
  2003-10-23  9:52   ` Andy Polyakov
@ 2003-10-23 10:57     ` Harald Welte
  2003-10-23 14:29       ` Andy Polyakov
  2003-10-23 11:16     ` Andy Polyakov
  1 sibling, 1 reply; 9+ messages in thread
From: Harald Welte @ 2003-10-23 10:57 UTC (permalink / raw)
  To: Andy Polyakov; +Cc: coreteam, Netfilter Development Mailinglist

[-- Attachment #1: Type: text/plain, Size: 1785 bytes --]

On Thu, Oct 23, 2003 at 11:52:05AM +0200, Andy Polyakov wrote:
> > Just a quick unrelated question: Why would somebody be using ipchains on
> > a 2.6 kernel?
> 
> Well, I use it because it has been on my computer since eternity and I am
> used to it. It's hardly a "crime":-):-):-)

No, it's not a crime - just a very strange thing to do (at least from my
experience).

> > What about using iptables?  Does it produce a similar behaviour?
> 
> I don't know. I'll check at some point, but not right away...

That would be helpful, since it would become a way more important bug if
it was in iptables.

> > think this is the first time within at least a year that we've had any
> > report of somebody using (or finding bugs in) the ipfwadm/ipchains
> > compat layer... it wasn't even frequently used with 2.4.x
> 
> Well, then it probably should have disappeared from 2.6. I mean if the
> code is there, then it should work and is actually expected to work.

Actually we were thinking about removal - and I am still tempted to do
so.  

> > yes, please.  Especially a means of reproduction without running
> > proprietary software (and thus being reproducible for me) would be very
> > helpful.
> 
> Would eth0:1 be sufficient? A.

Yes, it would.  Can you confirm the bug happens when you use ipchains,
an alias interface and reuse addresses/ports from a machine behind that
interface?


[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]


* Re: [netfilter-core] linux-2.6.0-testX ipchains oops in NAT
  2003-10-23 10:57     ` Harald Welte
@ 2003-10-23 14:29       ` Andy Polyakov
  0 siblings, 0 replies; 9+ messages in thread
From: Andy Polyakov @ 2003-10-23 14:29 UTC (permalink / raw)
  To: Harald Welte; +Cc: coreteam, Netfilter Development Mailinglist

> > > What about using iptables?  Does it produce a similar behaviour?
> >
> > I don't know. I'll check at some point, but not right away...
> 
> That would be helpful, since it would become a way more important bug if
> it was in iptables.

The bug appears to be ipchains specific. Meaning that I can confirm that
my kernel does *not* crash [after VMware guest OS reboot] if I use
iptables to implement equivalent setup. A.


* Re: [netfilter-core] linux-2.6.0-testX ipchains oops in NAT
  2003-10-23  9:52   ` Andy Polyakov
  2003-10-23 10:57     ` Harald Welte
@ 2003-10-23 11:16     ` Andy Polyakov
  2003-10-23 11:26       ` Andy Polyakov
  2003-10-26  6:19       ` Rusty Russell
  1 sibling, 2 replies; 9+ messages in thread
From: Andy Polyakov @ 2003-10-23 11:16 UTC (permalink / raw)
  To: Harald Welte, coreteam, Netfilter Development Mailinglist

[-- Attachment #1: Type: text/plain, Size: 815 bytes --]

> > > Question. Should I pursue the issue further?
> >
> > yes, please.  Especially a means of reproduction without running
> > proprietary software (and thus being reproducible for me) would be very
> > helpful.
> 
> Would eth0:1 be sufficient?

It's perfectly reproducible with eth0:1. In other words I

- bring up eth0:1 with a private IP address, e.g. 192.168.60.1, on a
computer running 2.6 with 'ipchains -A forward -s
192.168.0.0/255.255.0.0 -d 0.0.0.0/0.0.0.0 -j MASQ';
- on another computer bring up eth0:1 with e.g. 192.168.60.2 and 'route
add host some.host 192.168.60.1';
- on that other computer run the attached script as './conn.pl some.host
80 2345';
- wait until the port translation expires on the first computer;
- run the attached script as './conn.pl some.host 80 2345' once again;
- collect the attached console.dump.

A.
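
The attached conn.pl is not reproduced in this archive. As a rough
stand-in, and assuming its third argument (2345) is the local source port
the client binds to before connecting, a script of that kind might look
like the Python sketch below. The function name connect_from_port and the
meaning given to the arguments are assumptions, not taken from the
attachment.

```python
import socket


def connect_from_port(host, port, src_port, timeout=5.0):
    """Open a TCP connection to (host, port) from a fixed local port.

    Reusing the same source port on every run is what makes the NAT box
    see the same tuple again once its earlier translation has expired.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Let a later run bind the same source port again without waiting.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.settimeout(timeout)
    try:
        sock.bind(("", src_port))    # fixed source port (assumed role of 2345)
        sock.connect((host, port))   # destination, e.g. some.host port 80
    except OSError:
        sock.close()
        raise
    return sock
```

Under these assumptions, connect_from_port("some.host", 80, 2345) would
play the role of './conn.pl some.host 80 2345' in the steps above; the
returned socket should be closed by the caller.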

[-- Attachment #2: conn.pl --]
[-- Type: application/x-perl, Size: 598 bytes --]

[-- Attachment #3: console.dump --]
[-- Type: text/plain, Size: 1824 bytes --]

Unable to handle kernel paging request at virtual address 00100108
 printing eip:
e08787e1
*pde = 00000000
Oops: 0000 [#1]
CPU:    0
EIP:    0060:[<e08787e1>]    Tainted: PF 
EFLAGS: 00013203
EIP is at find_appropriate_src+0x3d/0xa0 [ipchains]
eax: e084dcf0   ebx: 00100100   ecx: dd3cdd44   edx: 0000059e
esi: 00000000   edi: dd3cdcd8   ebp: dd3cdc50   esp: dd3cdc40
ds: 007b   es: 007b   ss: 0068
Process X (pid: 1257, threadinfo=dd3cc000 task=dd3b5940)
Stack: dd3cdce8 dd3cdd08 dd3cdcd8 0000059e dd3cdc90 e0878a83 dd3cdcd8 dd3cdd44 
       dd3cdce8 c80e4ea4 dd3cdd08 e08817e0 dd3cdcd8 c80e4e2c c80e4e2c dd3cdc9c 
       e0876099 dd3cdcd8 c80e4e2c e0881580 dd3cdd18 e0878c2d dd3cdd08 dd3cdcd8 
Call Trace:
 [<e0878a83>] get_unique_tuple+0x33/0x190 [ipchains]
 [<e0876099>] invert_tuplepr+0x1d/0x28 [ipchains]
 [<e0878c2d>] ip_nat_setup_info+0x4d/0x2a0 [ipchains]
 [<e0875ff3>] ip_conntrack_in+0x18f/0x218 [ipchains]
 [<c02300c3>] __ip_route_output_key+0x23/0xe4
 [<e08780b8>] gcc2_compiled.+0x168/0x1f0 [ipchains]
 [<e0877735>] fw_in+0x1f9/0x228 [ipchains]
 [<c0229f70>] nf_iterate+0x44/0xa4
 [<c0232c44>] ip_forward_finish+0x0/0x4c
 [<c022a30a>] nf_hook_slow+0x8e/0x124
 [<c0232c44>] ip_forward_finish+0x0/0x4c
 [<c0232bfc>] ip_forward+0x1ec/0x234
 [<c0232c44>] ip_forward_finish+0x0/0x4c
 [<c0231c89>] ip_rcv_finish+0x1bd/0x204
 [<c022a348>] nf_hook_slow+0xcc/0x124
 [<c02318da>] ip_rcv+0x3ae/0x3f0
 [<c0231acc>] ip_rcv_finish+0x0/0x204
 [<c0220fe0>] netif_receive_skb+0x13c/0x18c
 [<c022109f>] process_backlog+0x6f/0x100
 [<c02211a2>] net_rx_action+0x72/0x11c
 [<c01202be>] do_softirq+0x4e/0xa0
 [<c010d251>] do_IRQ+0x115/0x130
 [<c010ba08>] common_interrupt+0x18/0x20

Code: 8b 53 08 0f b7 47 0e 31 f6 66 39 42 1e 75 2e 8b 07 39 42 10 
 <0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing


* Re: [netfilter-core] linux-2.6.0-testX ipchains oops in NAT
  2003-10-23 11:16     ` Andy Polyakov
@ 2003-10-23 11:26       ` Andy Polyakov
  2003-10-26  6:19       ` Rusty Russell
  1 sibling, 0 replies; 9+ messages in thread
From: Andy Polyakov @ 2003-10-23 11:26 UTC (permalink / raw)
  To: Harald Welte, coreteam, Netfilter Development Mailinglist

> - collect attached console.dump;
> Unable to handle kernel paging request at virtual address 00100108
> EIP is at find_appropriate_src+0x3d/0xa0 [ipchains]

Just as in my original report. It was a TCP translation that had expired,
hence the +0x3d offset; 0x100100 is the value of "i" itself in the
"i->conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.protonum ==
tuple->dst.protonum" line in src_cmp. A.
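
The arithmetic behind that reading can be checked against the dump: the
first code bytes (8b 53 08) are a load from 8 bytes past %ebx, and ebx
holds 00100100, which gives the faulting address 00100108. The Python
sketch below decodes only this one instruction form. The helper name
decode_mov_load_disp8 is made up for illustration, and the remark that
00100100 matches the LIST_POISON1 value written by list_del() in 2.6-era
include/linux/list.h is from memory, so treat it as an assumption.

```python
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_load_disp8(code):
    """Decode 'mov disp8(%base), %dst': opcode 0x8b, ModRM mod=01, no SIB."""
    opcode, modrm, disp = code[0], code[1], code[2]
    if opcode != 0x8B:
        raise ValueError("not a 32-bit register load (opcode 0x8b)")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the disp8, no-SIB form is handled here")
    if disp >= 0x80:              # disp8 is a signed byte
        disp -= 0x100
    return REGS32[reg], REGS32[rm], disp


# First three bytes of the "Code:" line and the ebx value from the oops.
dst, base, disp = decode_mov_load_disp8(bytes.fromhex("8b5308"))
ebx = 0x00100100                  # assumed to be LIST_POISON1 (see lead-in)
fault_address = ebx + disp

print(f"mov {disp:#x}(%{base}),%{dst} faults at {fault_address:08x}")
```

An offset of 8 is also what one would expect for a pointer that sits
right after a two-pointer list head on 32-bit x86, which fits "i" being
the list entry and the load being i->conntrack.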


* Re: [netfilter-core] linux-2.6.0-testX ipchains oops in NAT
  2003-10-23 11:16     ` Andy Polyakov
  2003-10-23 11:26       ` Andy Polyakov
@ 2003-10-26  6:19       ` Rusty Russell
  2003-10-26 13:31         ` Andy Polyakov
  2003-10-27  7:58         ` David S. Miller
  1 sibling, 2 replies; 9+ messages in thread
From: Rusty Russell @ 2003-10-26  6:19 UTC (permalink / raw)
  To: Andy Polyakov
  Cc: Harald Welte, coreteam, Netfilter Development Mailinglist, davem

In message <3F97B874.CB12C184@fy.chalmers.se> you write:
> It's perfectly reproducible with eth0:1. In other words I

Thanks for the excellent help Andy!

Found it by inspection from Andy's description.

We updated ip_nat_setup_info to set the initialized flag and call
place_in_hashes, but *didn't* change the call in ip_fw_compat_masq.c
which also calls place_in_hashes() itself (again!).  Result: corrupt
list, and next thing which lands in the same hash bucket goes boom.
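
To see why a second insertion is fatal, here is a small user-space model
of a kernel-style circular doubly linked list, written in Python. It is
only a sketch: the names ListHead, list_add, list_del and simulate_bucket
are illustrative, the poison markers stand in for the values list_del()
leaves behind, and the real code keeps two hash tables rather than the
single bucket modelled here.

```python
class ListHead:
    """Minimal stand-in for struct list_head; an empty list points to itself."""

    def __init__(self, name):
        self.name = name
        self.next = self
        self.prev = self


# Markers for the pointers left in a deleted entry (assumed behaviour).
POISON1 = ListHead("LIST_POISON1")
POISON2 = ListHead("LIST_POISON2")


def list_add(new, head):
    """Insert 'new' right after 'head', with no check for double insertion."""
    nxt = head.next
    nxt.prev = new
    new.next = nxt
    new.prev = head
    head.next = new


def list_del(entry):
    """Unlink 'entry' using its own pointers, then poison them."""
    entry.next.prev = entry.prev
    entry.prev.next = entry.next
    entry.next = POISON1
    entry.prev = POISON2


def walk(head, limit=8):
    """Names seen when following .next from head; stops at the poison marker."""
    seen, node = [], head.next
    while node is not head and len(seen) < limit:
        seen.append(node.name)
        if node is POISON1:
            break                 # the kernel would oops on the next load
        node = node.next
    return seen


def simulate_bucket(double_add):
    bucket, old, conn = ListHead("bucket"), ListHead("old"), ListHead("conn")
    list_add(old, bucket)
    list_add(conn, bucket)        # done inside ip_nat_setup_info
    if double_add:
        list_add(conn, bucket)    # the extra call from the compat code
    list_del(conn)                # translation expires
    return walk(bucket)
```

With a single insertion, simulate_bucket(False) leaves only the older
entry in the bucket. With the double insertion, the entry ends up
pointing at itself, the later delete cannot unlink it, and the next walk
of that bucket steps from the stale entry straight into the poison
pointer, which is the pattern the oops shows.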

This should fix it.
Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

Name: ipchains/ipfwadm compat changes for new ip_nat_setup_info
Author: Rusty Russell
Status: Experimental

D: We updated ip_nat_setup_info to set the initialized flag and call
D: place_in_hashes, but *didn't* change the call in ip_fw_compat_masq.c
D: which also calls place_in_hashes() itself (again!).  Result: corrupt
D: list, and next thing which lands in the same hash bucket goes boom.
D: 
D: Thanks to Andy Polyakov for chasing this down.

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .17896-linux-2.6.0-test9/net/ipv4/netfilter/ip_fw_compat_masq.c .17896-linux-2.6.0-test9.updated/net/ipv4/netfilter/ip_fw_compat_masq.c
--- .17896-linux-2.6.0-test9/net/ipv4/netfilter/ip_fw_compat_masq.c	2003-09-22 10:28:14.000000000 +1000
+++ .17896-linux-2.6.0-test9.updated/net/ipv4/netfilter/ip_fw_compat_masq.c	2003-10-26 17:17:30.000000000 +1100
@@ -91,9 +91,6 @@ do_masquerade(struct sk_buff **pskb, con
 			WRITE_UNLOCK(&ip_nat_lock);
 			return ret;
 		}
-
-		place_in_hashes(ct, info);
-		info->initialized = 1;
 	} else
 		DEBUGP("Masquerading already done on this conn.\n");
 	WRITE_UNLOCK(&ip_nat_lock);


* Re: [netfilter-core] linux-2.6.0-testX ipchains oops in NAT
  2003-10-26  6:19       ` Rusty Russell
@ 2003-10-26 13:31         ` Andy Polyakov
  2003-10-27  7:58         ` David S. Miller
  1 sibling, 0 replies; 9+ messages in thread
From: Andy Polyakov @ 2003-10-26 13:31 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Harald Welte, coreteam, Netfilter Development Mailinglist, davem

> We updated ip_nat_setup_info to set the initialized flag and call
> place_in_hashes, but *didn't* change the call in ip_fw_compat_masq.c
> which also calls place_in_hashes() itself (again!).  Result: corrupt
> list, and next thing which lands in the same hash bucket goes boom.
> 
> This should fix it.

I can confirm that the proposed patch does resolve the problem. Thank
you. A.


* Re: [netfilter-core] linux-2.6.0-testX ipchains oops in NAT
  2003-10-26  6:19       ` Rusty Russell
  2003-10-26 13:31         ` Andy Polyakov
@ 2003-10-27  7:58         ` David S. Miller
  1 sibling, 0 replies; 9+ messages in thread
From: David S. Miller @ 2003-10-27  7:58 UTC (permalink / raw)
  To: Rusty Russell; +Cc: appro, laforge, coreteam, netfilter-devel

On Sun, 26 Oct 2003 17:19:38 +1100
Rusty Russell <rusty@rustcorp.com.au> wrote:

> We updated ip_nat_setup_info to set the initialized flag and call
> place_in_hashes, but *didn't* change the call in ip_fw_compat_masq.c
> which also calls place_in_hashes() itself (again!).  Result: corrupt
> list, and next thing which lands in the same hash bucket goes boom.
> 
> This should fix it.

Applied, thanks a lot Rusty.


end of thread

Thread overview: 9+ messages
     [not found] <3F964F9D.D5C69498@fy.chalmers.se>
2003-10-23  9:02 ` [netfilter-core] linux-2.6.0-testX ipchains oops in NAT Harald Welte
2003-10-23  9:52   ` Andy Polyakov
2003-10-23 10:57     ` Harald Welte
2003-10-23 14:29       ` Andy Polyakov
2003-10-23 11:16     ` Andy Polyakov
2003-10-23 11:26       ` Andy Polyakov
2003-10-26  6:19       ` Rusty Russell
2003-10-26 13:31         ` Andy Polyakov
2003-10-27  7:58         ` David S. Miller
