nfnetlink/ctnetlink from pom-ng r3884

* nfnetlink/ctnetlink from pom-ng r3884
@ 2005-04-19 13:37 Wang Jian
  2005-04-20  0:55 ` Pablo Neira
  2005-04-20 13:41 ` Amin Azez
  0 siblings, 2 replies; 24+ messages in thread
From: Wang Jian @ 2005-04-19 13:37 UTC (permalink / raw)
  To: netfilter-devel

Hi,

I now use conntrack/ + libctnetlink/ + libnfnetlink/ + nfnetlink/ +
ctnetlink/ + conntrack-event-api/, all are from r3884. I also enable
CT_ACCT.

The first issue is this kind of duplication in the event message:

type: [NEW] src=192.168.0.27 dst=192.168.0.254 sport=22 dport=2846 src=192.168.0.254 dst=192.168.0.27 sport=2846 dport=22 status:8 timeout:432000 tcp 6 orig_packets=1 orig_bytes=0, reply_packets=268 reply_bytes=0 
type: [UPDATE] src=192.168.0.27 dst=192.168.0.254 sport=22 dport=2846 src=192.168.0.254 dst=192.168.0.27 sport=2846 dport=22 status:10 timeout:432000 orig_packets=1 orig_bytes=0, reply_packets=268 reply_bytes=0 
orig_packets=1 orig_bytes=0, reply_packets=268 reply_bytes=0 

See the second event message: the accounting information is printed twice,
which means the netlink message carries duplicated accounting information.


And

# ./conntrack -E conntrack
...
type: [UPDATE] src=192.168.0.254 dst=192.168.0.27 sport=4347 dport=22 src=192.168.0.27 dst=192.168.0.254 sport=22 dport=4347 timeout:120 tcp 6 orig_packets=5 orig_bytes=0, reply_packets=270 reply_bytes=0 
type: [DESTROY] src=192.168.0.254 dst=192.168.0.27 sport=4347 dport=22 src=192.168.0.27 dst=192.168.0.254 sport=22 dport=4347 orig_packets=5 orig_bytes=0, reply_packets=270 reply_bytes=0 
Segmentation fault (core dumped)

It dumps core again. This time the backtrace is:

[root@qos conntrack]# gdb conntrack core.31335 
Loaded symbols for extensions/libct_proto_tcp.so
#0  0x0804a258 in event_handler (sock=0xbffff700, nlh=0xbfffd760, 
    arg=0xbffff760) at src/libct.c:186
186                     while (NFA_OK(attr, attrlen)) {
(gdb) bt
#0  0x0804a258 in event_handler (sock=0xbffff700, nlh=0xbfffd760, 
    arg=0xbffff760) at src/libct.c:186
#1  0x0804b03c in list_conntrack_handler ()
#2  0x0804bceb in nfnl_listen ()
#3  0x0804b22d in ctnl_event_conntrack ()
#4  0x0804ac02 in event_conntrack () at src/libct.c:451
#5  0x08049d51 in main (argc=3, argv=0xbffff8f4) at src/conntrack.c:473
(gdb) print attr
$1 = (struct nfattr *) 0xc00063c4
(gdb) print attrlen
$2 = 1525387413

Pablo, if you need more information, just tell me.


-- 
  lark


* Re: nfnetlink/ctnetlink from pom-ng r3884
  2005-04-19 13:37 nfnetlink/ctnetlink from pom-ng r3884 Wang Jian
@ 2005-04-20  0:55 ` Pablo Neira
  2005-04-21  8:21   ` Wang Jian
  2005-04-20 13:41 ` Amin Azez
  1 sibling, 1 reply; 24+ messages in thread
From: Pablo Neira @ 2005-04-20  0:55 UTC (permalink / raw)
  To: Wang Jian; +Cc: netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1284 bytes --]

Wang Jian wrote:
> type: [NEW] src=192.168.0.27 dst=192.168.0.254 sport=22 dport=2846 src=192.168.0.254 dst=192.168.0.27 sport=2846 dport=22 status:8 timeout:432000 tcp 6 orig_packets=1 orig_bytes=0, reply_packets=268 reply_bytes=0 
> type: [UPDATE] src=192.168.0.27 dst=192.168.0.254 sport=22 dport=2846 src=192.168.0.254 dst=192.168.0.27 sport=2846 dport=22 status:10 timeout:432000 orig_packets=1 orig_bytes=0, reply_packets=268 reply_bytes=0 
> orig_packets=1 orig_bytes=0, reply_packets=268 reply_bytes=0 
> 
> See the second event message. account information is printed twice, that
> means the netlink message has duplicated account information.

No, the status flags have changed: the seen_reply bit has been set.
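
To make the status change concrete: the two events quoted above differ in
status:8 versus status:10. A minimal sketch of decoding those values,
assuming they are printed in decimal and that the bit layout follows the
kernel's enum ip_conntrack_status (expected = 0x01, seen_reply = 0x02,
assured = 0x04, confirmed = 0x08). The names ct_status, status_has and
status_changed are made up for illustration and are not part of conntrack:

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values assumed to mirror enum ip_conntrack_status in the kernel. */
enum ct_status {
    CT_EXPECTED   = 1 << 0,
    CT_SEEN_REPLY = 1 << 1,
    CT_ASSURED    = 1 << 2,
    CT_CONFIRMED  = 1 << 3
};

/* True if the given status bit is set. */
bool status_has(unsigned int status, enum ct_status bit)
{
    return (status & (unsigned int)bit) != 0;
}

/* Bits that differ between two status values. */
unsigned int status_changed(unsigned int old_status, unsigned int new_status)
{
    return old_status ^ new_status;
}

/* Usage: the NEW event carried status 8, the UPDATE event status 10. */
void status_example(void)
{
    assert(status_has(8, CT_CONFIRMED));
    assert(status_changed(8, 10) == (unsigned int)CT_SEEN_REPLY);
}
```

Under these assumptions the only difference between the two events is the
seen_reply bit, which is consistent with an UPDATE being generated.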

> And
> 
> # ./conntrack -E conntrack
> ...
> type: [UPDATE] src=192.168.0.254 dst=192.168.0.27 sport=4347 dport=22 src=192.168.0.27 dst=192.168.0.254 sport=22 dport=4347 timeout:120 tcp 6 orig_packets=5 orig_bytes=0, reply_packets=270 reply_bytes=0 
> type: [DESTROY] src=192.168.0.254 dst=192.168.0.27 sport=4347 dport=22 src=192.168.0.27 dst=192.168.0.254 sport=22 dport=4347 orig_packets=5 orig_bytes=0, reply_packets=270 reply_bytes=0 
> Segmentation fault (core dumped)
> 
> core dumps again. This time the backtrace is

The patch attached fixes it.

--
Pablo

[-- Attachment #2: x --]
[-- Type: text/plain, Size: 383 bytes --]

Index: src/libct.c
===================================================================
--- src/libct.c	(revision 3881)
+++ src/libct.c	(working copy)
@@ -239,7 +239,7 @@
 			attr = NFA_NEXT(attr, attrlen);
 		}
 		min_len += nlh->nlmsg_len;
-		nlh = (struct nlmsghdr *) attr;
+		nlh = (struct nlmsghdr *) (nlh + nlh->nlmsg_len);
 		printf("\n");
 	}
 	DEBUGP("exit from handler\n");
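
A note on the pointer arithmetic in this hunk: in C, adding n to a
struct nlmsghdr * advances by n headers rather than n bytes, so the
expression nlh + nlh->nlmsg_len scales the length by
sizeof(struct nlmsghdr). A byte-wise step to the next message needs a cast
to a byte pointer first, which is what the NLMSG_NEXT macro from
<linux/netlink.h> does (with alignment). Below is a minimal userspace sketch
of such a walk. It uses a stand-in header struct and alignment macro so that
it builds without kernel headers; struct msg_hdr, MSG_ALIGN, next_msg and
count_messages are hypothetical names, not libct code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in with the same layout as struct nlmsghdr (16 bytes). */
struct msg_hdr {
    uint32_t len;    /* total message length, header included */
    uint16_t type;
    uint16_t flags;
    uint32_t seq;
    uint32_t pid;
};

/* Netlink messages are padded to 4-byte boundaries. */
#define MSG_ALIGN(len) (((len) + 3u) & ~3u)

/* Step to the next message by bytes, not by sizeof(struct msg_hdr). */
const struct msg_hdr *next_msg(const struct msg_hdr *h)
{
    return (const struct msg_hdr *)((const char *)h + MSG_ALIGN(h->len));
}

/* Count well-formed messages in buf, stopping at the first bad length. */
size_t count_messages(const void *buf, size_t buflen)
{
    const struct msg_hdr *h = buf;
    size_t remaining = buflen;
    size_t n = 0;

    assert(buf != NULL);
    while (remaining >= sizeof(*h) &&
           h->len >= sizeof(*h) &&
           h->len <= remaining) {
        size_t step = MSG_ALIGN(h->len);

        n++;
        if (step > remaining)
            break;          /* last message lacks trailing padding */
        remaining -= step;
        h = next_msg(h);
    }
    return n;
}
```

The length checks in the loop condition are what keep a corrupt or
misinterpreted length from walking off the end of the receive buffer.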


* Re: nfnetlink/ctnetlink from pom-ng r3884
  2005-04-20  0:55 ` Pablo Neira
@ 2005-04-21  8:21   ` Wang Jian
  2005-04-21 11:05     ` Pablo Neira
  0 siblings, 1 reply; 24+ messages in thread
From: Wang Jian @ 2005-04-21  8:21 UTC (permalink / raw)
  To: Pablo Neira; +Cc: netfilter-devel

Hi Pablo Neira,

There is the same code in expect_handler(). Should that be fixed too?

On Wed, 20 Apr 2005 02:55:49 +0200, Pablo Neira <pablo@eurodev.net> wrote:

> Index: src/libct.c
> ===================================================================
> --- src/libct.c	(revision 3881)
> +++ src/libct.c	(working copy)
> @@ -239,7 +239,7 @@
>  			attr = NFA_NEXT(attr, attrlen);
>  		}
>  		min_len += nlh->nlmsg_len;
> -		nlh = (struct nlmsghdr *) attr;
> +		nlh = (struct nlmsghdr *) (nlh + nlh->nlmsg_len);
>  		printf("\n");
>  	}
>  	DEBUGP("exit from handler\n");



-- 
  lark


* Re: nfnetlink/ctnetlink from pom-ng r3884
  2005-04-21  8:21   ` Wang Jian
@ 2005-04-21 11:05     ` Pablo Neira
  2005-04-21 11:29       ` Wang Jian
  0 siblings, 1 reply; 24+ messages in thread
From: Pablo Neira @ 2005-04-21 11:05 UTC (permalink / raw)
  To: Wang Jian; +Cc: netfilter-devel

Wang Jian wrote:
> Hi Pablo Neira,
> 
> There is same code in expect_handler(). Should that place be fixed too?

I'll add this to my queue of pending patches. Thanks!

--
Pablo


* Re: nfnetlink/ctnetlink from pom-ng r3884
  2005-04-21 11:05     ` Pablo Neira
@ 2005-04-21 11:29       ` Wang Jian
  0 siblings, 0 replies; 24+ messages in thread
From: Wang Jian @ 2005-04-21 11:29 UTC (permalink / raw)
  To: Pablo Neira; +Cc: netfilter-devel

Hi Pablo Neira,

BTW, is there any documentation on the netlink message format and its
processing? The function names in the netlink layer are too abstract for
me, so I get lost now and then when reading the code.

On Thu, 21 Apr 2005 13:05:33 +0200, Pablo Neira <pablo@eurodev.net> wrote:

> Wang Jian wrote:
> > Hi Pablo Neira,
> > 
> > There is same code in expect_handler(). Should that place be fixed too?
> 
> I'll add this to my queue of pending patches. Thanks!
> 
> --
> Pablo



-- 
  lark


* Re: nfnetlink/ctnetlink from pom-ng r3884
  2005-04-19 13:37 nfnetlink/ctnetlink from pom-ng r3884 Wang Jian
  2005-04-20  0:55 ` Pablo Neira
@ 2005-04-20 13:41 ` Amin Azez
  2005-04-20 14:17   ` Samuel Liddicott
  2005-04-20 22:44   ` Pablo Neira
  1 sibling, 2 replies; 24+ messages in thread
From: Amin Azez @ 2005-04-20 13:41 UTC (permalink / raw)
  To: netfilter-devel

Wang Jian wrote:
> Hi,
> 
> I now use conntrack/ + libctnetlink/ + libnfnetlink/ + nfnetlink/ +
> ctnetlink/ + conntrack-event-api/, all are from r3884. I also enable
> CT_ACCT.

I'm now using kernel 2.6.11.7 with nfnetlink, ctnetlink, and
conntrack-event-api from patch-o-matic-ng svn (3884).

I'm using conntrack (3880) and libctnetlink (3877) and libnfnetlink 
(3876) from svn.

I fixed conntrack to remove the "id" parameter, I also have
CONFIG_IP_NF_CT_ACCT=y

Yet conntrack -E conntrack shows a new connection for every packet:
type: [NEW] src=192.168.0.204 dst=192.168.0.131 sport=41605 ...
type: [NEW] src=192.168.0.204 dst=192.168.0.131 sport=41605 ...
type: [NEW] src=192.168.0.204 dst=192.168.0.131 sport=41605 ...
type: [NEW] src=192.168.0.204 dst=192.168.0.131 sport=41605 ...
type: [NEW] src=192.168.0.204 dst=192.168.0.131 sport=41605 ...
type: [NEW] src=192.168.0.204 dst=192.168.0.131 sport=41605 ...

I'm also still getting some kind of "alignment" problems in 
ip_conntrack_netlink.c such that original-bytes is 0 and reply-packets 
seems to have the value for originating bytes.

printk("OP=%ld OB=%ld RP=%ld RB=%ld\n",
ct->counters[IP_CT_DIR_ORIGINAL].packets,
ct->counters[IP_CT_DIR_ORIGINAL].bytes,
ct->counters[IP_CT_DIR_REPLY].packets,
ct->counters[IP_CT_DIR_REPLY].bytes);

OP=1 OB=0 RP=60 RB=0
OP=1 OB=0 RP=60 RB=0
OP=2 OB=0 RP=112 RB=0
OP=2 OB=0 RP=112 RB=0
OP=3 OB=0 RP=164 RB=0
OP=4 OB=0 RP=238 RB=0
OP=4 OB=0 RP=238 RB=0
OP=5 OB=0 RP=930 RB=0
OP=5 OB=0 RP=930 RB=0
OP=5 OB=0 RP=930 RB=0
OP=6 OB=0 RP=1006 RB=0
OP=6 OB=0 RP=1006 RB=0

I even put my debug in ctnetlink_conntrack_event() and it prints these
bogus values, and yet I can't see how this can be the case unless the
kernel is walking all over the ip_conntrack struct, but then the box
wouldn't be this stable.

All I can think of is to
1) do binary dump of ip_conntrack
2) back down the kernel to where the ct->counters are incremented until 
they start to make sense.

Amin


* Re: nfnetlink/ctnetlink from pom-ng r3884
  2005-04-20 13:41 ` Amin Azez
@ 2005-04-20 14:17   ` Samuel Liddicott
  2005-04-20 22:44   ` Pablo Neira
  1 sibling, 0 replies; 24+ messages in thread
From: Samuel Liddicott @ 2005-04-20 14:17 UTC (permalink / raw)
  To: netfilter-devel

OK, I think this was my fault, due to bad use of printk with 64-bit numbers.

I used to think that printk/printf pushed a pointer to the number onto the
stack, but it seems to push the actual value, and I wasn't properly
specifying in my format string that it was a 64-bit number, so the most
significant 4 bytes were being used as the next argument.

This was shown by outputting only one value per printk statement.
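
A worked illustration of the effect described above, under the assumption
of a 32-bit little-endian machine where each 64-bit counter occupies two
32-bit argument slots: reading those slots with four %ld conversions yields
the low word and then the high word of the first counter, followed by the
low and high words of the second. The sketch simulates this by splitting the
values explicitly, since really passing mismatched types to printf is
undefined behaviour. split_args is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Lay out 64-bit values the way a 32-bit little-endian varargs call would:
 * low 32 bits first, then high 32 bits. Returns the number of slots written.
 */
size_t split_args(const uint64_t *vals, size_t nvals,
                  uint32_t *slots, size_t nslots)
{
    size_t used = 0;

    assert(vals != NULL && slots != NULL);
    for (size_t i = 0; i < nvals && used + 2 <= nslots; i++) {
        slots[used++] = (uint32_t)(vals[i] & 0xffffffffu);
        slots[used++] = (uint32_t)(vals[i] >> 32);
    }
    return used;
}
```

If the real counters were packets=1 and bytes=60 in the original direction,
the first four slots come out as 1, 0, 60, 0, which has the same shape as
the "OP=1 OB=0 RP=60 RB=0" line in the earlier message. Printing each
counter with a 64-bit conversion such as %llu (with a cast to
unsigned long long) avoids the mismatch.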

Apologies all.

Amin


* Re: nfnetlink/ctnetlink from pom-ng r3884
  2005-04-20 13:41 ` Amin Azez
  2005-04-20 14:17   ` Samuel Liddicott
@ 2005-04-20 22:44   ` Pablo Neira
  2005-04-21  8:07     ` Amin Azez
  2005-04-21  9:25     ` extending conntrack event data Amin Azez
  1 sibling, 2 replies; 24+ messages in thread
From: Pablo Neira @ 2005-04-20 22:44 UTC (permalink / raw)
  To: Amin Azez; +Cc: netfilter-devel

Amin Azez wrote:
> I'm using now kernel 2.6.11.7 with nfnetlink, ctnetlink, 
> conntrack-event-api from patch-o-matic-ng svn (3884)
> 
> I'm using conntrack (3880) and libctnetlink (3877) and libnfnetlink 
> (3876) from svn.

Plus the three patches that I posted recently that fix:

a) core dump when a packet of an unknown protocol is received.
b) core dump when you receive a destroy message.
c) failure if ip_queue is loaded.

Make sure that you install the libraries correctly and you aren't using 
the old ones, this is *very* important.

> I fixed conntrack to remove the "id" parameter, I also have
> CONFIG_IP_NF_CT_ACCT=y
> 
> Yet conntrack -E conntrack shows for every packet a new connection
> type: [NEW] src=192.168.0.204 dst=192.168.0.131 sport=41605 ...
> type: [NEW] src=192.168.0.204 dst=192.168.0.131 sport=41605 ...
> type: [NEW] src=192.168.0.204 dst=192.168.0.131 sport=41605 ...
> type: [NEW] src=192.168.0.204 dst=192.168.0.131 sport=41605 ...
> type: [NEW] src=192.168.0.204 dst=192.168.0.131 sport=41605 ...
> type: [NEW] src=192.168.0.204 dst=192.168.0.131 sport=41605 ...

I'm not able to reproduce such behaviour. It works just fine here. Make
sure that you are really using the latest version. If you are, send me
privately a tcpdump trace and the full list of events displayed.

> I'm also still getting some kind of "alignment" problems in 
> ip_conntrack_netlink.c such that original-bytes is 0 and reply-packets 
> seems to have the value for originating bytes.
> 
> printk("OP=%ld OB=%ld RP=%ld RB=%ld\n",
> ct->counters[IP_CT_DIR_ORIGINAL].packets,
> ct->counters[IP_CT_DIR_ORIGINAL].bytes,
> ct->counters[IP_CT_DIR_REPLY].packets,
> ct->counters[IP_CT_DIR_REPLY].bytes);
> 
> OP=1 OB=0 RP=60 RB=0
> OP=1 OB=0 RP=60 RB=0
> OP=2 OB=0 RP=112 RB=0
> OP=2 OB=0 RP=112 RB=0
> OP=3 OB=0 RP=164 RB=0
> OP=4 OB=0 RP=238 RB=0
> OP=4 OB=0 RP=238 RB=0
> OP=5 OB=0 RP=930 RB=0
> OP=5 OB=0 RP=930 RB=0
> OP=5 OB=0 RP=930 RB=0
> OP=6 OB=0 RP=1006 RB=0
> OP=6 OB=0 RP=1006 RB=0
> 
> I even put my debug in ctnetlink_conntrack_event() and it prints these 
> bogus values, and yet I can't see how this canbe the case unless the 
> kernel is walking all over the ip_conntrack struct, but then the box 
> wouldn't be this stable.

They aren't bogus; what you actually see is a snapshot of the counter
values in every event message.

Note that we don't send a netlink message to user space every time a
packet is received; instead we send one when the state of a conntrack
changes. So this is not a bug, it's a feature. As soon as I get some
spare time I'll write the proper documentation and a manpage.
Alternatively, I'd appreciate it if you could start writing it.

So, to see the current values of the counters, use: conntrack -L conntrack

> All I can think of is to
> 1) do binary dump of ip_conntrack

If you're planning on cat'ing /proc/net/ip_conntrack, you should know
that it harms system performance. You've been warned.

--
Pablo


* Re: nfnetlink/ctnetlink from pom-ng r3884
  2005-04-20 22:44   ` Pablo Neira
@ 2005-04-21  8:07     ` Amin Azez
  2005-04-21  9:25     ` extending conntrack event data Amin Azez
  1 sibling, 0 replies; 24+ messages in thread
From: Amin Azez @ 2005-04-21  8:07 UTC (permalink / raw)
  To: Pablo Neira; +Cc: netfilter-devel

Pablo Neira wrote:

> Amin Azez wrote:
>
>> I'm using now kernel 2.6.11.7 with nfnetlink, ctnetlink, 
>> conntrack-event-api from patch-o-matic-ng svn (3884)
>>
>> I'm using conntrack (3880) and libctnetlink (3877) and libnfnetlink 
>> (3876) from svn.
>
>
> Plus the three patches that I posted recently that fix:
>
> a) core dump when a packet of a not know protocol is received.
> b) core dump when you receive a destroy message.
> c) fail if ip_queue is loaded.

Aye, those too.

> Make sure that you install the libraries correctly and you aren't 
> using the old ones, this is *very* important.
>
Aye, I re-installed them to get the new id 10 instead of 3, so I know I
did that right.

>> I fixed conntrack to remove the "id" parameter, I also have
>> CONFIG_IP_NF_CT_ACCT=y
>>
>> Yet conntrack -E conntrack shows for every packet a new connection
>> type: [NEW] src=192.168.0.204 dst=192.168.0.131 sport=41605 ...
>> type: [NEW] src=192.168.0.204 dst=192.168.0.131 sport=41605 ...
>> type: [NEW] src=192.168.0.204 dst=192.168.0.131 sport=41605 ...
>> type: [NEW] src=192.168.0.204 dst=192.168.0.131 sport=41605 ...
>> type: [NEW] src=192.168.0.204 dst=192.168.0.131 sport=41605 ...
>> type: [NEW] src=192.168.0.204 dst=192.168.0.131 sport=41605 ...
>
>
> I'm not able to reproduce such behaviour. It works just fine here. 
> Make sure that you are really using the lastest version. In such case, 
> send me in private a tcpdump trace and the full list of event displayed.

I may have to do that.

>> I'm also still getting some kind of "alignment" problems in
>> ip_conntrack_netlink.c such that original-bytes is 0 and reply-packets
>> seems to have the value for originating bytes.
>>
>> printk("OP=%ld OB=%ld RP=%ld RB=%ld\n",
>> ct->counters[IP_CT_DIR_ORIGINAL].packets,
>> ct->counters[IP_CT_DIR_ORIGINAL].bytes,
>> ct->counters[IP_CT_DIR_REPLY].packets,
>> ct->counters[IP_CT_DIR_REPLY].bytes);
>>
>> OP=1 OB=0 RP=60 RB=0
>> OP=1 OB=0 RP=60 RB=0
>> OP=2 OB=0 RP=112 RB=0
>> OP=2 OB=0 RP=112 RB=0
>> OP=3 OB=0 RP=164 RB=0
>> OP=4 OB=0 RP=238 RB=0
>> OP=4 OB=0 RP=238 RB=0
>> OP=5 OB=0 RP=930 RB=0
>> OP=5 OB=0 RP=930 RB=0
>> OP=5 OB=0 RP=930 RB=0
>> OP=6 OB=0 RP=1006 RB=0
>> OP=6 OB=0 RP=1006 RB=0
>>
>> I even put my debug in ctnetlink_conntrack_event() and it prints
>> these bogus values, and yet I can't see how this can be the case
>> unless the kernel is walking all over the ip_conntrack struct, but
>> then the box wouldn't be this stable.
>
> They aren't bogus, actually you see snapshots of the counters value in 
> every event message.
>
You are right here; the bogus part was in my debug messages, which caused
the counter values to be displayed against the wrong labels.

> See that we don't send a netlink message to user space every time a 
> packet is received, instead we do when the state of a conntrack 
> changes. So this is not a bug, it's a feature. As soon as I get some 
> spare time I'll write the proper documentation and a manpage. 
> Alternatively I'll appreciate if you could start writing it.

I am certainly happy to start writing it. I just need to look at why I
do get a netlink message for every packet and why they are all "new".
Once I have some sane behaviour I will be glad to start.

I do get a destroy message after the TIMEWAIT expires. I could also do 
with a notification when the connection close sequence begins, or an 
RST, but I guess I can use an iptables rule to modify the conntrack 
state and get one that way?

> Then, to see the current value of counters use: conntrack -L conntrack
>
>> All I can think of is to
>> 1) do binary dump of ip_conntrack
>
>
> if you're planning cat'ing from /proc/net/ip_conntrack, you must know 
> that it harm system performance. You've been warned.
>
I wasn't, and I can't think what I did mean now!

Thanks Pablo.

Sam


* extending conntrack event data
  2005-04-20 22:44   ` Pablo Neira
  2005-04-21  8:07     ` Amin Azez
@ 2005-04-21  9:25     ` Amin Azez
  2005-04-21  9:49       ` Amin Azez
  1 sibling, 1 reply; 24+ messages in thread
From: Amin Azez @ 2005-04-21  9:25 UTC (permalink / raw)
  To: Pablo Neira; +Cc: netfilter-devel

I realise that I may be extending the scope of ctevent too far for some, 
but I would like to make some more info available for conntrack events.

Specifically: mac addresses, and the time stamp associated with the skb 
that triggered the conntrack event.

notifier_call_chain in include/linux/netfilter_ipv4/ip_conntrack.h
extracts the ip_conntrack from the skb and passes that to each event
callback.
As notifier_call_chain is not passed the addresses of any members of the
skb, the callback can't use any container tricks to extract the skb address.
For my own convenience, and for compatibility, I would like to modify
notifier_call_chain to pass the skb address as a 4th parameter so that
callback handlers may receive it. How do others feel about this?

Also, I realise that at the moment the mac member of an sk_buff presumes 
that all mac info will be ethernet; however I would like to write clean 
code that does not presume this will always be the case. What is the 
best way to check from an sk_buff that the link layer is ethernet, or 
rather that it is valid to presume that sk->mac.raw points to an ethhdr 
(if not null)?  The best I could think of was sk->inputdev but this 
seems to have no members to indicate device type. Is it true that the 
kernel only supports ethernet-type devices, or devices that masquerade 
as ethernet type devices?

Finally, I know the issue of conntrack ids has been discussed before,
and that in kernel space the tuples are actually the best way to relate
skbs to conntracks. I realise that tuple contents could also be used to
maintain user-space references, but a race condition exists: if a
user-space application requests some action on a conntrack or related
connection, then by the time this instruction is acted on in the kernel
it might be applied to a different actual connection with the same tuple
(the previous connection having been closed). As the netlink/netfilter
stuff introduces more user-space connection management capabilities, we
surely need to consider this problem?

Amin


* Re: extending conntrack event data
  2005-04-21  9:25     ` extending conntrack event data Amin Azez
@ 2005-04-21  9:49       ` Amin Azez
  2005-04-21 10:14         ` Wang Jian
  0 siblings, 1 reply; 24+ messages in thread
From: Amin Azez @ 2005-04-21  9:49 UTC (permalink / raw)
  To: Amin Azez; +Cc: netfilter-devel, Pablo Neira

OK, I see that the skb is only available in ip_conntrack_event_cache and 
not ip_conntrack_event. I'm not clear on the different purposes of these 
two functions, but I see that both could potentially cause events in 
conntrack(-tool). I also see that notifier_call_chain is a general 
function and that my suggestion of adding an extra parameter to it is 
not likely to be well received.

It is thus not practical to do as I suggested and make skb information 
available at conntrack events.

Perhaps my real answer is to store the mac address as part of the 
conntrack info?
Is it legitimate to maintain link layer information of a connection with 
the other conntrack info?
I can see that some people won't care for it, while in other cases it 
forms an important part of the connection state.

Perhaps it should be a configure option, whether or not to maintain link 
layer information in the conntrack?
(I'm likely to also store the conntrack creation time, and some kind of 
serial number for my own purposes.)

Opinions?

Amin



* Re: extending conntrack event data
  2005-04-21  9:49       ` Amin Azez
@ 2005-04-21 10:14         ` Wang Jian
  2005-04-21 11:04           ` Pablo Neira
  2005-04-21 11:04           ` extending conntrack event data Amin Azez
  0 siblings, 2 replies; 24+ messages in thread
From: Wang Jian @ 2005-04-21 10:14 UTC (permalink / raw)
  To: Amin Azez; +Cc: netfilter-devel, Pablo Neira

Hi Amin Azez,


On Thu, 21 Apr 2005 10:49:38 +0100, Amin Azez <azez@ufomechanic.net> wrote:

> OK, I see that the skb is only available in ip_conntrack_event_cache and 
> not ip_conntrack_event. I'm not clear on the different purposes of these 
> two functions, but I see that both could potentially cause events in 
> conntrack(-tool). I also see that notifier_call_chain is a general 
> function and that my suggestion of adding an extra parameter to it is 
> not likely to be well received.

ip_conntrack_event_cache() sets a bit in a bitmap to indicate that a
certain event has occurred. The message is not delivered immediately, for
reasons such as performance. I think one ctnetlink message can carry more
than one event, so it is reasonable to cache different kinds of
(non-urgent) events until one type of event occurs again.
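
The caching idea described above can be sketched in a few lines: events
raised while a packet is processed are OR-ed into a per-packet bitmask, and
a single notification carrying the whole mask is delivered later. This is
only an illustration. The type and function names (struct pkt, event_cache,
deliver_cached_events) and the event bit values are assumptions, not the
patch-o-matic-ng code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event bits, one per event type. */
enum ct_event {
    EV_NEW     = 1 << 0,
    EV_REFRESH = 1 << 3,
    EV_STATUS  = 1 << 4
};

/* Stand-in for a packet buffer that carries pending events. */
struct pkt {
    unsigned int pending;   /* bitmask of cached events */
};

/* Record an event; nothing is sent yet. */
void event_cache(struct pkt *p, enum ct_event ev)
{
    assert(p != NULL);
    p->pending |= (unsigned int)ev;
}

/*
 * Deliver all cached events at once. Returns the mask that a single
 * notification would carry (0 means there was nothing to send) and
 * clears the cache.
 */
unsigned int deliver_cached_events(struct pkt *p)
{
    unsigned int mask;

    assert(p != NULL);
    mask = p->pending;
    p->pending = 0;
    return mask;
}
```

Caching the same event twice costs nothing extra, which is the point:
however many times an event fires for one packet, at most one message
leaves for user space.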

> 
> It is thus not practical to do as I suggested and make skb information 
> available at conntrack events.
>

If we use the skb everywhere (because it can contain conntrack
information), then you can do what you want.

> Perhaps my real answer is to store the mac address as part of the 
> conntrack info?
> Is it legitimate to maintain link layer information of a connection with 
> the other conntrack info?
> I can see that some people won't care for it, while in other cases it 
> forms an important part of the connection state.
> 
> Perhaps it should be a configure option, whether or not to maintain link 
> layer information in the conntrack?
> (I'm likely to also store the conntrack creation time, and some kind of 
> serial number for my own purposes.)
> 
> Opinions?

You can always use your own patch :)





-- 
  lark


* Re: extending conntrack event data
  2005-04-21 10:14         ` Wang Jian
@ 2005-04-21 11:04           ` Pablo Neira
  2005-04-25 13:51             ` Amin Azez
  2005-04-21 11:04           ` extending conntrack event data Amin Azez
  1 sibling, 1 reply; 24+ messages in thread
From: Pablo Neira @ 2005-04-21 11:04 UTC (permalink / raw)
  To: Wang Jian; +Cc: netfilter-devel, Amin Azez

Wang Jian wrote:
> On Thu, 21 Apr 2005 10:49:38 +0100, Amin Azez <azez@ufomechanic.net> wrote:
> 
> 
>>OK, I see that the skb is only available in ip_conntrack_event_cache and 
>>not ip_conntrack_event. I'm not clear on the different purposes of these 
>>two functions, but I see that both could potentially cause events in 
>>conntrack(-tool). I also see that notifier_call_chain is a general 
>>function and that my suggestion of adding an extra parameter to it is 
>>not likely to be well received.
> 
> 
> ip_conntrack_event_cache() marks a bitmap to indicate that certain event
> occurs. The message will not be delivered immediately due to whatever
> reason such as performance. 

right, performance is the reason why we use event caching. Spamming a 
netlink message to user space every time a packet is received is simply 
"matador" (overkill).

--
Pablo


* Re: extending conntrack event data
  2005-04-21 11:04           ` Pablo Neira
@ 2005-04-25 13:51             ` Amin Azez
  2005-04-25 16:35               ` IPCT_NEW comes from was " Amin Azez
  0 siblings, 1 reply; 24+ messages in thread
From: Amin Azez @ 2005-04-25 13:51 UTC (permalink / raw)
  To: Pablo Neira; +Cc: netfilter-devel

Pablo Neira wrote:

> Wang Jian wrote:
>
>> On Thu, 21 Apr 2005 10:49:38 +0100, Amin Azez <azez@ufomechanic.net> 
>> wrote:
>>
>>> OK, I see that the skb is only available in ip_conntrack_event_cache 
>>> and not ip_conntrack_event. I'm not clear on the different purposes 
>>> of these two functions, but I see that both could potentially cause 
>>> events in conntrack(-tool). I also see that notifier_call_chain is a 
>>> general function and that my suggestion of adding an extra parameter 
>>> to it is not likely to be well received.
>>
>> ip_conntrack_event_cache() marks a bitmap to indicate that certain event
>> occurs. The message will not be delivered immediately due to whatever
>> reason such as performance. 
>
> right, performance is the reason why we use event caching. Spamming a 
> netlink message to user space every time a packet is received is 
> simply "matador" (overkill).

I've been through this in a lot of detail in the source, and I can't see 
how the source should give different behaviour to what I see, which is a 
netlink message for every packet of an open connection. (At least event 
caching stops multiple messages per packet)

I suspect the problem is either that ip_confirm calls
ip_conntrack_deliver_cached_events for every packet (ip_confirm is
called as a netfilter hook), or in what tcp_packet
(net/ipv4/netfilter/ip_conntrack_proto_tcp.c) does to the skb before
ip_confirm gets it.
(Is the problem because the connection terminates on the test machine 
instead of passing through it? I have confirmed that bridging has 
nothing to do with it)

We may get through this quickly if someone can send me their 
ip_conntrack_proto_tcp.c file to diff against; however I go into the 
detail of my examinations below.

tcp_packet always calls:
ip_conntrack_event_cache(IPCT_PROTOINFO_VOLATILE, skb);
unless new_state is one of TCP_CONNTRACK_IGNORE, TCP_CONNTRACK_MAX, 
TCP_CONNTRACK_SYN_SENT, TCP_CONNTRACK_CLOSE.

Is there a reason why we are doing this, even when new_state or
old_state is TCP_CONNTRACK_ESTABLISHED?
It also calls ip_conntrack_event_cache for pretty much every packet,
including those for TCP_CONNTRACK_ESTABLISHED connections.

Also, how is the code supposed to reduce the netlink message rate? The 
timer functions seem to be related only to death by timeout, there 
doesn't seem to be any mechanism for selecting only some packets for 
netlink notification.

As an experiment, if I type ^U into an active ssh session, tcpdump
shows 4 packets, summarized below, along with the nfcache event value 
that was delivered over netlink for each one:

client->server: c04a (4a = IPCT_RELATED | IPCT_REFRESH | IPCT_PROTOINFO_VOLATILE)
server->client: c049 (49 = IPCT_NEW | IPCT_REFRESH | IPCT_PROTOINFO_VOLATILE)
server->client: c049
client->server: c04a

Now I have shown where IPCT_REFRESH and IPCT_PROTOINFO_VOLATILE are
coming from but I don't see why I get IPCT_NEW for the return packets, 
and what could IPCT_RELATED mean for the packets in the original direction?
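
To make values such as c04a and c049 easier to read, here is a small decoder sketch. The values for IPCT_NEW (1), IPCT_RELATED (2) and NFC_ALTERED (8000) are the ones stated in this thread. The bit positions for IPCT_REFRESH and IPCT_PROTOINFO_VOLATILE, and the name given to 0x4000, are assumptions inferred from the breakdown above. The table and the function name are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct flag_name {
    uint32_t bit;
    const char *name;
};

static const struct flag_name nfcache_flags[] = {
    { 0x0001, "IPCT_NEW" },
    { 0x0002, "IPCT_RELATED" },
    { 0x0008, "IPCT_REFRESH" },            /* assumed bit position */
    { 0x0040, "IPCT_PROTOINFO_VOLATILE" }, /* assumed bit position */
    { 0x4000, "NFC_UNKNOWN" },             /* assumed name for 0x4000 */
    { 0x8000, "NFC_ALTERED" },
};

/* Writes "A|B|C" into buf (len must be > 0); bits that are not in the
 * table are appended as one hex value. Returns buf. */
static char *decode_nfcache(uint32_t value, char *buf, size_t len)
{
    size_t i, used = 0;

    buf[0] = '\0';
    for (i = 0; i < sizeof(nfcache_flags) / sizeof(nfcache_flags[0]); i++) {
        if (!(value & nfcache_flags[i].bit))
            continue;
        used += (size_t)snprintf(buf + used, len - used, "%s%s",
                                 used ? "|" : "", nfcache_flags[i].name);
        value &= ~nfcache_flags[i].bit;
        if (used >= len)
            return buf; /* output truncated, stop here */
    }
    if (value)
        snprintf(buf + used, len - used, "%s0x%x",
                 used ? "|" : "", (unsigned int)value);
    return buf;
}
```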

printk shows that the IPCT_NEW events and IPCT_RELATED events are NOT 
set via __ip_conntrack_confirm which does:
ip_conntrack_event_cache(master_ct(ct) ? IPCT_RELATED : IPCT_NEW, *pskb);
[which WOULD explain where the RELATED and NEW keep coming from, (but 
that should only happen if __ip_conntrack_confirm doesn't find either 
the forward or reverse tuple in the hash)]

What is troubling is that nowhere else in the net or include dirs is
IPCT_RELATED used actively, so how is IPCT_RELATED getting in the 
skb->nfcache?

So the three questions are
1) should tcp_packet be setting so many event types in skb->nfcache as 
it does, for every packet?
2) should ip_confirm really be called for every packet, as it is?
3) or is the problem in IPCT_RELATED and IPCT_NEW being stuffed in the 
event flags, and what could cause this as it is NOT 
__ip_conntrack_confirm doing it?

Azez


* IPCT_NEW comes from was Re: extending conntrack event data
  2005-04-25 13:51             ` Amin Azez
@ 2005-04-25 16:35               ` Amin Azez
  2005-04-25 16:43                 ` Amin Azez
  0 siblings, 1 reply; 24+ messages in thread
From: Amin Azez @ 2005-04-25 16:35 UTC (permalink / raw)
  Cc: netfilter-devel, Pablo Neira

Looking at some of my skb->nfcache debugging
(de8ce580 is the skb address)

during tcp_packet, I get calls to ip_conntrack_event_cache which changes 
nfcache thus:
* event_cache on de8ce580 from 4000 to 4040
* event_cache on de8ce580 from 4040 to 4060
* {leave tcp_packet}
* event_cache on de8ce580 from 4060 to 4068
* event_cache on de8ce580 from 4068 to 4078
* deliver_cached_events c079 right now skb de8ce580

By the time ip_confirm is called some more stuff has happened to 
nfcache, hence ip_confirm c079 de8ce580

The question is how the nfcache got from 4078 to c079.
It was c079 when ip_confirm was called.

Whence the extra 8001 that has been combined? The 1 is IPCT_NEW, the 
8000 is NFC_ALTERED
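
The 8001 figure is simply the set of bits present after but not before. A one-line helper (the name is hypothetical) makes the arithmetic checkable against the trace above:

```c
#include <stdint.h>

/* Bits that are set in 'after' but were not set in 'before'. */
static uint32_t bits_added(uint32_t before, uint32_t after)
{
    return after & ~before;
}
```

For example, bits_added(0x4078, 0xc079) gives 0x8001, and bits_added(0x4000, 0x4040) gives 0x40 for the first event_cache step.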

NFC_ALTERED is used in various places, most likely in
ip_ct_gather_frags, but this hardly seems likely if src and dst machines
are on the same subnet.
I confirmed with logging that it isn't set there, so I will have to add
debug to all the other places to see which one is guilty.

Azez


* Re: IPCT_NEW comes from was Re: extending conntrack event data
  2005-04-25 16:35               ` IPCT_NEW comes from was " Amin Azez
@ 2005-04-25 16:43                 ` Amin Azez
  2005-04-26 13:37                   ` BUG/CONFLICT conntrack with preroute/postroute mangle table Samuel Liddicott
  2005-04-26 13:38                   ` Amin Azez
  0 siblings, 2 replies; 24+ messages in thread
From: Amin Azez @ 2005-04-25 16:43 UTC (permalink / raw)
  Cc: netfilter-devel, Pablo Neira

Further investigation points to the layer 7 matching and mangle-table
rules etc.; once I remove those rules, the magical increment from
4078 to c079 stops.

Possibly this has been the cause of the problems; I'll check tomorrow to
see how this could cause it.

Amin

Amin Azez wrote:

> Looking at some of my skb->nfcache debugging
> (de8ce580 is the skb address)
>
> during tcp_packet, I get calls to ip_conntrack_event_cache which 
> changes nfcache thus:
> * event_cache on de8ce580 from 4000 to 4040
> * event_cache on de8ce580 from 4040 to 4060
> * {leave tcp_packet}
> * event_cache on de8ce580 from 4060 to 4068
> * event_cache on de8ce580 from 4068 to 4078
> * deliver_cached_events c079 right now skb de8ce580
>
> By the time ip_confirm is called some more stuff has happened to 
> nfcache, hence ip_confirm c079 de8ce580
>
> Question is how did the nfcache get from 4078 to c079
> It was c079 when ip_confirm was called
>
> Whence the extra 8001 that has been combined? The 1 is IPCT_NEW, the 
> 8000 is NFC_ALTERED
>
> NFC_ALTERED is used in various places, the most like in 
> ip_ct_gather_frags but this hardly seems likely if src and dst 
> machines are on the same subnet?
> I confirmed with logging that it isn't there so I will have to add 
> debug to all the other places to see which one is guilty.
>
> Azez
>


* BUG/CONFLICT conntrack with preroute/postroute mangle table
  2005-04-25 16:43                 ` Amin Azez
@ 2005-04-26 13:37                   ` Samuel Liddicott
  2005-04-26 13:38                   ` Amin Azez
  1 sibling, 0 replies; 24+ messages in thread
From: Samuel Liddicott @ 2005-04-26 13:37 UTC (permalink / raw)
  To: Pablo Neira; +Cc: netfilter-devel


I've got a sample case of two iptables rules that reproduce the problems 
of a netlink message for every packet that I have been having with 
conntrack(-tool).

(I got my kernels confused yesterday: it has nothing to do with layer 7
matching in the kernel. The iptables rules that trigger the bug just
happen to be part of a rules file that I call layer 7 rules.) The bug
shows in a regular kernel patched (pom-ng) only with ctnetlink,
nfnetlink and conntrack-event-api, as I explained yesterday.

To reproduce the bug, follow these steps which I have just verified, yes 
on a pristine 2.6.11.7 kernel with ctnetlink, nfnetlink and 
conntrack-event-api (and without my conntrack mac address patches):

1) modprobe ip_conntrack_netlink
2) /path/to/conntrack -E conntrack
3) now connect to the box and see that conntrack is reporting NEW UPDATE 
UPDATE

then do: (1.2.3.4 is any IP address nowhere near your network)
4) iptables -t mangle -A PREROUTING -d 1.2.3.4
5) iptables -t mangle -A POSTROUTING -d 1.2.3.4
7) /path/to/conntrack -E conntrack
8) now connect to the box and watch it spring an event for every packet 
as NEW NEW NEW

That's it! So why does the presence of these rules in PREROUTING and
POSTROUTING damage skb->nfcache in this way?
Either rule will do it; they aren't both needed. Note that the rules
don't actually match, and would take no action even if they did.

So it is merely the act of processing the rule that breaks the
skb->nfcache value.

Amin

Amin Azez wrote:

> Further investigation points to the layer 7 matching and mangle-tables 
> rules etc, once I remove those rules it stops the magical increment 
> from 4078 to c079.
>
> Possibly this has been the cuase of the problems, I'll check tomorrow 
> to see how this could cause it.
>
> Amin
>
> Amin Azez wrote:
>
>> Looking at some of my skb->nfcache debugging
>> (de8ce580 is the skb address)
>>
>> during tcp_packet, I get calls to ip_conntrack_event_cache which 
>> changes nfcache thus:
>> * event_cache on de8ce580 from 4000 to 4040
>> * event_cache on de8ce580 from 4040 to 4060
>> * {leave tcp_packet}
>> * event_cache on de8ce580 from 4060 to 4068
>> * event_cache on de8ce580 from 4068 to 4078
>> * deliver_cached_events c079 right now skb de8ce580
>>
>> By the time ip_confirm is called some more stuff has happened to 
>> nfcache, hence ip_confirm c079 de8ce580
>>
>> Question is how did the nfcache get from 4078 to c079
>> It was c079 when ip_confirm was called
>>
>> Whence the extra 8001 that has been combined? The 1 is IPCT_NEW, the 
>> 8000 is NFC_ALTERED
>>
>> NFC_ALTERED is used in various places, the most like in 
>> ip_ct_gather_frags but this hardly seems likely if src and dst 
>> machines are on the same subnet?
>> I confirmed with logging that it isn't there so I will have to add 
>> debug to all the other places to see which one is guilty.
>>
>> Azez
>>
>
>


* BUG/CONFLICT conntrack with preroute/postroute mangle table
  2005-04-25 16:43                 ` Amin Azez
  2005-04-26 13:37                   ` BUG/CONFLICT conntrack with preroute/postroute mangle table Samuel Liddicott
@ 2005-04-26 13:38                   ` Amin Azez
  2005-05-05 11:08                     ` Amin Azez
  1 sibling, 1 reply; 24+ messages in thread
From: Amin Azez @ 2005-04-26 13:38 UTC (permalink / raw)
  To: Pablo Neira; +Cc: netfilter-devel


I've got a sample case of two iptables rules that reproduce the problems 
of a netlink message for every packet that I have been having with 
conntrack(-tool).

(I got my kernels confused yesterday: it has nothing to do with layer 7
matching in the kernel. The iptables rules that trigger the bug just
happen to be part of a rules file that I call layer 7 rules.) The bug
shows in a regular kernel patched (pom-ng) only with ctnetlink,
nfnetlink and conntrack-event-api, as I explained yesterday.

To reproduce the bug, follow these steps which I have just verified, yes 
on a pristine 2.6.11.7 kernel with ctnetlink, nfnetlink and 
conntrack-event-api (and without my conntrack mac address patches):

1) modprobe ip_conntrack_netlink
2) /path/to/conntrack -E conntrack
3) now connect to the box and see that conntrack is reporting NEW UPDATE 
UPDATE

then do: (1.2.3.4 is any IP address nowhere near your network)
4) iptables -t mangle -A PREROUTING -d 1.2.3.4
5) iptables -t mangle -A POSTROUTING -d 1.2.3.4
7) /path/to/conntrack -E conntrack
8) now connect to the box and watch it spring an event for every packet 
as NEW NEW NEW

That's it! So why does the presence of these rules in PREROUTING and
POSTROUTING damage skb->nfcache in this way?
Either rule will do it; they aren't both needed. Note that the rules
don't actually match, and would take no action even if they did.

So it is merely the act of processing the rule that breaks the
skb->nfcache value.

Amin

Amin Azez wrote:

> Further investigation points to the layer 7 matching and mangle-tables 
> rules etc, once I remove those rules it stops the magical increment 
> from 4078 to c079.
>
> Possibly this has been the cuase of the problems, I'll check tomorrow 
> to see how this could cause it.
>
> Amin
>
> Amin Azez wrote:
>
>> Looking at some of my skb->nfcache debugging
>> (de8ce580 is the skb address)
>>
>> during tcp_packet, I get calls to ip_conntrack_event_cache which 
>> changes nfcache thus:
>> * event_cache on de8ce580 from 4000 to 4040
>> * event_cache on de8ce580 from 4040 to 4060
>> * {leave tcp_packet}
>> * event_cache on de8ce580 from 4060 to 4068
>> * event_cache on de8ce580 from 4068 to 4078
>> * deliver_cached_events c079 right now skb de8ce580
>>
>> By the time ip_confirm is called some more stuff has happened to 
>> nfcache, hence ip_confirm c079 de8ce580
>>
>> Question is how did the nfcache get from 4078 to c079
>> It was c079 when ip_confirm was called
>>
>> Whence the extra 8001 that has been combined? The 1 is IPCT_NEW, the 
>> 8000 is NFC_ALTERED
>>
>> NFC_ALTERED is used in various places, the most like in 
>> ip_ct_gather_frags but this hardly seems likely if src and dst 
>> machines are on the same subnet?
>> I confirmed with logging that it isn't there so I will have to add 
>> debug to all the other places to see which one is guilty.
>>
>> Azez
>>
>
>


* Re: BUG/CONFLICT conntrack with preroute/postroute mangle table
  2005-04-26 13:38                   ` Amin Azez
@ 2005-05-05 11:08                     ` Amin Azez
  2005-05-05 13:36                       ` RFC for fix? Was " Amin Azez
  2005-05-05 16:05                       ` Pablo Neira
  0 siblings, 2 replies; 24+ messages in thread
From: Amin Azez @ 2005-05-05 11:08 UTC (permalink / raw)
  To: netfilter-devel

Further to the problem below, skb->nfcache is having the lower 2 bits 
clobbered by net/ipv4/netfilter/ip_tables.c around line 316 (2.6.11.7)

         back = get_entry(table_base, table->private->underflow[hook]);

         do {
                 IP_NF_ASSERT(e);
                 IP_NF_ASSERT(back);

//THIS IS THE CULPRIT!
                 (*pskb)->nfcache |= e->nfcache;


e->nfcache has been observed at 0,1,2,0x4000
1 and 2 are IPCT_NEW and IPCT_RELATED and these are causing the damage.
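
A userspace sketch of the effect, under stated assumptions: the structs and walk_rules() are hypothetical and reduce the rule walk to a destination-address comparison. What it keeps from the loop quoted above is the ordering, where the rule's nfcache is OR-ed into the packet before the match is even tested, so a rule that never matches still alters skb->nfcache:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct ipt_entry. */
struct sketch_entry {
    uint32_t dst;     /* destination address the rule matches on */
    uint32_t nfcache; /* per-rule bitfield supplied with the rule */
};

/* Hypothetical, cut-down stand-in for struct sk_buff. */
struct sketch_skb {
    uint32_t dst;
    uint32_t nfcache;
};

/* Walks the rules and returns how many matched. When merge_rule_nfcache
 * is non-zero, each rule's nfcache is merged into the packet before the
 * match test, as in the quoted loop. */
static size_t walk_rules(struct sketch_skb *skb,
                         const struct sketch_entry *rules, size_t n,
                         int merge_rule_nfcache)
{
    size_t i, matched = 0;

    for (i = 0; i < n; i++) {
        if (merge_rule_nfcache)
            skb->nfcache |= rules[i].nfcache;
        if (rules[i].dst == skb->dst)
            matched++;
    }
    return matched;
}
```

With one rule for 1.2.3.4 carrying nfcache 2 and a packet for another address starting at 4078, the walk matches nothing yet leaves the packet at 407a. With merge_rule_nfcache set to 0 the value stays at 4078.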

At this point, e is a: struct ipt_entry *e

I haven't managed to find in the kernel source where nfcache of an
ipt_entry is used or set for anything.

find net include -type f | xargs grep 'nfcache' | less
shows every modification to be on some kind of skb!

I'm baffled, but I'll keep looking to see where this is being set.

Anyone else got ideas?

Is this maybe just ipt_entry not being properly initialized when 
allocated and I'm just getting junk?

Amin Azez wrote:
> 
> I've got a sample case of two iptables rules that reproduce the problems 
> of a netlink message for every packet that I have been having with 
> conntrack(-tool).
> 
...
> To reproduce the bug, follow these steps which I have just verified, yes 
> on a pristine 2.6.11.7 kernel with ctnetlink, nfnetlink and 
> conntrack-event-api (and without my conntrack mac address patches):
> 
> 1) modprobe ip_conntrack_netlink
> 2) /path/to/conntrack -E conntrack
> 3) now connect to the box and see that conntrack is reporting NEW UPDATE 
> UPDATE
> 
> then do: (1.2.3.4 is any IP address nowhere near your network)
> 4) iptables -t mangle -A PREROUTING -d 1.2.3.4
> 5) iptables -t mangle -A POSTROUTING -d 1.2.3.4
> 7) /path/to/conntrack -E conntrack
> 8) now connect to the box and watch it spring an event for every packet 
> as NEW NEW NEW
> 
> Thats it! So why does presence of these rules in PREROUTING and 
> POSTROUTING damage skb->nfcache in this way?
> Either rule will do it, they aren't both needed, but note that the rules 
> don't actually match OR take any action if it does match.
> 
> So it is merely the action of processing the rule that breaks 
> skb->nfcache value.
> 
> Amin
> 
> Amin Azez wrote:
> 
>> Further investigation points to the layer 7 matching and mangle-tables 
>> rules etc, once I remove those rules it stops the magical increment 
>> from 4078 to c079.
>>
>> Possibly this has been the cuase of the problems, I'll check tomorrow 
>> to see how this could cause it.
>>
>> Amin
>>
>> Amin Azez wrote:
>>
>>> Looking at some of my skb->nfcache debugging
>>> (de8ce580 is the skb address)
>>>
>>> during tcp_packet, I get calls to ip_conntrack_event_cache which 
>>> changes nfcache thus:
>>> * event_cache on de8ce580 from 4000 to 4040
>>> * event_cache on de8ce580 from 4040 to 4060
>>> * {leave tcp_packet}
>>> * event_cache on de8ce580 from 4060 to 4068
>>> * event_cache on de8ce580 from 4068 to 4078
>>> * deliver_cached_events c079 right now skb de8ce580
>>>
>>> By the time ip_confirm is called some more stuff has happened to 
>>> nfcache, hence ip_confirm c079 de8ce580
>>>
>>> Question is how did the nfcache get from 4078 to c079
>>> It was c079 when ip_confirm was called
>>>
>>> Whence the extra 8001 that has been combined? The 1 is IPCT_NEW, the 
>>> 8000 is NFC_ALTERED
>>>
>>> NFC_ALTERED is used in various places, the most like in 
>>> ip_ct_gather_frags but this hardly seems likely if src and dst 
>>> machines are on the same subnet?
>>> I confirmed with logging that it isn't there so I will have to add 
>>> debug to all the other places to see which one is guilty.
>>>
>>> Azez
>>>
>>
>>
> 
> 
> 


* RFC for fix? Was Re: BUG/CONFLICT conntrack with preroute/postroute mangle table
  2005-05-05 11:08                     ` Amin Azez
@ 2005-05-05 13:36                       ` Amin Azez
  2005-05-05 16:05                       ` Pablo Neira
  1 sibling, 0 replies; 24+ messages in thread
From: Amin Azez @ 2005-05-05 13:36 UTC (permalink / raw)
  To: netfilter-devel


I'm proposing to remove the culprit line (below).
	(*pskb)->nfcache |= e->nfcache;
from around line 316 of net/ipv4/netfilter/ip_tables.c

because:

1) I can't find any code that references nfcache in an ipt_entry anyway
2) It seems that in older kernels nfcache was more commonly used for
filter matches and such, but as far as I can tell that is no longer so.
I renamed the nfcache member in ipt_entry and the kernel still compiles 
fine, the only named use was the line I want to remove.
(This doesn't explain though how ipt_entry->nfcache was getting non-zero 
values, maybe using pointer arithmetic? ugh!).
3) The only documented use for ipt_entry->nfcache that I can find is:
"An nfcache bitfield that gives what parts of the packet the rule exams"

I can't see the connection between part of a packet being EXAMINED
(ipt_entry->nfcache) and part of a conntrack changing (skb->nfcache), so
I don't think:
    (*pskb)->nfcache |= e->nfcache;

makes sense anyway, even if we could show that ipt_entry->nfcache
was actually used.

Any objections?

If I am wrong, then
1) some module (mangle code) is setting fields in nfcache for merely
examining a field
2) I can't see where that is happening

Amin


Amin Azez wrote:
> Further to the problem below, skb->nfcache is having the lower 2 bits 
> clobbered by net/ipv4/netfilter/ip_tables.c around line 316 (2.6.11.7)
> 
>         back = get_entry(table_base, table->private->underflow[hook]);
> 
>         do {
>                 IP_NF_ASSERT(e);
>                 IP_NF_ASSERT(back);
> 
> //THIS IS THE CULPRIT!
>                 (*pskb)->nfcache |= e->nfcache;
> 
> 
> e->nfcache has been observed at 0,1,2,0x4000
> 1 and 2 are IPCT_NEW and IPCT_RELATED and these are causing the damage.
> 
> At this point, e is a: struct ipt_entry *e
> 
> I haven't manage to find in the kernel source where nfcache of an 
> ipt_entry is used or set for anything.
> 
> find net include -type f | xargs grep 'nfcache' | less
> shows every modification to be on some kind of skb!
> 
> I'm baffled, but I'll keep looking to see where this is being set.
> 
> Anyone else got ideas?
> 
> Is this maybe just ipt_entry not being properly initialized when 
> allocated and I'm just getting junk?
> 
> Amin Azez wrote:
> 
>>
>> I've got a sample case of two iptables rules that reproduce the 
>> problems of a netlink message for every packet that I have been having 
>> with conntrack(-tool).
>>
> ...
> 
>> To reproduce the bug, follow these steps which I have just verified, 
>> yes on a pristine 2.6.11.7 kernel with ctnetlink, nfnetlink and 
>> conntrack-event-api (and without my conntrack mac address patches):
>>
>> 1) modprobe ip_conntrack_netlink
>> 2) /path/to/conntrack -E conntrack
>> 3) now connect to the box and see that conntrack is reporting NEW 
>> UPDATE UPDATE
>>
>> then do: (1.2.3.4 is any IP address nowhere near your network)
>> 4) iptables -t mangle -A PREROUTING -d 1.2.3.4
>> 5) iptables -t mangle -A POSTROUTING -d 1.2.3.4
>> 7) /path/to/conntrack -E conntrack
>> 8) now connect to the box and watch it spring an event for every 
>> packet as NEW NEW NEW
>>
>> Thats it! So why does presence of these rules in PREROUTING and 
>> POSTROUTING damage skb->nfcache in this way?
>> Either rule will do it, they aren't both needed, but note that the 
>> rules don't actually match OR take any action if it does match.
>>
>> So it is merely the action of processing the rule that breaks 
>> skb->nfcache value.
>>
>> Amin
>>
>> Amin Azez wrote:
>>
>>> Further investigation points to the layer 7 matching and 
>>> mangle-tables rules etc, once I remove those rules it stops the 
>>> magical increment from 4078 to c079.
>>>
>>> Possibly this has been the cuase of the problems, I'll check tomorrow 
>>> to see how this could cause it.
>>>
>>> Amin
>>>
>>> Amin Azez wrote:
>>>
>>>> Looking at some of my skb->nfcache debugging
>>>> (de8ce580 is the skb address)
>>>>
>>>> during tcp_packet, I get calls to ip_conntrack_event_cache which 
>>>> changes nfcache thus:
>>>> * event_cache on de8ce580 from 4000 to 4040
>>>> * event_cache on de8ce580 from 4040 to 4060
>>>> * {leave tcp_packet}
>>>> * event_cache on de8ce580 from 4060 to 4068
>>>> * event_cache on de8ce580 from 4068 to 4078
>>>> * deliver_cached_events c079 right now skb de8ce580
>>>>
>>>> By the time ip_confirm is called some more stuff has happened to 
>>>> nfcache, hence ip_confirm c079 de8ce580
>>>>
>>>> Question is how did the nfcache get from 4078 to c079
>>>> It was c079 when ip_confirm was called
>>>>
>>>> Whence the extra 8001 that has been combined? The 1 is IPCT_NEW, the 
>>>> 8000 is NFC_ALTERED
>>>>
>>>> NFC_ALTERED is used in various places, the most like in 
>>>> ip_ct_gather_frags but this hardly seems likely if src and dst 
>>>> machines are on the same subnet?
>>>> I confirmed with logging that it isn't there so I will have to add 
>>>> debug to all the other places to see which one is guilty.
>>>>
>>>> Azez
>>>>
>>>
>>>
>>
>>
>>
> 
> 
> 


* Re: BUG/CONFLICT conntrack with preroute/postroute mangle table
  2005-05-05 11:08                     ` Amin Azez
  2005-05-05 13:36                       ` RFC for fix? Was " Amin Azez
@ 2005-05-05 16:05                       ` Pablo Neira
  2005-05-09 11:11                         ` Amin Azez
  1 sibling, 1 reply; 24+ messages in thread
From: Pablo Neira @ 2005-05-05 16:05 UTC (permalink / raw)
  To: Amin Azez; +Cc: netfilter-devel, 'Krisztian Kovacs'

Amin Azez wrote:
> Further to the problem below, skb->nfcache is having the lower 2 bits 
> clobbered by net/ipv4/netfilter/ip_tables.c around line 316 (2.6.11.7)
> 
>         back = get_entry(table_base, table->private->underflow[hook]);
> 
>         do {
>                 IP_NF_ASSERT(e);
>                 IP_NF_ASSERT(back);
> 
> //THIS IS THE CULPRIT!
>                 (*pskb)->nfcache |= e->nfcache;
> 
> 
> e->nfcache has been observed at 0,1,2,0x4000
> 1 and 2 are IPCT_NEW and IPCT_RELATED and these are causing the damage.

This bug can be reproduced if you use iptables < 1.3.1, since I 
personally sent a patch to remove any nfcache references in iptables code.

https://lists.netfilter.org/pipermail/netfilter-devel/2005-February/018463.html

Yes, it's a matter of removing that line. Thanks for pointing this out.

Krisztian, I think that this could be source of weird behaviours in 
ct_sync if your users use old iptables versions.
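
One way to picture the collision, as a sketch under assumptions rather than a confirmed diagnosis: if old userspace iptables fills the rule's nfcache with legacy "what the rule examined" flags, those share the low bits with the conntrack event flags. The SK_NFC_* values below are recalled from the netfilter headers of that era and are not stated anywhere in this thread, so treat them as assumptions. The SK_IPCT_* values are the ones given earlier in the thread. All names are hypothetical:

```c
#include <stdint.h>

/* Assumed legacy per-rule flags (not confirmed by this thread). */
#define SK_NFC_IP_SRC   0x0001u
#define SK_NFC_IP_DST   0x0002u
#define SK_NFC_UNKNOWN  0x4000u

/* Event flags with the values reported in this thread. */
#define SK_IPCT_NEW     0x0001u
#define SK_IPCT_RELATED 0x0002u

/* Returns the conntrack event bits that a rule's nfcache would alias
 * once it is OR-ed into skb->nfcache. */
static uint32_t aliased_events(uint32_t rule_nfcache)
{
    return rule_nfcache & (SK_IPCT_NEW | SK_IPCT_RELATED);
}
```

Under these assumptions a destination-address rule would alias IPCT_RELATED and a source-address rule IPCT_NEW, which would be consistent with the 1, 2 and 0x4000 values observed in e->nfcache.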

--
Pablo


* Re: BUG/CONFLICT conntrack with preroute/postroute mangle table
  2005-05-05 16:05                       ` Pablo Neira
@ 2005-05-09 11:11                         ` Amin Azez
  2005-05-09 13:48                           ` Amin Azez
  0 siblings, 1 reply; 24+ messages in thread
From: Amin Azez @ 2005-05-09 11:11 UTC (permalink / raw)
  To: Pablo Neira; +Cc: netfilter-devel, 'Krisztian Kovacs'

Pablo Neira wrote:

> This bug can be reproduced if you use iptables < 1.3.1, since I 
> personally sent a patch to remove any nfcache references in iptables 
> code.
>
> https://lists.netfilter.org/pipermail/netfilter-devel/2005-February/018463.html 
>
>
> Yes, It's a matter of removing that line. thanks for pointing out this.
>
> Krisztian, I think that this could be source of weird behaviours in 
> ct_sync if your users use old iptables versions.

iptables refers to some kernel-side and some user-side code.
When you say "iptables<1.3.1" which are you talking about? User side? 
(I've inherited use of 1.2.11 for historical reasons though I'm hoping 
to change that)
Is it a requirement to update the user-space applications to iptables
1.3.1 to avoid other instances of this bug?

Does a patch for this "one liner" need submitting to the list as 
Subject: [PATCH], to get signed off by anyone at all?
What is the procedure to ensure that this fix reaches the kernel properly?

Amin


* Re: BUG/CONFLICT conntrack with preroute/postroute mangle table
  2005-05-09 11:11                         ` Amin Azez
@ 2005-05-09 13:48                           ` Amin Azez
  0 siblings, 0 replies; 24+ messages in thread
From: Amin Azez @ 2005-05-09 13:48 UTC (permalink / raw)
  To: Amin Azez; +Cc: netfilter-devel, 'Krisztian Kovacs'



Amin Azez wrote:
> iptables refers to some kernel-side and some user-side code.
> When you say "iptables<1.3.1" which are you talking about? User side? 
> (I've inherited use of 1.2.11 for historical reasons though I'm hoping 
> to change that)
> Is it a requirement to update to iptables 1.3.1 user space applications 
> to avoid other instances of this bug?
> 
> Does a patch for this "one liner" need submitting to the list as 
> Subject: [PATCH], to get signed off by anyone at all?
> What is the procedure to ensure that this fix reaches the kernel properly?

Ignore my senseless ramblings, I also see that you have submitted a 
wider patch addressing this problem.

Thanks

Amin


* Re: extending conntrack event data
  2005-04-21 10:14         ` Wang Jian
  2005-04-21 11:04           ` Pablo Neira
@ 2005-04-21 11:04           ` Amin Azez
  1 sibling, 0 replies; 24+ messages in thread
From: Amin Azez @ 2005-04-21 11:04 UTC (permalink / raw)
  To: Wang Jian; +Cc: netfilter-devel, Pablo Neira

Wang Jian wrote:

>Hi Amin Azez,
>
>On Thu, 21 Apr 2005 10:49:38 +0100, Amin Azez <azez@ufomechanic.net> wrote:
>  
>
>>It is thus not practical to do as I suggested and make skb information 
>>available at conntrack events.
>>
>If we use skb everywhere (because it can contains conntrack information),
>then you can do what you want.
>  
>
That would be fine, but it looks like the skb is lost a few layers up 
the function calls except for the  *event_cache() calls, so this may 
involve a lot of changes? I guess the question is "which do we prefer?".

Adding to the conntrack is cleanest, it touches only conntrack_core and 
conntrack_standalone and is protected by a kernel CONFIG_ define.
Passing the skb instead of the conntrack is more flexible but will touch 
a lot more code. I would think Pablo or Harald are better placed to make 
that call.

I'm working on adding the option of link-layer to the conntrack struct 
as it is cleanest, perhaps genuinely more useful, and will have the most 
compact patch if I have to maintain it outside the core kernel.

Sam


Thread overview: 24+ messages
2005-04-19 13:37 nfnetlink/ctnetlink from pom-ng r3884 Wang Jian
2005-04-20  0:55 ` Pablo Neira
2005-04-21  8:21   ` Wang Jian
2005-04-21 11:05     ` Pablo Neira
2005-04-21 11:29       ` Wang Jian
2005-04-20 13:41 ` Amin Azez
2005-04-20 14:17   ` Samuel Liddicott
2005-04-20 22:44   ` Pablo Neira
2005-04-21  8:07     ` Amin Azez
2005-04-21  9:25     ` extending conntrack event data Amin Azez
2005-04-21  9:49       ` Amin Azez
2005-04-21 10:14         ` Wang Jian
2005-04-21 11:04           ` Pablo Neira
2005-04-25 13:51             ` Amin Azez
2005-04-25 16:35               ` IPCT_NEW comes from was " Amin Azez
2005-04-25 16:43                 ` Amin Azez
2005-04-26 13:37                   ` BUG/CONFLICT conntrack with preroute/postroute mangle table Samuel Liddicott
2005-04-26 13:38                   ` Amin Azez
2005-05-05 11:08                     ` Amin Azez
2005-05-05 13:36                       ` RFC for fix? Was " Amin Azez
2005-05-05 16:05                       ` Pablo Neira
2005-05-09 11:11                         ` Amin Azez
2005-05-09 13:48                           ` Amin Azez
2005-04-21 11:04           ` extending conntrack event data Amin Azez
