* [PATCH] add TCP protocol state event groups
@ 2007-06-11 18:05 Pablo Neira Ayuso
2007-06-19 13:33 ` Patrick McHardy
0 siblings, 1 reply; 12+ messages in thread
From: Pablo Neira Ayuso @ 2007-06-11 18:05 UTC (permalink / raw)
To: Netfilter Development Mailinglist; +Cc: Patrick McHardy
[-- Attachment #1: Type: text/plain, Size: 513 bytes --]
This patch adds per-protocol state event groups, so one can only listen
to a certain TCP state change such as ESTABLISHED. Although such
per-state message filtering could be done in userspace, we save CPU
cycles since the kernel does not need to build and delivery messages
that will be later discarded in userspace. This patch is particularly
useful for conntrackd.
--
The dawn of the fourth age of Linux firewalling is coming; a time of
great struggle and heroic deeds -- J.Kadlecsik got inspired by J.Morris
[-- Attachment #2: proto.patch --]
[-- Type: text/x-patch, Size: 6948 bytes --]
[CTNETLINK] add TCP protocol state event groups
This patch adds per-protocol state event groups, so one can only listen to a
certain TCP state change such as ESTABLISHED. Although such per-state message
filtering could be done in userspace, we save CPU cycles since the kernel does
not need to build and delivery messages that will be later discarded in
userspace. This patch is particularly useful for conntrackd.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Index: net-2.6.git/include/linux/netfilter/nfnetlink.h
===================================================================
--- net-2.6.git.orig/include/linux/netfilter/nfnetlink.h 2007-06-11 02:31:08.000000000 +0200
+++ net-2.6.git/include/linux/netfilter/nfnetlink.h 2007-06-11 02:31:13.000000000 +0200
@@ -27,6 +27,33 @@ enum nfnetlink_groups {
#define NFNLGRP_CONNTRACK_EXP_UPDATE NFNLGRP_CONNTRACK_EXP_UPDATE
NFNLGRP_CONNTRACK_EXP_DESTROY,
#define NFNLGRP_CONNTRACK_EXP_DESTROY NFNLGRP_CONNTRACK_EXP_DESTROY
+
+ /* TCP protocol state groups */
+ NFNLGRP_CONNTRACK_PROTO_TCP_SYN_SENT,
+#define NFNLGRP_CONNTRACK_PROTO_TCP_SYN_SENT \
+ NFNLGRP_CONNTRACK_PROTO_TCP_SYN_SENT
+ NFNLGRP_CONNTRACK_PROTO_TCP_SYN_RECV,
+#define NFNLGRP_CONNTRACK_PROTO_TCP_SYN_RECV \
+ NFNLGRP_CONNTRACK_PROTO_TCP_SYN_RECV
+ NFNLGRP_CONNTRACK_PROTO_TCP_ESTABLISHED,
+#define NFNLGRP_CONNTRACK_PROTO_TCP_ESTABLISHED \
+ NFNLGRP_CONNTRACK_PROTO_TCP_ESTABLISHED
+ NFNLGRP_CONNTRACK_PROTO_TCP_FIN_WAIT,
+#define NFNLGRP_CONNTRACK_PROTO_TCP_FIN_WAIT \
+ NFNLGRP_CONNTRACK_PROTO_TCP_FIN_WAIT
+ NFNLGRP_CONNTRACK_PROTO_TCP_CLOSE_WAIT,
+#define NFNLGRP_CONNTRACK_PROTO_TCP_CLOSE_WAIT \
+ NFNLGRP_CONNTRACK_PROTO_TCP_CLOSE_WAIT
+ NFNLGRP_CONNTRACK_PROTO_TCP_LAST_ACK,
+#define NFNLGRP_CONNTRACK_PROTO_TCP_LAST_ACK \
+ NFNLGRP_CONNTRACK_PROTO_TCP_LAST_ACK
+ NFNLGRP_CONNTRACK_PROTO_TCP_TIME_WAIT,
+#define NFNLGRP_CONNTRACK_PROTO_TCP_TIME_WAIT \
+ NFNLGRP_CONNTRACK_PROTO_TCP_TIME_WAIT
+ NFNLGRP_CONNTRACK_PROTO_TCP_CLOSE,
+#define NFNLGRP_CONNTRACK_PROTO_TCP_CLOSE \
+ NFNLGRP_CONNTRACK_PROTO_TCP_CLOSE
+
__NFNLGRP_MAX,
};
#define NFNLGRP_MAX (__NFNLGRP_MAX - 1)
Index: net-2.6.git/include/net/netfilter/nf_conntrack_l4proto.h
===================================================================
--- net-2.6.git.orig/include/net/netfilter/nf_conntrack_l4proto.h 2007-06-11 02:31:08.000000000 +0200
+++ net-2.6.git/include/net/netfilter/nf_conntrack_l4proto.h 2007-06-11 02:31:13.000000000 +0200
@@ -75,6 +75,7 @@ struct nf_conntrack_l4proto
const struct nf_conntrack_tuple *t);
int (*nfattr_to_tuple)(struct nfattr *tb[],
struct nf_conntrack_tuple *t);
+ int (*event_group)(const struct nf_conn *ct);
#ifdef CONFIG_SYSCTL
struct ctl_table_header **ctl_table_header;
Index: net-2.6.git/net/netfilter/nf_conntrack_netlink.c
===================================================================
--- net-2.6.git.orig/net/netfilter/nf_conntrack_netlink.c 2007-06-11 02:31:08.000000000 +0200
+++ net-2.6.git/net/netfilter/nf_conntrack_netlink.c 2007-06-11 02:38:00.000000000 +0200
@@ -4,7 +4,7 @@
* (C) 2001 by Jay Schulist <jschlst@samba.org>
* (C) 2002-2006 by Harald Welte <laforge@gnumonks.org>
* (C) 2003 by Patrick Mchardy <kaber@trash.net>
- * (C) 2005-2006 by Pablo Neira Ayuso <pablo@eurodev.net>
+ * (C) 2005-2007 by Pablo Neira Ayuso <pablo@netfilter.org>
*
* Initial connection tracking via netlink development funded and
* generally made possible by Network Robots, Inc. (www.networkrobots.com)
@@ -307,6 +307,20 @@ nfattr_failure:
}
#ifdef CONFIG_NF_CONNTRACK_EVENTS
+static inline int proto_event_group(const struct nf_conn *ct)
+{
+ int ret = NFNLGRP_NONE;
+ u_int16_t npt = ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.protonum;
+ u_int16_t l3num = ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.l3num;
+ struct nf_conntrack_l4proto *l4proto;
+
+ l4proto = __nf_ct_l4proto_find(l3num, npt);
+ if (l4proto && l4proto->event_group)
+ ret = l4proto->event_group(ct);
+
+ return ret;
+}
+
static int ctnetlink_conntrack_event(struct notifier_block *this,
unsigned long events, void *ptr)
{
@@ -317,7 +331,8 @@ static int ctnetlink_conntrack_event(str
struct sk_buff *skb;
unsigned int type;
sk_buff_data_t b;
- unsigned int flags = 0, group;
+ unsigned int flags = 0, group, proto_group;
+ bool proto_group_has_listener = false;
/* ignore our fake conntrack entry */
if (ct == &nf_conntrack_untracked)
@@ -336,7 +351,11 @@ static int ctnetlink_conntrack_event(str
} else
return NOTIFY_DONE;
- if (!nfnetlink_has_listeners(group))
+ proto_group = proto_event_group(ct);
+ if (proto_group != NFNLGRP_NONE && nfnetlink_has_listeners(proto_group))
+ proto_group_has_listener = true;
+
+ if (!proto_group_has_listener && !nfnetlink_has_listeners(group))
return NOTIFY_DONE;
skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC);
@@ -396,7 +415,11 @@ static int ctnetlink_conntrack_event(str
}
nlh->nlmsg_len = skb->tail - b;
+ if (proto_group_has_listener)
+ atomic_inc(&skb->users);
nfnetlink_send(skb, 0, group, 0);
+ if (proto_group_has_listener)
+ nfnetlink_send(skb, 0, proto_group, 0);
return NOTIFY_DONE;
nlmsg_failure:
Index: net-2.6.git/net/netfilter/nf_conntrack_proto_tcp.c
===================================================================
--- net-2.6.git.orig/net/netfilter/nf_conntrack_proto_tcp.c 2007-06-11 02:31:08.000000000 +0200
+++ net-2.6.git/net/netfilter/nf_conntrack_proto_tcp.c 2007-06-11 02:31:13.000000000 +0200
@@ -1171,6 +1171,22 @@ static int nfattr_to_tcp(struct nfattr *
return 0;
}
+
+static int tcp_event_group[TCP_CONNTRACK_MAX] = {
+ [TCP_CONNTRACK_SYN_SENT] = NFNLGRP_CONNTRACK_PROTO_TCP_SYN_SENT,
+ [TCP_CONNTRACK_SYN_RECV] = NFNLGRP_CONNTRACK_PROTO_TCP_SYN_RECV,
+ [TCP_CONNTRACK_ESTABLISHED] = NFNLGRP_CONNTRACK_PROTO_TCP_ESTABLISHED,
+ [TCP_CONNTRACK_FIN_WAIT] = NFNLGRP_CONNTRACK_PROTO_TCP_FIN_WAIT,
+ [TCP_CONNTRACK_CLOSE_WAIT] = NFNLGRP_CONNTRACK_PROTO_TCP_CLOSE_WAIT,
+ [TCP_CONNTRACK_LAST_ACK] = NFNLGRP_CONNTRACK_PROTO_TCP_LAST_ACK,
+ [TCP_CONNTRACK_TIME_WAIT] = NFNLGRP_CONNTRACK_PROTO_TCP_TIME_WAIT,
+ [TCP_CONNTRACK_CLOSE] = NFNLGRP_CONNTRACK_PROTO_TCP_CLOSE,
+};
+
+static int tcp_proto_event_group(const struct nf_conn *ct)
+{
+ return tcp_event_group[ct->proto.tcp.state];
+}
#endif
#ifdef CONFIG_SYSCTL
@@ -1400,6 +1416,7 @@ struct nf_conntrack_l4proto nf_conntrack
.from_nfattr = nfattr_to_tcp,
.tuple_to_nfattr = nf_ct_port_tuple_to_nfattr,
.nfattr_to_tuple = nf_ct_port_nfattr_to_tuple,
+ .event_group = tcp_proto_event_group,
#endif
#ifdef CONFIG_SYSCTL
.ctl_table_users = &tcp_sysctl_table_users,
@@ -1429,6 +1446,7 @@ struct nf_conntrack_l4proto nf_conntrack
.from_nfattr = nfattr_to_tcp,
.tuple_to_nfattr = nf_ct_port_tuple_to_nfattr,
.nfattr_to_tuple = nf_ct_port_nfattr_to_tuple,
+ .event_group = tcp_proto_event_group,
#endif
#ifdef CONFIG_SYSCTL
.ctl_table_users = &tcp_sysctl_table_users,
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] add TCP protocol state event groups
2007-06-11 18:05 [PATCH] add TCP protocol state event groups Pablo Neira Ayuso
@ 2007-06-19 13:33 ` Patrick McHardy
2007-06-19 14:13 ` Pablo Neira Ayuso
2007-07-02 9:40 ` [PATCH] add TCP protocol state event groups Holger Eitzenberger
0 siblings, 2 replies; 12+ messages in thread
From: Patrick McHardy @ 2007-06-19 13:33 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: Netfilter Development Mailinglist
Pablo Neira Ayuso wrote:
> [CTNETLINK] add TCP protocol state event groups
>
> This patch adds per-protocol state event groups, so one can only listen to a
> certain TCP state change such as ESTABLISHED. Although such per-state message
> filtering could be done in userspace, we save CPU cycles since the kernel does
> not need to build and delivery messages that will be later discarded in
> userspace. This patch is particularly useful for conntrackd.
I can see that this is useful, but one group per protocol state
sounds rather excessive, I would expect that we could group them
more logically, maybe "connection setup, teardown and updates"?
Which states is conntrackd particulary interested in?
I would also like to hear from Holger whether his conntrack daemon
could make use of a mechnism like this too and if the filtering
capabilities you propose will do.
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
>
> --- net-2.6.git.orig/net/netfilter/nf_conntrack_netlink.c 2007-06-11 02:31:08.000000000 +0200
> +++ net-2.6.git/net/netfilter/nf_conntrack_netlink.c 2007-06-11 02:38:00.000000000 +0200
> @@ -317,7 +331,8 @@ static int ctnetlink_conntrack_event(str
> struct sk_buff *skb;
> unsigned int type;
> sk_buff_data_t b;
> - unsigned int flags = 0, group;
> + unsigned int flags = 0, group, proto_group;
> + bool proto_group_has_listener = false;
>
> /* ignore our fake conntrack entry */
> if (ct == &nf_conntrack_untracked)
> @@ -336,7 +351,11 @@ static int ctnetlink_conntrack_event(str
> } else
> return NOTIFY_DONE;
>
> - if (!nfnetlink_has_listeners(group))
> + proto_group = proto_event_group(ct);
> + if (proto_group != NFNLGRP_NONE && nfnetlink_has_listeners(proto_group))
> + proto_group_has_listener = true;
> +
> + if (!proto_group_has_listener && !nfnetlink_has_listeners(group))
> return NOTIFY_DONE;
>
> skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC);
> @@ -396,7 +415,11 @@ static int ctnetlink_conntrack_event(str
> }
>
> nlh->nlmsg_len = skb->tail - b;
> + if (proto_group_has_listener)
> + atomic_inc(&skb->users);
> nfnetlink_send(skb, 0, group, 0);
This will always send to the main group even if only the proto group
has listeners.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] add TCP protocol state event groups
2007-06-19 13:33 ` Patrick McHardy
@ 2007-06-19 14:13 ` Pablo Neira Ayuso
2007-06-19 14:44 ` Patrick McHardy
2007-07-02 9:40 ` [PATCH] add TCP protocol state event groups Holger Eitzenberger
1 sibling, 1 reply; 12+ messages in thread
From: Pablo Neira Ayuso @ 2007-06-19 14:13 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Netfilter Development Mailinglist
Patrick McHardy wrote:
> Pablo Neira Ayuso wrote:
>> [CTNETLINK] add TCP protocol state event groups
>>
>> This patch adds per-protocol state event groups, so one can only listen to a
>> certain TCP state change such as ESTABLISHED. Although such per-state message
>> filtering could be done in userspace, we save CPU cycles since the kernel does
>> not need to build and delivery messages that will be later discarded in
>> userspace. This patch is particularly useful for conntrackd.
>
> I can see that this is useful, but one group per protocol state
> sounds rather excessive, I would expect that we could group them
> more logically, maybe "connection setup, teardown and updates"?
> Which states is conntrackd particulary interested in?
Well, why just save a couple of groups if we've got 2^32 event groups?
Moreover, per protocol state seems to me the most fine-grain and
flexible solution. Depending on the replication schema I might be
interested in different states.
> I would also like to hear from Holger whether his conntrack daemon
> could make use of a mechnism like this too and if the filtering
> capabilities you propose will do.
I'm sure he will benefit of it. Currently there are two main CPU cycle
consumers: event delivery and network transmission, and it is linked to
the number of messages generated. Not surprisingly, if we reduce the
number of messages generated, we reduce CPU consumption. Sysadmins may
enable this tradeoff. BTW, where's Holger's code? :)
I have a paper here on conntrackd that I can't release yet. Would you be
interested in reviewing it? In return, you'll see all the work that I've
currently done. Do you have some minor spare cycle in your busy agenda? :)
>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
>>
>> --- net-2.6.git.orig/net/netfilter/nf_conntrack_netlink.c 2007-06-11 02:31:08.000000000 +0200
>> +++ net-2.6.git/net/netfilter/nf_conntrack_netlink.c 2007-06-11 02:38:00.000000000 +0200
>
>> @@ -317,7 +331,8 @@ static int ctnetlink_conntrack_event(str
>> struct sk_buff *skb;
>> unsigned int type;
>> sk_buff_data_t b;
>> - unsigned int flags = 0, group;
>> + unsigned int flags = 0, group, proto_group;
>> + bool proto_group_has_listener = false;
>>
>> /* ignore our fake conntrack entry */
>> if (ct == &nf_conntrack_untracked)
>> @@ -336,7 +351,11 @@ static int ctnetlink_conntrack_event(str
>> } else
>> return NOTIFY_DONE;
>>
>> - if (!nfnetlink_has_listeners(group))
>> + proto_group = proto_event_group(ct);
>> + if (proto_group != NFNLGRP_NONE && nfnetlink_has_listeners(proto_group))
>> + proto_group_has_listener = true;
>> +
>> + if (!proto_group_has_listener && !nfnetlink_has_listeners(group))
>> return NOTIFY_DONE;
>>
>> skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC);
>> @@ -396,7 +415,11 @@ static int ctnetlink_conntrack_event(str
>> }
>>
>> nlh->nlmsg_len = skb->tail - b;
>> + if (proto_group_has_listener)
>> + atomic_inc(&skb->users);
>> nfnetlink_send(skb, 0, group, 0);
>
> This will always send to the main group even if only the proto group
> has listeners.
I can improve that. Anyway, AFAIK the main cost here is the message
allocation and setup. Since we have already do it for the protocol
group, netlink will just notice itself that there's no listeners for
that event just a bit later.
--
The dawn of the fourth age of Linux firewalling is coming; a time of
great struggle and heroic deeds -- J.Kadlecsik got inspired by J.Morris
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] add TCP protocol state event groups
2007-06-19 14:13 ` Pablo Neira Ayuso
@ 2007-06-19 14:44 ` Patrick McHardy
2007-06-20 17:17 ` Pablo Neira Ayuso
0 siblings, 1 reply; 12+ messages in thread
From: Patrick McHardy @ 2007-06-19 14:44 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: Netfilter Development Mailinglist
Pablo Neira Ayuso wrote:
> Patrick McHardy wrote:
>
>>>This patch adds per-protocol state event groups, so one can only listen to a
>>>certain TCP state change such as ESTABLISHED. Although such per-state message
>>>filtering could be done in userspace, we save CPU cycles since the kernel does
>>>not need to build and delivery messages that will be later discarded in
>>>userspace. This patch is particularly useful for conntrackd.
>>
>>I can see that this is useful, but one group per protocol state
>>sounds rather excessive, I would expect that we could group them
>>more logically, maybe "connection setup, teardown and updates"?
>>Which states is conntrackd particulary interested in?
>
>
> Well, why just save a couple of groups if we've got 2^32 event groups?
> Moreover, per protocol state seems to me the most fine-grain and
> flexible solution. Depending on the replication schema I might be
> interested in different states.
Its not only about saving groups. A scheme like this only makes
sense if you introduce groups for every tiny bit, otherwise you
need to subscribe to the "global" group anyway to get the remaining
"unclassified" events you're interested in. And that not only uses
a lot of groups, it also requires dispatching the same event to
potentially many groups. I'm interested, do you already use this
feature in conntrackd? If yes, how do you deal with UDP etc. that
you didn't introduce new groups for?
>>I would also like to hear from Holger whether his conntrack daemon
>>could make use of a mechnism like this too and if the filtering
>>capabilities you propose will do.
>
>
> I'm sure he will benefit of it. Currently there are two main CPU cycle
> consumers: event delivery and network transmission, and it is linked to
> the number of messages generated. Not surprisingly, if we reduce the
> number of messages generated, we reduce CPU consumption. Sysadmins may
> enable this tradeoff. BTW, where's Holger's code? :)
I believe we're going to see it at the workshop.
> I have a paper here on conntrackd that I can't release yet. Would you be
> interested in reviewing it? In return, you'll see all the work that I've
> currently done. Do you have some minor spare cycle in your busy agenda? :)
I can try :)
>
>
>>>Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
>>>
>>>--- net-2.6.git.orig/net/netfilter/nf_conntrack_netlink.c 2007-06-11 02:31:08.000000000 +0200
>>>+++ net-2.6.git/net/netfilter/nf_conntrack_netlink.c 2007-06-11 02:38:00.000000000 +0200
>>
>>>@@ -317,7 +331,8 @@ static int ctnetlink_conntrack_event(str
>>> struct sk_buff *skb;
>>> unsigned int type;
>>> sk_buff_data_t b;
>>>- unsigned int flags = 0, group;
>>>+ unsigned int flags = 0, group, proto_group;
>>>+ bool proto_group_has_listener = false;
>>>
>>> /* ignore our fake conntrack entry */
>>> if (ct == &nf_conntrack_untracked)
>>>@@ -336,7 +351,11 @@ static int ctnetlink_conntrack_event(str
>>> } else
>>> return NOTIFY_DONE;
>>>
>>>- if (!nfnetlink_has_listeners(group))
>>>+ proto_group = proto_event_group(ct);
>>>+ if (proto_group != NFNLGRP_NONE && nfnetlink_has_listeners(proto_group))
>>>+ proto_group_has_listener = true;
>>>+
>>>+ if (!proto_group_has_listener && !nfnetlink_has_listeners(group))
>>> return NOTIFY_DONE;
>>>
>>> skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC);
>>>@@ -396,7 +415,11 @@ static int ctnetlink_conntrack_event(str
>>> }
>>>
>>> nlh->nlmsg_len = skb->tail - b;
>>>+ if (proto_group_has_listener)
>>>+ atomic_inc(&skb->users);
>>> nfnetlink_send(skb, 0, group, 0);
>>
>>This will always send to the main group even if only the proto group
>>has listeners.
>
>
> I can improve that. Anyway, AFAIK the main cost here is the message
> allocation and setup. Since we have already do it for the protocol
> group, netlink will just notice itself that there's no listeners for
> that event just a bit later.
There's more overhead, before af_netlink notices that no listeners
are present it will reallocate and trim the skb. This should be
avoided anyway by using a better fitting allocation size though.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] add TCP protocol state event groups
2007-06-19 14:44 ` Patrick McHardy
@ 2007-06-20 17:17 ` Pablo Neira Ayuso
2007-06-20 17:40 ` Patrick McHardy
0 siblings, 1 reply; 12+ messages in thread
From: Pablo Neira Ayuso @ 2007-06-20 17:17 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Netfilter Development Mailinglist
Patrick McHardy wrote:
> Pablo Neira Ayuso wrote:
>> Patrick McHardy wrote:
>>
>>>> This patch adds per-protocol state event groups, so one can only listen to a
>>>> certain TCP state change such as ESTABLISHED. Although such per-state message
>>>> filtering could be done in userspace, we save CPU cycles since the kernel does
>>>> not need to build and delivery messages that will be later discarded in
>>>> userspace. This patch is particularly useful for conntrackd.
>>> I can see that this is useful, but one group per protocol state
>>> sounds rather excessive, I would expect that we could group them
>>> more logically, maybe "connection setup, teardown and updates"?
>>> Which states is conntrackd particulary interested in?
>>
>> Well, why just save a couple of groups if we've got 2^32 event groups?
>> Moreover, per protocol state seems to me the most fine-grain and
>> flexible solution. Depending on the replication schema I might be
>> interested in different states.
>
> Its not only about saving groups. A scheme like this only makes
> sense if you introduce groups for every tiny bit, otherwise you
> need to subscribe to the "global" group anyway to get the remaining
> "unclassified" events you're interested in. And that not only uses
> a lot of groups, it also requires dispatching the same event to
> potentially many groups. I'm interested, do you already use this
> feature in conntrackd? If yes, how do you deal with UDP etc. that
> you didn't introduce new groups for?
The patch is incomplete but you can guess the schema. Consider a patch
to add the following groups:
NFNLGRP_CONNTRACK_PROTO_UDP_NEW,
NFNLGRP_CONNTRACK_PROTO_UDP_UPDATE,
NFNLGRP_CONNTRACK_PROTO_UDP_DESTROY,
NFNLGRP_CONNTRACK_PROTO_ICMP_NEW,
NFNLGRP_CONNTRACK_PROTO_ICMP_UPDATE,
NFNLGRP_CONNTRACK_PROTO_ICMP_DESTROY,
NFNLGRP_CONNTRACK_PROTO_SCTP_...,
NFNLGRP_CONNTRACK_PROTO_UNCLASSIFIED_NEW,
NFNLGRP_CONNTRACK_PROTO_UNCLASSIFIED_UPDATE,
NFNLGRP_CONNTRACK_PROTO_UNCLASSIFIED_DESTROY
At a glance I see the function netlink_update_listeners that is O(n),
that could be the bottleneck if we have a lot of groups. Still
objections with this approach?
--
The dawn of the fourth age of Linux firewalling is coming; a time of
great struggle and heroic deeds -- J.Kadlecsik got inspired by J.Morris
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] add TCP protocol state event groups
2007-06-20 17:17 ` Pablo Neira Ayuso
@ 2007-06-20 17:40 ` Patrick McHardy
2007-06-21 15:47 ` Pablo Neira Ayuso
0 siblings, 1 reply; 12+ messages in thread
From: Patrick McHardy @ 2007-06-20 17:40 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: Netfilter Development Mailinglist
Pablo Neira Ayuso wrote:
> Patrick McHardy wrote:
>
>>> Well, why just save a couple of groups if we've got 2^32 event groups?
>>> Moreover, per protocol state seems to me the most fine-grain and
>>> flexible solution. Depending on the replication schema I might be
>>> interested in different states.
>>
>>
>> Its not only about saving groups. A scheme like this only makes
>> sense if you introduce groups for every tiny bit, otherwise you
>> need to subscribe to the "global" group anyway to get the remaining
>> "unclassified" events you're interested in. And that not only uses
>> a lot of groups, it also requires dispatching the same event to
>> potentially many groups. I'm interested, do you already use this
>> feature in conntrackd? If yes, how do you deal with UDP etc. that
>> you didn't introduce new groups for?
You skipped my question.
> The patch is incomplete but you can guess the schema. Consider a patch
> to add the following groups:
>
> NFNLGRP_CONNTRACK_PROTO_UDP_NEW,
> NFNLGRP_CONNTRACK_PROTO_UDP_UPDATE,
> NFNLGRP_CONNTRACK_PROTO_UDP_DESTROY,
> NFNLGRP_CONNTRACK_PROTO_ICMP_NEW,
> NFNLGRP_CONNTRACK_PROTO_ICMP_UPDATE,
> NFNLGRP_CONNTRACK_PROTO_ICMP_DESTROY,
> NFNLGRP_CONNTRACK_PROTO_SCTP_...,
> NFNLGRP_CONNTRACK_PROTO_UNCLASSIFIED_NEW,
> NFNLGRP_CONNTRACK_PROTO_UNCLASSIFIED_UPDATE,
> NFNLGRP_CONNTRACK_PROTO_UNCLASSIFIED_DESTROY
>
> At a glance I see the function netlink_update_listeners that is O(n),
> that could be the bottleneck if we have a lot of groups. Still
> objections with this approach?
So far I presume conntrackd still listens to the global group. So
I would like to know how many groups we'll end up with until this
scheme is really fine-grained enough that it allows conntrackd
to avoid listening to the global group and what types of events
it would ignore (so I can understand the performance improvement
this might yield). The above still only contains protocol specific
groups.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] add TCP protocol state event groups
2007-06-20 17:40 ` Patrick McHardy
@ 2007-06-21 15:47 ` Pablo Neira Ayuso
2007-06-21 16:04 ` Patrick McHardy
0 siblings, 1 reply; 12+ messages in thread
From: Pablo Neira Ayuso @ 2007-06-21 15:47 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Netfilter Development Mailinglist
Patrick McHardy wrote:
> Pablo Neira Ayuso wrote:
>> Patrick McHardy wrote:
>>
>>>> Well, why just save a couple of groups if we've got 2^32 event groups?
>>>> Moreover, per protocol state seems to me the most fine-grain and
>>>> flexible solution. Depending on the replication schema I might be
>>>> interested in different states.
>>>
>>> Its not only about saving groups. A scheme like this only makes
>>> sense if you introduce groups for every tiny bit, otherwise you
>>> need to subscribe to the "global" group anyway to get the remaining
>>> "unclassified" events you're interested in. And that not only uses
>>> a lot of groups, it also requires dispatching the same event to
>>> potentially many groups. I'm interested, do you already use this
>>> feature in conntrackd? If yes, how do you deal with UDP etc. that
>>> you didn't introduce new groups for?
>
> You skipped my question.
Sorry, I'm lazy :-)
>> The patch is incomplete but you can guess the schema. Consider a patch
>> to add the following groups:
>>
>> NFNLGRP_CONNTRACK_PROTO_UDP_NEW,
>> NFNLGRP_CONNTRACK_PROTO_UDP_UPDATE,
>> NFNLGRP_CONNTRACK_PROTO_UDP_DESTROY,
>> NFNLGRP_CONNTRACK_PROTO_ICMP_NEW,
>> NFNLGRP_CONNTRACK_PROTO_ICMP_UPDATE,
>> NFNLGRP_CONNTRACK_PROTO_ICMP_DESTROY,
>> NFNLGRP_CONNTRACK_PROTO_SCTP_...,
>> NFNLGRP_CONNTRACK_PROTO_UNCLASSIFIED_NEW,
>> NFNLGRP_CONNTRACK_PROTO_UNCLASSIFIED_UPDATE,
>> NFNLGRP_CONNTRACK_PROTO_UNCLASSIFIED_DESTROY
>>
>> At a glance I see the function netlink_update_listeners that is O(n),
>> that could be the bottleneck if we have a lot of groups. Still
>> objections with this approach?
>
> So far I presume conntrackd still listens to the global group. So
> I would like to know how many groups we'll end up with until this
> scheme is really fine-grained enough that it allows conntrackd
> to avoid listening to the global group and what types of events
> it would ignore (so I can understand the performance improvement
> this might yield). The above still only contains protocol specific
> groups.
OK, let me develop the issue a bit more. In my initial experiments, I
used the patch to replicate only TCP connections, so conntrackd was not
listening the global group but just TCP state events. I currently have a
clause that looks like:
Replicate ESTABLISHED TIME_WAIT for TCP
Thus, only established and time_wait states are replicated. My plan is
to add similar clauses for other protocols so:
Replicate ESTABLISHED for STCP
Replicate NEW DESTROY for UDP
Replica NEW DESTROY for UNCLASSIFIED
The sysadmin manually may tune the states that it wants to replicate to
reduce the CPU overhead.
--
The dawn of the fourth age of Linux firewalling is coming; a time of
great struggle and heroic deeds -- J.Kadlecsik got inspired by J.Morris
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] add TCP protocol state event groups
2007-06-21 15:47 ` Pablo Neira Ayuso
@ 2007-06-21 16:04 ` Patrick McHardy
2007-06-21 19:11 ` states worth to replicate [was Re: [PATCH] add TCP protocol state event groups] Pablo Neira Ayuso
0 siblings, 1 reply; 12+ messages in thread
From: Patrick McHardy @ 2007-06-21 16:04 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: Netfilter Development Mailinglist
Pablo Neira Ayuso wrote:
> Patrick McHardy wrote:
>
>>So far I presume conntrackd still listens to the global group. So
>>I would like to know how many groups we'll end up with until this
>>scheme is really fine-grained enough that it allows conntrackd
>>to avoid listening to the global group and what types of events
>>it would ignore (so I can understand the performance improvement
>>this might yield). The above still only contains protocol specific
>>groups.
>
>
> OK, let me develop the issue a bit more. In my initial experiments, I
> used the patch to replicate only TCP connections, so conntrackd was not
> listening the global group but just TCP state events. I currently have a
> clause that looks like:
>
> Replicate ESTABLISHED TIME_WAIT for TCP
>
> Thus, only established and time_wait states are replicated. My plan is
> to add similar clauses for other protocols so:
Is that set of states useful in real-life? How are connections
destroyed?
> Replicate ESTABLISHED for STCP
What does this exactly mean? Replicate messages that put the connection
in established state, or replicate all messages that happen while the
connection is in established state?
> Replicate NEW DESTROY for UDP
There aren't any other events for UDP anyway except STATUS
for the assured bit, and I guess you do want to replicate
that as well.
> Replica NEW DESTROY for UNCLASSIFIED
Same here, assuming UNCLASSIFIED will be proto generic in the
end.
Again, what I would like to understand before we add something
we have to live with forever is what it will look like in the
end and what the gain of it is, so I would like to see a realistic
example. Since the vast majority of updates is in TCP ESTABLISHED
state and basically everyone is going to synchronize that I have
a hard time believing that this is really going to give a notable
performance improvement.
I also only see new groups related to protocol states, what
about helpers, timeouts, ...?
^ permalink raw reply [flat|nested] 12+ messages in thread
* states worth to replicate [was Re: [PATCH] add TCP protocol state event groups]
2007-06-21 16:04 ` Patrick McHardy
@ 2007-06-21 19:11 ` Pablo Neira Ayuso
2007-06-22 12:49 ` Patrick McHardy
0 siblings, 1 reply; 12+ messages in thread
From: Pablo Neira Ayuso @ 2007-06-21 19:11 UTC (permalink / raw)
To: Patrick McHardy
Cc: Netfilter Development Mailinglist, Netfilter-failover list
Patrick McHardy wrote:
> Pablo Neira Ayuso wrote:
>> OK, let me develop the issue a bit more. In my initial experiments, I
>> used the patch to replicate only TCP connections, so conntrackd was not
>> listening the global group but just TCP state events. I currently have a
>> clause that looks like:
>>
>> Replicate ESTABLISHED TIME_WAIT for TCP
>>
>> Thus, only established and time_wait states are replicated. My plan is
>> to add similar clauses for other protocols so:
>
> Is that set of states useful in real-life?
I think so. In general, syn states have a very short lifetime. Thus,
they are quickly superseded by established, making the replication
effort barely useful. Moreover, I assume that clients usually reissue
connections that fail in (early) syn states. I'm assuming that we are
focusing on the recovery of connection established.
> How are connections destroyed?
Once we received the time_wait, we tell other replicas to destroy the
connection. We miss previous teardown states that have also short
lifetime. This is a tradeoff, we reduce the "copy-equivalence" degree
among replicas but reduce the number of messages generated in return.
>> Replicate ESTABLISHED for STCP
>
> What does this exactly mean? Replicate messages that put the connection
> in established state, or replicate all messages that happen while the
> connection is in established state?
Replicate all messages that happen while the connection is in
established state. So, the protocol groups are just a different way to
group events.
>> Replicate NEW DESTROY for UDP
>
> There aren't any other events for UDP anyway except STATUS
> for the assured bit, and I guess you do want to replicate
> that as well.
Right. In this case, I think that it would be worth replicating assured
and destroy states.
>> Replica NEW DESTROY for UNCLASSIFIED
>
> Same here, assuming UNCLASSIFIED will be proto generic in the
> end.
Same thing as above.
> Again, what I would like to understand before we add something
> we have to live with forever is what it will look like in the
> end and what the gain of it is, so I would like to see a realistic
> example. Since the vast majority of updates is in TCP ESTABLISHED
> state and basically everyone is going to synchronize that I have
> a hard time believing that this is really going to give a notable
> performance improvement.
Well, since there is usually just one messages per state change in the
case of TCP connections, we would just generate two, one to notify the
established state and one to destroy it. Thus, in the particular case of
a HTTP connection, we would need just two messages instead of seven.
> I also only see new groups related to protocol states, what
> about helpers, timeouts, ...?
Same thing as above, the protocol events groups are just a way to group
events. Thus, if the conntrack mark changes while in established state,
an event message to the established group is delivered.
--
The dawn of the fourth age of Linux firewalling is coming; a time of
great struggle and heroic deeds -- J.Kadlecsik got inspired by J.Morris
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: states worth to replicate [was Re: [PATCH] add TCP protocol state event groups]
2007-06-21 19:11 ` states worth to replicate [was Re: [PATCH] add TCP protocol state event groups] Pablo Neira Ayuso
@ 2007-06-22 12:49 ` Patrick McHardy
0 siblings, 0 replies; 12+ messages in thread
From: Patrick McHardy @ 2007-06-22 12:49 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Netfilter Development Mailinglist, Netfilter-failover list
Pablo Neira Ayuso wrote:
> Patrick McHardy wrote:
>
>>>Replicate ESTABLISHED TIME_WAIT for TCP
>>>
>>>Thus, only established and time_wait states are replicated. My plan is
>>>to add similar clauses for other protocols so:
>>
>>Is that set of states useful in real-life?
>
>
> I think so. In general, syn states have a very short lifetime. Thus,
> they are quickly superseded by established, making the replication
> effort barely useful. Moreover, I assume that clients usually reissue
> connections that fail in (early) syn states. I'm assuming that we are
> focusing on the recovery of connection established.
>
>
>>How are connections destroyed?
>
>
> Once we received the time_wait, we tell other replicas to destroy the
> connection. We miss previous teardown states that have also short
> lifetime. This is a tradeoff, we reduce the "copy-equivalence" degree
> among replicas but reduce the number of messages generated in return.
Connections don't go through TIME_WAIT in case they are reset.
>>Again, what I would like to understand before we add something
>>we have to live with forever is what it will look like in the
>>end and what the gain of it is, so I would like to see a realistic
>>example. Since the vast majority of updates is in TCP ESTABLISHED
>>state and basically everyone is going to synchronize that I have
>>a hard time believing that this is really going to give a notable
>>performance improvement.
>
>
> Well, since there is usually just one messages per state change in the
> case of TCP connections, we would just generate two, one to notify the
> established state and one to destroy it. Thus, in the particular case of
> a HTTP connection, we would need just two messages instead of seven.
>
>
>>I also only see new groups related to protocol states, what
>>about helpers, timeouts, ...?
>
>
> Same thing as above, the protocol events groups are just a way to group
> events. Thus, if the conntrack mark changes while in established state,
> an event message to the established group is delivered.
Mhh OK. I'll think about it some more over the weekend.
Something not directly related: we're generating events (internally)
for every packet and call the notifiers, yet we don't use them for
anything (for example IPCT_PROTOINFO_VOLATILE and IPCT_REFRESH),
which seems like a complete waste. I think we should remove them.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] add TCP protocol state event groups
2007-06-19 13:33 ` Patrick McHardy
2007-06-19 14:13 ` Pablo Neira Ayuso
@ 2007-07-02 9:40 ` Holger Eitzenberger
2007-07-02 15:36 ` Patrick McHardy
1 sibling, 1 reply; 12+ messages in thread
From: Holger Eitzenberger @ 2007-07-02 9:40 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Netfilter Development Mailinglist, Pablo Neira Ayuso
Patrick McHardy wrote:
>> This patch adds per-protocol state event groups, so one can only listen to a
>> certain TCP state change such as ESTABLISHED. Although such per-state message
>> filtering could be done in userspace, we save CPU cycles since the kernel does
>> not need to build and delivery messages that will be later discarded in
>> userspace. This patch is particularly useful for conntrackd.
> I can see that this is useful, but one group per protocol state
> sounds rather excessive, I would expect that we could group them
> more logically, maybe "connection setup, teardown and updates"?
> Which states is conntrackd particulary interested in?
> I would also like to hear from Holger whether his conntrack daemon
> could make use of a mechnism like this too and if the filtering
> capabilities you propose will do.
Hi :)
Sorry for my late answer, but I am currently rather busy with other stuff.
ctsyncd currently does quite some per-protocol filtering, e.g. TCP
connections currently get synced only after being in the ESTABLISHED
state and deleted on the slave side if the teardown event is received.
This greatly reduces the number of events which are send to the ctsyncd
peer.
Although ctsyncd does not have a performance problem at all even if I
run it against our Spirent performance system I generally like the idea
of doing some of the filtering in kernelspace. Though I would leave the
userspace filtering code as a fallback solution in place.
As of July I have a new colleague which hopefully takes over some of the
workloads I currently have. My plan therefore for ctsyncd is to finally
improve filtering capability of ctsyncd and make if configurable, as
currently most of the filtering code is still hardcoded. My plan is
also to improve the partnership with keepalived et all, basically making
this part more configurable too. This version will then be my 1.0
release, as it currently runs fine for serveral months now on our ASG
products.
So hopefully I am able to release ctsyncd 1.0 into the public within the
next few weeks, especially with Netfilter Workshop in mind.
Hope that helps.
/holger
--
Holger Eitzenberger <heitzenberger@astaro.com> | Kernel Developer
Astaro AG | www.astaro.com | Phone +49-721-25516-246 | Fax -200
Download your free 30 day trial version of the award-winning Astaro
Security Gateway solution here: https://my.astaro.com/download/
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] add TCP protocol state event groups
2007-07-02 9:40 ` [PATCH] add TCP protocol state event groups Holger Eitzenberger
@ 2007-07-02 15:36 ` Patrick McHardy
0 siblings, 0 replies; 12+ messages in thread
From: Patrick McHardy @ 2007-07-02 15:36 UTC (permalink / raw)
To: Holger Eitzenberger; +Cc: Netfilter Development Mailinglist, Pablo Neira Ayuso
Holger Eitzenberger wrote:
> Patrick McHardy wrote:
>
>> I can see that this is useful, but one group per protocol state
>> sounds rather excessive, I would expect that we could group them
>> more logically, maybe "connection setup, teardown and updates"?
>> Which states is conntrackd particulary interested in?
>
>
>> I would also like to hear from Holger whether his conntrack daemon
>> could make use of a mechnism like this too and if the filtering
>> capabilities you propose will do.
>
>
> ctsyncd currently does quite some per-protocol filtering, e.g. TCP
> connections currently get synced only after being in the ESTABLISHED
> state and deleted on the slave side if the teardown event is received.
> This greatly reduces the number of events which are send to the ctsyncd
> peer.
>
> Although ctsyncd does not have a performance problem at all even if I
> run it against our Spirent performance system I generally like the idea
> of doing some of the filtering in kernelspace. Though I would leave the
> userspace filtering code as a fallback solution in place.
I agree that kernel space filtering is useful, I'm just not convinced
of this particular approach. I would much rather like to see something
that is not specific to nf_conntrack_netlink and can be specified per
socket, something like bpf for netlink.
> As of July I have a new colleague which hopefully takes over some of the
> workloads I currently have. My plan therefore for ctsyncd is to finally
> improve filtering capability of ctsyncd and make if configurable, as
> currently most of the filtering code is still hardcoded. My plan is
> also to improve the partnership with keepalived et all, basically making
> this part more configurable too. This version will then be my 1.0
> release, as it currently runs fine for serveral months now on our ASG
> products.
>
> So hopefully I am able to release ctsyncd 1.0 into the public within the
> next few weeks, especially with Netfilter Workshop in mind.
Great.
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2007-07-02 15:36 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-06-11 18:05 [PATCH] add TCP protocol state event groups Pablo Neira Ayuso
2007-06-19 13:33 ` Patrick McHardy
2007-06-19 14:13 ` Pablo Neira Ayuso
2007-06-19 14:44 ` Patrick McHardy
2007-06-20 17:17 ` Pablo Neira Ayuso
2007-06-20 17:40 ` Patrick McHardy
2007-06-21 15:47 ` Pablo Neira Ayuso
2007-06-21 16:04 ` Patrick McHardy
2007-06-21 19:11 ` states worth to replicate [was Re: [PATCH] add TCP protocol state event groups] Pablo Neira Ayuso
2007-06-22 12:49 ` Patrick McHardy
2007-07-02 9:40 ` [PATCH] add TCP protocol state event groups Holger Eitzenberger
2007-07-02 15:36 ` Patrick McHardy
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.