From: Patrick McHardy <kaber@trash.net>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Subject: Re: [PATCH 1/2] netfilter: conntrack: move event cache to conntrack extension infrastructure
Date: Fri, 05 Jun 2009 16:13:47 +0200 [thread overview]
Message-ID: <4A29281B.6010607@trash.net> (raw)
In-Reply-To: <4A291863.4090604@netfilter.org>
Pablo Neira Ayuso wrote:
> Patrick McHardy wrote:
>>>
>>> - /* New conntrack */
>>> - IPCT_NEW_BIT = 0,
>>> - IPCT_NEW = (1 << IPCT_NEW_BIT),
>>> -
>>> -...
>>> + IPCT_NEW = 0, /* new conntrack */
>>
>> Why this change? Further down, you change the code to use
>> (1 << IPCT_*), which isn't really an improvement.
>
> Oh, this is not intended to be an improvement. I needed to change this
> to use bitops operations, that's all. Or I could use IPCT_*_BIT and
> change every reference to this in the whole conntrack table, but the
> patch would be much bigger and noisier :).
The major part is in ctnetlink I think, and you do seem to touch every
one of them :) But OK, using _BIT for all the events doesn't seem too
appealing either.
>>> +
>>> + e = nf_ct_ecache_find(ct);
>>> + if (e == NULL)
>>> + return;
>>> +
>>> + set_bit(event, &e->cache);
>>
>> This looks quite expensive, given how often this operation is performed.
>> Did you benchmark this?
>
> I'll do that benchmark. I was initially using nf_conntrack_lock to
> protect the per-conntrack event cache; I think that using bitops is a
> bit better?
I actually meant the lookup done potentially multiple times per packet.
But I incorrectly thought that it was more expensive, that seems fine.
>>> @@ -8,12 +8,14 @@ enum nf_ct_ext_id
>>> NF_CT_EXT_HELPER,
>>> NF_CT_EXT_NAT,
>>> NF_CT_EXT_ACCT,
>>> + NF_CT_EXT_ECACHE,
>>> NF_CT_EXT_NUM,
>>
>> Quoting nf_conntrack_extend.c:
>>
>> /* This assumes that extended areas in conntrack for the types
>> whose NF_CT_EXT_F_PREALLOC bit set are allocated in order */
>>
>> Is that actually the case here? It might be beneficial to move
>> this before accounting if possible; I guess it's used more often.
>
> I think that accounting information is updated more often. Events are
> only updated for very few packets, specifically the setup and
> tear-down packets of a flow.
No, events are only sent to userspace very seldom. But f.i. TCP
conntrack generates at least one event per packet.
But what I actually meant was that it's used more often, I think.
Never mind; also forget about the PREALLOC question, I should
have read what I pasted :) Of course you could add the PREALLOC
flag; when events are enabled you add the extension for every
conntrack anyway.
>>> @@ -738,6 +739,9 @@ nf_conntrack_in(struct net *net, u_int8_t pf,
>>> unsigned int hooknum,
>>>
>>> NF_CT_ASSERT(skb->nfct);
>>>
>>> + /* We may have pending events, deliver them and clear the cache */
>>> + nf_ct_deliver_cached_events(ct);
>>
>> How does this guarantee that an event will be delivered in time?
>> As far as I can see, it might never be delivered, or at least not
>> until a timeout occurs.
>
> Yes, that's the idea. In short, if we have cached events, we trigger a
> delivery via ctnetlink. If the delivery fails, we keep the cached
> events and the next packet will trigger a new delivery. We can lose
> events but at worse case, the destroy will be delivered.
OK, so this is essentially replacing the delivery we did previously when
beginning to cache events for a different conntrack. That's fine and
necessary in case a packet triggering an event didn't make it past
POST_ROUTING.
>
> If we add a new conntrack extension to store the creation and
> destruction time of the conntrack entries, we can have reliable
> flow-accounting since, at least, the destroy event will be delivered.
> In the case of state synchronization, we aim to ensure that
> long-standing flows survive failures; thus, under event loss, the
> backup nodes would get the state after some tries, especially for
> long-standing flows, since every packet would trigger another delivery
> in case of problems.
*conntrackd* aims at that, I think :) Anyway, on second (or third)
thought, I agree that this is fine; the previous guarantees weren't any
stronger.
>
> BTW, I have removed that line locally. So the next delivery try for
> pending events is done only in nf_conntrack_confirm(), not twice, one
> in nf_conntrack_in() and nf_conntrack_confirm() as it happens in this
> patch.
That means we're only delivering events if a packet actually made
it through. I guess this is fine too; ideally we wouldn't even have
state transitions.
>>> @@ -1123,6 +1123,8 @@ ctnetlink_change_conntrack(struct nf_conn *ct,
>>> struct nlattr *cda[])
>>> err = ctnetlink_change_helper(ct, cda);
>>> if (err < 0)
>>> return err;
>>> +
>>> + nf_conntrack_event_cache(IPCT_HELPER, ct);
>>
>> Why are we suddenly caching a lot more events manually?
>
> Currently, in user-space triggered events, we are including in the
> event message some fields that may not have been updated. Now we can
> provide more accurate events by notifying only the conntrack object
> fields that have been updated.
>
The patch is already pretty large; please separate that part if it
doesn't have to be in this patch to make it work.
Thread overview: 30+ messages
2009-06-04 11:07 [PATCH 0/2] Pablo Neira Ayuso
2009-06-04 11:08 ` [PATCH 1/2] netfilter: conntrack: move event cache to conntrack extension infrastructure Pablo Neira Ayuso
2009-06-04 12:16 ` Pablo Neira Ayuso
2009-06-05 11:04 ` Patrick McHardy
2009-06-05 13:06 ` Pablo Neira Ayuso
2009-06-05 14:13 ` Patrick McHardy [this message]
2009-06-06 6:24 ` Pablo Neira Ayuso
2009-06-04 11:08 ` [PATCH 2/2] netfilter: conntrack: optional reliable conntrack event delivery Pablo Neira Ayuso
2009-06-05 14:37 ` Patrick McHardy
2009-06-06 6:34 ` Pablo Neira Ayuso
2009-06-08 13:49 ` Patrick McHardy
2009-06-09 22:36 ` Pablo Neira Ayuso
2009-06-09 22:43 ` Patrick McHardy
2009-06-09 22:45 ` Patrick McHardy
2009-06-09 22:58 ` Pablo Neira Ayuso
2009-06-10 1:18 ` Eric Dumazet
2009-06-10 9:55 ` Patrick McHardy
2009-06-10 10:36 ` Pablo Neira Ayuso
2009-06-10 10:55 ` Patrick McHardy
2009-06-10 11:01 ` Patrick McHardy
2009-06-10 11:40 ` Patrick McHardy
2009-06-10 12:22 ` Pablo Neira Ayuso
2009-06-10 12:27 ` Patrick McHardy
2009-06-10 12:43 ` Pablo Neira Ayuso
2009-06-10 12:56 ` Patrick McHardy
2009-06-10 12:26 ` Jozsef Kadlecsik
2009-06-10 12:30 ` Patrick McHardy
2009-06-10 12:41 ` Patrick McHardy
2009-06-04 11:17 ` [PATCH 0/2] reliable per-conntrack event cache Pablo Neira Ayuso