From: Patrick McHardy <kaber@trash.net>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Subject: Re: [PATCH 2/2] netfilter: conntrack: optional reliable conntrack event delivery
Date: Fri, 05 Jun 2009 16:37:43 +0200 [thread overview]
Message-ID: <4A292DB7.4000000@trash.net> (raw)
In-Reply-To: <20090604110841.6702.76228.stgit@Decadence>
Pablo Neira Ayuso wrote:
> This patch improves ctnetlink event reliability if one broadcast
> listener has set the NETLINK_BROADCAST_ERROR socket option.
>
> The logic is the following: if the event delivery, we keep the
> undelivered events in the per-conntrack event cache. Thus, once
> the next packet arrives, we trigger another event delivery in
> nf_conntrack_in(). If things don't go well in this second try,
> we accumulate the pending events in the cache but we try to
> deliver the current state as soon as possible. Therefore, we may
> lost state transitions but the userspace process gets in sync at
> some point.
>
> At worst case, if no events were delivered to userspace, we make
> sure that destroy events are successfully delivered. This happens
> because if ctnetlink fails to deliver the destroy event, we remove
> the conntrack entry from the hashes and insert them in the dying
> list, which contains inactive entries. Then, the conntrack timer
> is added with an extra grace timeout of random32() % 15 seconds
> to trigger the event again (this grace timeout is tunable via
> /proc). The use of a limited random timeout value allows
> distributing the "destroy" resends, thus, avoiding accumulating
> lots "destroy" events at the same time.
>
> The maximum number of conntrack entries (active or inactive) is
> still handled by nf_conntrack_max. Thus, we may start dropping
> packets at some point if we accumulate a lot of inactive conntrack
> entries waiting to deliver the destroy event to userspace.
> During my stress tests consisting of setting a very small buffer
> of 2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket
> flag, and generating lots of very small connections, I noticed
> very few destroy entries on the fly waiting to be resend.
Conceptually, I think this all makes sense and is an improvement.
> @@ -240,6 +225,60 @@ static void death_by_timeout(unsigned long ul_conntrack)
> NF_CT_STAT_INC(net, delete_list);
> clean_from_lists(ct);
> spin_unlock_bh(&nf_conntrack_lock);
> +}
> +
> +static void death_by_event(unsigned long ul_conntrack)
> +{
> + struct nf_conn *ct = (void *)ul_conntrack;
> + struct net *net = nf_ct_net(ct);
> +
> + if (nf_conntrack_event(IPCT_DESTROY, ct) < 0) {
> + /* bad luck, let's retry again */
> + ct->timeout.expires = jiffies +
> + (random32() % net->ct.sysctl_events_retry_timeout);
> + add_timer(&ct->timeout);
> + return;
> + }
> + /* we've got the event delivered, now it's dying */
> + set_bit(IPS_DYING_BIT, &ct->status);
> + spin_lock_bh(&nf_conntrack_lock);
_bh is not needed here, the timer is already running in BH context.
> + hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
Why is _rcu used? The lists are never used for a lookup.
> + spin_unlock_bh(&nf_conntrack_lock);
> + nf_ct_helper_destroy(ct);
> + nf_ct_put(ct);
> +}
> +
> +void nf_ct_setup_event_timer(struct nf_conn *ct)
> +{
> + struct net *net = nf_ct_net(ct);
> +
> + nf_ct_delete_from_lists(ct);
This doesn't really belong here. I realize its because of ctnetlink,
but I think I'd rather have an additional export.
> + /* add this conntrack to the dying list */
> + spin_lock_bh(&nf_conntrack_lock);
> + hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode,
> + &net->ct.dying);
> + /* set a new timer to retry event delivery */
> + setup_timer(&ct->timeout, death_by_event, (unsigned long)ct);
> + ct->timeout.expires = jiffies +
> + (random32() % net->ct.sysctl_events_retry_timeout);
> + add_timer(&ct->timeout);
> + spin_unlock_bh(&nf_conntrack_lock);
Its not necessary to hold the lock around the timer setup. There's
only this single reference left to the conntrack - at least it should
be that way.
> +}
> +EXPORT_SYMBOL_GPL(nf_ct_setup_event_timer);
> +
> +static void death_by_timeout(unsigned long ul_conntrack)
> +{
> + struct nf_conn *ct = (void *)ul_conntrack;
> +
> + if (!test_bit(IPS_DYING_BIT, &ct->status) &&
> + unlikely(nf_conntrack_event(IPCT_DESTROY, ct) < 0)) {
> + /* destroy event was not delivered */
> + nf_ct_setup_event_timer(ct);
> + return;
> + }
> + set_bit(IPS_DYING_BIT, &ct->status);
> + nf_ct_helper_destroy(ct);
> + nf_ct_delete_from_lists(ct);
The helpers might keep global data themselves thats cleaned
up by ->destroy() (gre keymap for instance). The cleanup step
might need to be done immediately to avoid clashes with new data
or incorrectly returning data thats supposed to be gone.
> nf_ct_put(ct);
> }
>
> @@ -1030,6 +1069,22 @@ void nf_conntrack_flush_report(struct net *net, u32 pid, int report)
> }
> EXPORT_SYMBOL_GPL(nf_conntrack_flush_report);
>
> +static void nf_ct_release_dying_list(void)
> +{
> + struct nf_conntrack_tuple_hash *h;
> + struct nf_conn *ct;
> + struct hlist_nulls_node *n;
> +
> + spin_lock_bh(&nf_conntrack_lock);
> + hlist_nulls_for_each_entry(h, n, &init_net.ct.dying, hnnode) {
> + ct = nf_ct_tuplehash_to_ctrack(h);
> + /* never fails to remove them, no listeners at this point */
> + if (del_timer(&ct->timeout))
> + ct->timeout.function((unsigned long)ct);
nf_ct_kill()?
> + }
> + spin_unlock_bh(&nf_conntrack_lock);
> +}
> diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
> index 0fa5a42..5fc1fe7 100644
> --- a/net/netfilter/nf_conntrack_helper.c
> +++ b/net/netfilter/nf_conntrack_helper.c
> @@ -136,6 +136,21 @@ static inline int unhelp(struct nf_conntrack_tuple_hash *i,
> return 0;
> }
>
> +void nf_ct_helper_destroy(struct nf_conn *ct)
> +{
> + struct nf_conn_help *help = nfct_help(ct);
> + struct nf_conntrack_helper *helper;
> +
> + if (help) {
> + rcu_read_lock();
> + helper = rcu_dereference(help->helper);
> + if (helper && helper->destroy)
> + helper->destroy(ct);
> + rcu_read_unlock();
> + }
> +}
> +EXPORT_SYMBOL_GPL(nf_ct_helper_destroy);
The export looks unnecessary, its only used in nf_conntrack_core.c
unless I've missed something
> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
> index 2dd2910..6695e4b 100644
> --- a/net/netfilter/nf_conntrack_netlink.c
> +++ b/net/netfilter/nf_conntrack_netlink.c
> @@ -558,7 +559,10 @@ ctnetlink_conntrack_event(unsigned int events, struct nf_ct_event *item)
> rcu_read_unlock();
>
> nlmsg_end(skb, nlh);
> - nfnetlink_send(skb, item->pid, group, item->report, GFP_ATOMIC);
> + err = nfnetlink_send(skb, item->pid, group, item->report, GFP_ATOMIC);
> + if ((err == -ENOBUFS) || (err == -EAGAIN))
minor nit: unnecessary parens
next prev parent reply other threads:[~2009-06-05 14:37 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-06-04 11:07 [PATCH 0/2] Pablo Neira Ayuso
2009-06-04 11:08 ` [PATCH 1/2] netfilter: conntrack: move event cache to conntrack extension infrastructure Pablo Neira Ayuso
2009-06-04 12:16 ` Pablo Neira Ayuso
2009-06-05 11:04 ` Patrick McHardy
2009-06-05 13:06 ` Pablo Neira Ayuso
2009-06-05 14:13 ` Patrick McHardy
2009-06-06 6:24 ` Pablo Neira Ayuso
2009-06-04 11:08 ` [PATCH 2/2] netfilter: conntrack: optional reliable conntrack event delivery Pablo Neira Ayuso
2009-06-05 14:37 ` Patrick McHardy [this message]
2009-06-06 6:34 ` Pablo Neira Ayuso
2009-06-08 13:49 ` Patrick McHardy
2009-06-09 22:36 ` Pablo Neira Ayuso
2009-06-09 22:43 ` Patrick McHardy
2009-06-09 22:45 ` Patrick McHardy
2009-06-09 22:58 ` Pablo Neira Ayuso
2009-06-10 1:18 ` Eric Dumazet
2009-06-10 9:55 ` Patrick McHardy
2009-06-10 10:36 ` Pablo Neira Ayuso
2009-06-10 10:55 ` Patrick McHardy
2009-06-10 11:01 ` Patrick McHardy
2009-06-10 11:40 ` Patrick McHardy
2009-06-10 12:22 ` Pablo Neira Ayuso
2009-06-10 12:27 ` Patrick McHardy
2009-06-10 12:43 ` Pablo Neira Ayuso
2009-06-10 12:56 ` Patrick McHardy
2009-06-10 12:26 ` Jozsef Kadlecsik
2009-06-10 12:30 ` Patrick McHardy
2009-06-10 12:41 ` Patrick McHardy
2009-06-04 11:17 ` [PATCH 0/2] reliable per-conntrack event cache Pablo Neira Ayuso
-- strict thread matches above, loose matches on Subject: below --
2009-05-04 13:53 [PATCH 0/2] conntrack event subsystem updates for 2.6.31 (part 2) Pablo Neira Ayuso
2009-05-04 13:53 ` [PATCH 2/2] netfilter: conntrack: optional reliable conntrack event delivery Pablo Neira Ayuso
2009-05-04 14:02 ` Pablo Neira Ayuso
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4A292DB7.4000000@trash.net \
--to=kaber@trash.net \
--cc=netfilter-devel@vger.kernel.org \
--cc=pablo@netfilter.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).