Re: [PATCH 2/2] netfilter: conntrack: optional reliable conntrack event delivery

netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Patrick McHardy <kaber@trash.net>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Subject: Re: [PATCH 2/2] netfilter: conntrack: optional reliable conntrack event	delivery
Date: Fri, 05 Jun 2009 16:37:43 +0200	[thread overview]
Message-ID: <4A292DB7.4000000@trash.net> (raw)
In-Reply-To: <20090604110841.6702.76228.stgit@Decadence>

Pablo Neira Ayuso wrote:
> This patch improves ctnetlink event reliability if one broadcast
> listener has set the NETLINK_BROADCAST_ERROR socket option.
> 
> The logic is the following: if the event delivery, we keep the
> undelivered events in the per-conntrack event cache. Thus, once
> the next packet arrives, we trigger another event delivery in
> nf_conntrack_in(). If things don't go well in this second try,
> we accumulate the pending events in the cache but we try to
> deliver the current state as soon as possible. Therefore, we may
> lost state transitions but the userspace process gets in sync at
> some point.
> 
> At worst case, if no events were delivered to userspace, we make
> sure that destroy events are successfully delivered. This happens
> because if ctnetlink fails to deliver the destroy event, we remove
> the conntrack entry from the hashes and insert them in the dying
> list, which contains inactive entries. Then, the conntrack timer
> is added with an extra grace timeout of random32() % 15 seconds
> to trigger the event again (this grace timeout is tunable via
> /proc). The use of a limited random timeout value allows
> distributing the "destroy" resends, thus, avoiding accumulating
> lots "destroy" events at the same time.
> 
> The maximum number of conntrack entries (active or inactive) is
> still handled by nf_conntrack_max. Thus, we may start dropping
> packets at some point if we accumulate a lot of inactive conntrack
> entries waiting to deliver the destroy event to userspace.
> During my stress tests consisting of setting a very small buffer
> of 2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket
> flag, and generating lots of very small connections, I noticed
> very few destroy entries on the fly waiting to be resend.

Conceptually, I think this all makes sense and is an improvement.

> @@ -240,6 +225,60 @@ static void death_by_timeout(unsigned long ul_conntrack)
>  	NF_CT_STAT_INC(net, delete_list);
>  	clean_from_lists(ct);
>  	spin_unlock_bh(&nf_conntrack_lock);
> +}
> +
> +static void death_by_event(unsigned long ul_conntrack)
> +{
> +	struct nf_conn *ct = (void *)ul_conntrack;
> +	struct net *net = nf_ct_net(ct);
> +
> +	if (nf_conntrack_event(IPCT_DESTROY, ct) < 0) {
> +		/* bad luck, let's retry again */
> +		ct->timeout.expires = jiffies +
> +			(random32() % net->ct.sysctl_events_retry_timeout);
> +		add_timer(&ct->timeout);
> +		return;
> +	}
> +	/* we've got the event delivered, now it's dying */
> +	set_bit(IPS_DYING_BIT, &ct->status);
> +	spin_lock_bh(&nf_conntrack_lock);

_bh is not needed here, the timer is already running in BH context.

> +	hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);

Why is _rcu used? The lists are never used for a lookup.

> +	spin_unlock_bh(&nf_conntrack_lock);
> +	nf_ct_helper_destroy(ct);
> +	nf_ct_put(ct);
> +}
> +
> +void nf_ct_setup_event_timer(struct nf_conn *ct)
> +{
> +	struct net *net = nf_ct_net(ct);
> +
> +	nf_ct_delete_from_lists(ct);

This doesn't really belong here. I realize its because of ctnetlink,
but I think I'd rather have an additional export.

> +	/* add this conntrack to the dying list */
> +	spin_lock_bh(&nf_conntrack_lock);
> +	hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode,
> +				 &net->ct.dying);
> +	/* set a new timer to retry event delivery */
> +	setup_timer(&ct->timeout, death_by_event, (unsigned long)ct);
> +	ct->timeout.expires = jiffies +
> +		(random32() % net->ct.sysctl_events_retry_timeout);
> +	add_timer(&ct->timeout);
> +	spin_unlock_bh(&nf_conntrack_lock);

Its not necessary to hold the lock around the timer setup. There's
only this single reference left to the conntrack - at least it should
be that way.

> +}
> +EXPORT_SYMBOL_GPL(nf_ct_setup_event_timer);
> +
> +static void death_by_timeout(unsigned long ul_conntrack)
> +{
> +	struct nf_conn *ct = (void *)ul_conntrack;
> +
> +	if (!test_bit(IPS_DYING_BIT, &ct->status) && 
> +	    unlikely(nf_conntrack_event(IPCT_DESTROY, ct) < 0)) {
> +		/* destroy event was not delivered */
> +		nf_ct_setup_event_timer(ct);
> +		return;
> +	}
> +	set_bit(IPS_DYING_BIT, &ct->status);
> +	nf_ct_helper_destroy(ct);
> +	nf_ct_delete_from_lists(ct);

The helpers might keep global data themselves thats cleaned
up by ->destroy() (gre keymap for instance). The cleanup step
might need to be done immediately to avoid clashes with new data
or incorrectly returning data thats supposed to be gone.

>  	nf_ct_put(ct);
>  }
>  
> @@ -1030,6 +1069,22 @@ void nf_conntrack_flush_report(struct net *net, u32 pid, int report)
>  }
>  EXPORT_SYMBOL_GPL(nf_conntrack_flush_report);
>  
> +static void nf_ct_release_dying_list(void)
> +{
> +	struct nf_conntrack_tuple_hash *h;
> +	struct nf_conn *ct;
> +	struct hlist_nulls_node *n;
> +
> +	spin_lock_bh(&nf_conntrack_lock);
> +	hlist_nulls_for_each_entry(h, n, &init_net.ct.dying, hnnode) {
> +		ct = nf_ct_tuplehash_to_ctrack(h);
> +		/* never fails to remove them, no listeners at this point */
> +		if (del_timer(&ct->timeout))
> +			ct->timeout.function((unsigned long)ct);

nf_ct_kill()?

> +	}
> +	spin_unlock_bh(&nf_conntrack_lock);
> +}

> diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
> index 0fa5a42..5fc1fe7 100644
> --- a/net/netfilter/nf_conntrack_helper.c
> +++ b/net/netfilter/nf_conntrack_helper.c
> @@ -136,6 +136,21 @@ static inline int unhelp(struct nf_conntrack_tuple_hash *i,
>  	return 0;
>  }
>  
> +void nf_ct_helper_destroy(struct nf_conn *ct)
> +{
> +	struct nf_conn_help *help = nfct_help(ct);
> +	struct nf_conntrack_helper *helper;
> +
> +	if (help) {
> +		rcu_read_lock();
> +		helper = rcu_dereference(help->helper);
> +		if (helper && helper->destroy)
> +			helper->destroy(ct);
> +		rcu_read_unlock();
> +	}
> +}
> +EXPORT_SYMBOL_GPL(nf_ct_helper_destroy);

The export looks unnecessary, its only used in nf_conntrack_core.c
unless I've missed something

> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
> index 2dd2910..6695e4b 100644
> --- a/net/netfilter/nf_conntrack_netlink.c
> +++ b/net/netfilter/nf_conntrack_netlink.c
> @@ -558,7 +559,10 @@ ctnetlink_conntrack_event(unsigned int events, struct nf_ct_event *item)
>  	rcu_read_unlock();
>  
>  	nlmsg_end(skb, nlh);
> -	nfnetlink_send(skb, item->pid, group, item->report, GFP_ATOMIC);
> +	err = nfnetlink_send(skb, item->pid, group, item->report, GFP_ATOMIC);
> +	if ((err == -ENOBUFS) || (err == -EAGAIN))

minor nit: unnecessary parens

next prev parent reply	other threads:[~2009-06-05 14:37 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-06-04 11:07 [PATCH 0/2] Pablo Neira Ayuso
2009-06-04 11:08 ` [PATCH 1/2] netfilter: conntrack: move event cache to conntrack extension infrastructure Pablo Neira Ayuso
2009-06-04 12:16   ` Pablo Neira Ayuso
2009-06-05 11:04   ` Patrick McHardy
2009-06-05 13:06     ` Pablo Neira Ayuso
2009-06-05 14:13       ` Patrick McHardy
2009-06-06  6:24         ` Pablo Neira Ayuso
2009-06-04 11:08 ` [PATCH 2/2] netfilter: conntrack: optional reliable conntrack event delivery Pablo Neira Ayuso
2009-06-05 14:37   ` Patrick McHardy [this message]
2009-06-06  6:34     ` Pablo Neira Ayuso
2009-06-08 13:49       ` Patrick McHardy
2009-06-09 22:36     ` Pablo Neira Ayuso
2009-06-09 22:43       ` Patrick McHardy
2009-06-09 22:45         ` Patrick McHardy
2009-06-09 22:58           ` Pablo Neira Ayuso
2009-06-10  1:18             ` Eric Dumazet
2009-06-10  9:55               ` Patrick McHardy
2009-06-10 10:36                 ` Pablo Neira Ayuso
2009-06-10 10:55                   ` Patrick McHardy
2009-06-10 11:01                     ` Patrick McHardy
2009-06-10 11:40                       ` Patrick McHardy
2009-06-10 12:22                         ` Pablo Neira Ayuso
2009-06-10 12:27                           ` Patrick McHardy
2009-06-10 12:43                             ` Pablo Neira Ayuso
2009-06-10 12:56                               ` Patrick McHardy
2009-06-10 12:26                         ` Jozsef Kadlecsik
2009-06-10 12:30                           ` Patrick McHardy
2009-06-10 12:41                             ` Patrick McHardy
2009-06-04 11:17 ` [PATCH 0/2] reliable per-conntrack event cache Pablo Neira Ayuso
  -- strict thread matches above, loose matches on Subject: below --
2009-05-04 13:53 [PATCH 0/2] conntrack event subsystem updates for 2.6.31 (part 2) Pablo Neira Ayuso
2009-05-04 13:53 ` [PATCH 2/2] netfilter: conntrack: optional reliable conntrack event delivery Pablo Neira Ayuso
2009-05-04 14:02   ` Pablo Neira Ayuso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4A292DB7.4000000@trash.net \
    --to=kaber@trash.net \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=pablo@netfilter.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).