[RFC] netfilter: conntrack race between dump

netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC] netfilter: conntrack race between dump_table and destroy
@ 2010-11-25  6:27 Stephen Hemminger
  2010-11-25  6:34 ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Stephen Hemminger @ 2010-11-25  6:27 UTC (permalink / raw)
  To: Patrick McHardy, Paul E. McKenney; +Cc: netdev, netfilter-devel

A customer reported a crash and the backtrace showed that
ctnetlink_dump_table was running while a conntrack entry was
being destroyed.  It looks like the code for walking the table
with hlist_nulls_for_each_entry_rcu is not correctly handling the
case where it finds a deleted entry.

According to RCU documentation, when using hlist_nulls the reader
must handle the case of seeing a deleted entry and not proceed
further down the linked list.  For lookup the correct behavior would
be to restart the scan, but that would generate duplicate entries.

This patch is the simplest one of three alternatives:
  1) if dead entry detected, skip the rest of the hash chain (see below)
  2) remember skb location at start of hash chain and rescan that chain
  3) switch to using a full lock when scanning rather than RCU.
It all depends on the amount of effort versus consistency of results.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/net/netfilter/nf_conntrack_netlink.c	2010-11-24 14:11:27.661682148 -0800
+++ b/net/netfilter/nf_conntrack_netlink.c	2010-11-24 14:22:28.431980247 -0800
@@ -651,8 +651,12 @@ restart:
 			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
 				continue;
 			ct = nf_ct_tuplehash_to_ctrack(h);
+
+			/* if entry is being deleted then can not proceed
+			 * past this point. */
 			if (!atomic_inc_not_zero(&ct->ct_general.use))
-				continue;
+				break;
+
 			/* Dump entries of a given L3 protocol number.
 			 * If it is not specified, ie. l3proto == 0,
 			 * then dump everything. */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] netfilter: conntrack race between dump_table and destroy
  2010-11-25  6:27 [RFC] netfilter: conntrack race between dump_table and destroy Stephen Hemminger
@ 2010-11-25  6:34 ` Eric Dumazet
  2010-11-25  7:00   ` Stephen Hemminger
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2010-11-25  6:34 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Patrick McHardy, Paul E. McKenney, netdev, netfilter-devel

Le mercredi 24 novembre 2010 à 22:27 -0800, Stephen Hemminger a écrit :
> A customer reported a crash and the backtrace showed that
> ctnetlink_dump_table was running while a conntrack entry was
> being destroyed.  It looks like the code for walking the table
> with hlist_nulls_for_each_entry_rcu is not correctly handling the
> case where it finds a deleted entry.
> 
> According to RCU documentation, when using hlist_nulls the reader
> must handle the case of seeing a deleted entry and not proceed
> further down the linked list.  For lookup the correct behavior would
> be to restart the scan, but that would generate duplicate entries.
> 
> This patch is the simplest one of three alternatives:
>   1) if dead entry detected, skip the rest of the hash chain (see below)
>   2) remember skb location at start of hash chain and rescan that chain
>   3) switch to using a full lock when scanning rather than RCU.
> It all depends on the amount of effort versus consistency of results.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> 
> --- a/net/netfilter/nf_conntrack_netlink.c	2010-11-24 14:11:27.661682148 -0800
> +++ b/net/netfilter/nf_conntrack_netlink.c	2010-11-24 14:22:28.431980247 -0800
> @@ -651,8 +651,12 @@ restart:
>  			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
>  				continue;
>  			ct = nf_ct_tuplehash_to_ctrack(h);
> +
> +			/* if entry is being deleted then can not proceed
> +			 * past this point. */
>  			if (!atomic_inc_not_zero(&ct->ct_general.use))
> -				continue;
> +				break;
> +
>  			/* Dump entries of a given L3 protocol number.
>  			 * If it is not specified, ie. l3proto == 0,
>  			 * then dump everything. */
> --

Hmm...

How restarting the loop can be a problem ? 

There must be a bug somewhere else that your patch try to avoid, not to
really fix.

Normally, destroyed ct is removed eventually from the chain, so this
lookup should stop.



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] netfilter: conntrack race between dump_table and destroy
  2010-11-25  6:34 ` Eric Dumazet
@ 2010-11-25  7:00   ` Stephen Hemminger
  2010-11-25  7:13     ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Stephen Hemminger @ 2010-11-25  7:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Patrick McHardy, Paul E. McKenney, netdev, netfilter-devel

On Thu, 25 Nov 2010 07:34:33 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Le mercredi 24 novembre 2010 à 22:27 -0800, Stephen Hemminger a écrit :
> > A customer reported a crash and the backtrace showed that
> > ctnetlink_dump_table was running while a conntrack entry was
> > being destroyed.  It looks like the code for walking the table
> > with hlist_nulls_for_each_entry_rcu is not correctly handling the
> > case where it finds a deleted entry.
> > 
> > According to RCU documentation, when using hlist_nulls the reader
> > must handle the case of seeing a deleted entry and not proceed
> > further down the linked list.  For lookup the correct behavior would
> > be to restart the scan, but that would generate duplicate entries.
> > 
> > This patch is the simplest one of three alternatives:
> >   1) if dead entry detected, skip the rest of the hash chain (see below)
> >   2) remember skb location at start of hash chain and rescan that chain
> >   3) switch to using a full lock when scanning rather than RCU.
> > It all depends on the amount of effort versus consistency of results.
> > 
> > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> > 
> > 
> > --- a/net/netfilter/nf_conntrack_netlink.c	2010-11-24 14:11:27.661682148 -0800
> > +++ b/net/netfilter/nf_conntrack_netlink.c	2010-11-24 14:22:28.431980247 -0800
> > @@ -651,8 +651,12 @@ restart:
> >  			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
> >  				continue;
> >  			ct = nf_ct_tuplehash_to_ctrack(h);
> > +
> > +			/* if entry is being deleted then can not proceed
> > +			 * past this point. */
> >  			if (!atomic_inc_not_zero(&ct->ct_general.use))
> > -				continue;
> > +				break;
> > +
> >  			/* Dump entries of a given L3 protocol number.
> >  			 * If it is not specified, ie. l3proto == 0,
> >  			 * then dump everything. */
> > --
> 
> Hmm...
> 
> How restarting the loop can be a problem ? 

At this point in the loop, some entries have been placed in the netlink
dump buffer. Restarting the loop will cause duplicate entries.

> There must be a bug somewhere else that your patch try to avoid, not to
> really fix.
> 
> Normally, destroyed ct is removed eventually from the chain, so this
> lookup should stop.

Because hlist_nulls it is possible to walk into a dead entry, in that
case the next pointer is no longer valid.

-- 
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] netfilter: conntrack race between dump_table and destroy
  2010-11-25  7:00   ` Stephen Hemminger
@ 2010-11-25  7:13     ` Eric Dumazet
  2010-11-26 21:51       ` [PATCH] netfilter: fix race in conntrack " Stephen Hemminger
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2010-11-25  7:13 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Patrick McHardy, Paul E. McKenney, netdev, netfilter-devel

Le mercredi 24 novembre 2010 à 23:00 -0800, Stephen Hemminger a écrit :
> On Thu, 25 Nov 2010 07:34:33 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > Le mercredi 24 novembre 2010 à 22:27 -0800, Stephen Hemminger a écrit :
> > > A customer reported a crash and the backtrace showed that
> > > ctnetlink_dump_table was running while a conntrack entry was
> > > being destroyed.  It looks like the code for walking the table
> > > with hlist_nulls_for_each_entry_rcu is not correctly handling the
> > > case where it finds a deleted entry.
> > > 
> > > According to RCU documentation, when using hlist_nulls the reader
> > > must handle the case of seeing a deleted entry and not proceed
> > > further down the linked list.  For lookup the correct behavior would
> > > be to restart the scan, but that would generate duplicate entries.
> > > 
> > > This patch is the simplest one of three alternatives:
> > >   1) if dead entry detected, skip the rest of the hash chain (see below)
> > >   2) remember skb location at start of hash chain and rescan that chain
> > >   3) switch to using a full lock when scanning rather than RCU.
> > > It all depends on the amount of effort versus consistency of results.
> > > 
> > > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> > > 
> > > 
> > > --- a/net/netfilter/nf_conntrack_netlink.c	2010-11-24 14:11:27.661682148 -0800
> > > +++ b/net/netfilter/nf_conntrack_netlink.c	2010-11-24 14:22:28.431980247 -0800
> > > @@ -651,8 +651,12 @@ restart:
> > >  			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
> > >  				continue;
> > >  			ct = nf_ct_tuplehash_to_ctrack(h);
> > > +
> > > +			/* if entry is being deleted then can not proceed
> > > +			 * past this point. */
> > >  			if (!atomic_inc_not_zero(&ct->ct_general.use))
> > > -				continue;
> > > +				break;
> > > +
> > >  			/* Dump entries of a given L3 protocol number.
> > >  			 * If it is not specified, ie. l3proto == 0,
> > >  			 * then dump everything. */
> > > --
> > 
> > Hmm...
> > 
> > How restarting the loop can be a problem ? 
> 
> At this point in the loop, some entries have been placed in the netlink
> dump buffer. Restarting the loop will cause duplicate entries.
> 

Then this is another problem.

We cannot use RCU at all here.

Or we miss valid entries in the chain.

> > There must be a bug somewhere else that your patch try to avoid, not to
> > really fix.
> > 
> > Normally, destroyed ct is removed eventually from the chain, so this
> > lookup should stop.
> 
> Because hlist_nulls it is possible to walk into a dead entry, in that
> case the next pointer is no longer valid.
> 

RCU should be used where needed, in fast path only, to find one entry,
not to find _all_ entries.

For example, we cannot use it for UDP multicast delivery for the same
reasons : If we find a deleted or moved socket, we must restart the loop
and forget all accumulated sockets.

If netlink dumps each entry in the final destination container, then we
cannot restart loop, and cannot use RCU for chain lookup.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] netfilter:  fix  race in conntrack between dump_table and destroy
  2010-11-25  7:13     ` Eric Dumazet
@ 2010-11-26 21:51       ` Stephen Hemminger
  2010-11-27  6:32         ` Eric Dumazet
                           ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Stephen Hemminger @ 2010-11-26 21:51 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Patrick McHardy, Paul E. McKenney, netdev, netfilter-devel

The netlink interface to dump the connection tracking table has a race
when entries are deleted at the same time. A customer reported a crash 
and the backtrace showed thatctnetlink_dump_table was running while a 
conntrack entry wasbeing destroyed.
(see https://bugzilla.vyatta.com/show_bug.cgi?id=6402).

According to RCU documentation, when using hlist_nulls the reader
must handle the case of seeing a deleted entry and not proceed
further down the linked list.  The old code would continue
which caused the scan to walk into the free list. 

This patch uses locking (rather than RCU) for this operation which
is guaranteed safe, and no longer requires getting reference while
doing dump operation.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/net/netfilter/nf_conntrack_netlink.c	2010-11-25 21:49:11.401158365 -0800
+++ b/net/netfilter/nf_conntrack_netlink.c	2010-11-25 22:18:08.164421697 -0800
@@ -642,25 +642,23 @@ ctnetlink_dump_table(struct sk_buff *skb
 	struct nfgenmsg *nfmsg = nlmsg_data(cb->nlh);
 	u_int8_t l3proto = nfmsg->nfgen_family;
 
-	rcu_read_lock();
+	spin_lock_bh(&nf_conntrack_lock);
 	last = (struct nf_conn *)cb->args[1];
 	for (; cb->args[0] < net->ct.htable_size; cb->args[0]++) {
 restart:
-		hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[cb->args[0]],
+		hlist_nulls_for_each_entry(h, n, &net->ct.hash[cb->args[0]],
 					 hnnode) {
 			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
 				continue;
 			ct = nf_ct_tuplehash_to_ctrack(h);
-			if (!atomic_inc_not_zero(&ct->ct_general.use))
-				continue;
 			/* Dump entries of a given L3 protocol number.
 			 * If it is not specified, ie. l3proto == 0,
 			 * then dump everything. */
 			if (l3proto && nf_ct_l3num(ct) != l3proto)
-				goto releasect;
+				continue;
 			if (cb->args[1]) {
 				if (ct != last)
-					goto releasect;
+					continue;
 				cb->args[1] = 0;
 			}
 			if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid,
@@ -678,8 +676,6 @@ restart:
 				if (acct)
 					memset(acct, 0, sizeof(struct nf_conn_counter[IP_CT_DIR_MAX]));
 			}
-releasect:
-		nf_ct_put(ct);
 		}
 		if (cb->args[1]) {
 			cb->args[1] = 0;
@@ -687,7 +683,7 @@ releasect:
 		}
 	}
 out:
-	rcu_read_unlock();
+	spin_unlock_bh(&nf_conntrack_lock);
 	if (last)
 		nf_ct_put(last);
 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] netfilter:  fix  race in conntrack between dump_table and destroy
  2010-11-26 21:51       ` [PATCH] netfilter: fix race in conntrack " Stephen Hemminger
@ 2010-11-27  6:32         ` Eric Dumazet
  2010-11-30 17:28         ` Stephen Hemminger
  2011-01-09 21:32         ` Pablo Neira Ayuso
  2 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2010-11-27  6:32 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Patrick McHardy, Paul E. McKenney, netdev, netfilter-devel

Le vendredi 26 novembre 2010 à 13:51 -0800, Stephen Hemminger a écrit :
> The netlink interface to dump the connection tracking table has a race
> when entries are deleted at the same time. A customer reported a crash 
> and the backtrace showed thatctnetlink_dump_table was running while a 
> conntrack entry wasbeing destroyed.
> (see https://bugzilla.vyatta.com/show_bug.cgi?id=6402).
> 
> According to RCU documentation, when using hlist_nulls the reader
> must handle the case of seeing a deleted entry and not proceed
> further down the linked list.  The old code would continue
> which caused the scan to walk into the free list. 
> 
> This patch uses locking (rather than RCU) for this operation which
> is guaranteed safe, and no longer requires getting reference while
> doing dump operation.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] netfilter:  fix  race in conntrack between dump_table and destroy
  2010-11-26 21:51       ` [PATCH] netfilter: fix race in conntrack " Stephen Hemminger
  2010-11-27  6:32         ` Eric Dumazet
@ 2010-11-30 17:28         ` Stephen Hemminger
  2011-01-09 21:32         ` Pablo Neira Ayuso
  2 siblings, 0 replies; 8+ messages in thread
From: Stephen Hemminger @ 2010-11-30 17:28 UTC (permalink / raw)
  To: David Miller
  Cc: Eric Dumazet, Patrick McHardy, Paul E. McKenney, netdev,
	netfilter-devel

On Fri, 26 Nov 2010 13:51:01 -0800
Stephen Hemminger <shemminger@vyatta.com> wrote:

> The netlink interface to dump the connection tracking table has a race
> when entries are deleted at the same time. A customer reported a crash 
> and the backtrace showed thatctnetlink_dump_table was running while a 
> conntrack entry wasbeing destroyed.
> (see https://bugzilla.vyatta.com/show_bug.cgi?id=6402).
> 
> According to RCU documentation, when using hlist_nulls the reader
> must handle the case of seeing a deleted entry and not proceed
> further down the linked list.  The old code would continue
> which caused the scan to walk into the free list. 
> 
> This patch uses locking (rather than RCU) for this operation which
> is guaranteed safe, and no longer requires getting reference while
> doing dump operation.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

This should go in net-2.6 and stable for 2.6.32, 2.6.35, and 2.6.36



-- 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] netfilter:  fix  race in conntrack between dump_table and destroy
  2010-11-26 21:51       ` [PATCH] netfilter: fix race in conntrack " Stephen Hemminger
  2010-11-27  6:32         ` Eric Dumazet
  2010-11-30 17:28         ` Stephen Hemminger
@ 2011-01-09 21:32         ` Pablo Neira Ayuso
  2 siblings, 0 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2011-01-09 21:32 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Eric Dumazet, Patrick McHardy, Paul E. McKenney, netdev,
	netfilter-devel

On 26/11/10 22:51, Stephen Hemminger wrote:
> The netlink interface to dump the connection tracking table has a race
> when entries are deleted at the same time. A customer reported a crash 
> and the backtrace showed thatctnetlink_dump_table was running while a 
> conntrack entry wasbeing destroyed.
> (see https://bugzilla.vyatta.com/show_bug.cgi?id=6402).
> 
> According to RCU documentation, when using hlist_nulls the reader
> must handle the case of seeing a deleted entry and not proceed
> further down the linked list.  The old code would continue
> which caused the scan to walk into the free list. 
> 
> This patch uses locking (rather than RCU) for this operation which
> is guaranteed safe, and no longer requires getting reference while
> doing dump operation.

I have put this in my tree:

http://1984.lsi.us.es/git/?p=net-2.6/.git;a=summary

I'll pass it to David for -stable inclusion.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2011-01-09 21:32 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-11-25  6:27 [RFC] netfilter: conntrack race between dump_table and destroy Stephen Hemminger
2010-11-25  6:34 ` Eric Dumazet
2010-11-25  7:00   ` Stephen Hemminger
2010-11-25  7:13     ` Eric Dumazet
2010-11-26 21:51       ` [PATCH] netfilter: fix race in conntrack " Stephen Hemminger
2010-11-27  6:32         ` Eric Dumazet
2010-11-30 17:28         ` Stephen Hemminger
2011-01-09 21:32         ` Pablo Neira Ayuso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).