* [RFC] netfilter: conntrack race between dump_table and destroy @ 2010-11-25 6:27 Stephen Hemminger 2010-11-25 6:34 ` Eric Dumazet 0 siblings, 1 reply; 8+ messages in thread From: Stephen Hemminger @ 2010-11-25 6:27 UTC (permalink / raw) To: Patrick McHardy, Paul E. McKenney; +Cc: netdev, netfilter-devel A customer reported a crash and the backtrace showed that ctnetlink_dump_table was running while a conntrack entry was being destroyed. It looks like the code for walking the table with hlist_nulls_for_each_entry_rcu is not correctly handling the case where it finds a deleted entry. According to RCU documentation, when using hlist_nulls the reader must handle the case of seeing a deleted entry and not proceed further down the linked list. For lookup the correct behavior would be to restart the scan, but that would generate duplicate entries. This patch is the simplest one of three alternatives: 1) if dead entry detected, skip the rest of the hash chain (see below) 2) remember skb location at start of hash chain and rescan that chain 3) switch to using a full lock when scanning rather than RCU. It all depends on the amount of effort versus consistency of results. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> --- a/net/netfilter/nf_conntrack_netlink.c 2010-11-24 14:11:27.661682148 -0800 +++ b/net/netfilter/nf_conntrack_netlink.c 2010-11-24 14:22:28.431980247 -0800 @@ -651,8 +651,12 @@ restart: if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL) continue; ct = nf_ct_tuplehash_to_ctrack(h); + + /* if entry is being deleted then can not proceed + * past this point. */ if (!atomic_inc_not_zero(&ct->ct_general.use)) - continue; + break; + /* Dump entries of a given L3 protocol number. * If it is not specified, ie. l3proto == 0, * then dump everything. */ ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC] netfilter: conntrack race between dump_table and destroy 2010-11-25 6:27 [RFC] netfilter: conntrack race between dump_table and destroy Stephen Hemminger @ 2010-11-25 6:34 ` Eric Dumazet 2010-11-25 7:00 ` Stephen Hemminger 0 siblings, 1 reply; 8+ messages in thread From: Eric Dumazet @ 2010-11-25 6:34 UTC (permalink / raw) To: Stephen Hemminger Cc: Patrick McHardy, Paul E. McKenney, netdev, netfilter-devel Le mercredi 24 novembre 2010 à 22:27 -0800, Stephen Hemminger a écrit : > A customer reported a crash and the backtrace showed that > ctnetlink_dump_table was running while a conntrack entry was > being destroyed. It looks like the code for walking the table > with hlist_nulls_for_each_entry_rcu is not correctly handling the > case where it finds a deleted entry. > > According to RCU documentation, when using hlist_nulls the reader > must handle the case of seeing a deleted entry and not proceed > further down the linked list. For lookup the correct behavior would > be to restart the scan, but that would generate duplicate entries. > > This patch is the simplest one of three alternatives: > 1) if dead entry detected, skip the rest of the hash chain (see below) > 2) remember skb location at start of hash chain and rescan that chain > 3) switch to using a full lock when scanning rather than RCU. > It all depends on the amount of effort versus consistency of results. > > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> > > > --- a/net/netfilter/nf_conntrack_netlink.c 2010-11-24 14:11:27.661682148 -0800 > +++ b/net/netfilter/nf_conntrack_netlink.c 2010-11-24 14:22:28.431980247 -0800 > @@ -651,8 +651,12 @@ restart: > if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL) > continue; > ct = nf_ct_tuplehash_to_ctrack(h); > + > + /* if entry is being deleted then can not proceed > + * past this point. */ > if (!atomic_inc_not_zero(&ct->ct_general.use)) > - continue; > + break; > + > /* Dump entries of a given L3 protocol number. > * If it is not specified, ie. l3proto == 0, > * then dump everything. */ > -- Hmm... How restarting the loop can be a problem ? There must be a bug somewhere else that your patch try to avoid, not to really fix. Normally, destroyed ct is removed eventually from the chain, so this lookup should stop. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC] netfilter: conntrack race between dump_table and destroy 2010-11-25 6:34 ` Eric Dumazet @ 2010-11-25 7:00 ` Stephen Hemminger 2010-11-25 7:13 ` Eric Dumazet 0 siblings, 1 reply; 8+ messages in thread From: Stephen Hemminger @ 2010-11-25 7:00 UTC (permalink / raw) To: Eric Dumazet; +Cc: Patrick McHardy, Paul E. McKenney, netdev, netfilter-devel On Thu, 25 Nov 2010 07:34:33 +0100 Eric Dumazet <eric.dumazet@gmail.com> wrote: > Le mercredi 24 novembre 2010 à 22:27 -0800, Stephen Hemminger a écrit : > > A customer reported a crash and the backtrace showed that > > ctnetlink_dump_table was running while a conntrack entry was > > being destroyed. It looks like the code for walking the table > > with hlist_nulls_for_each_entry_rcu is not correctly handling the > > case where it finds a deleted entry. > > > > According to RCU documentation, when using hlist_nulls the reader > > must handle the case of seeing a deleted entry and not proceed > > further down the linked list. For lookup the correct behavior would > > be to restart the scan, but that would generate duplicate entries. > > > > This patch is the simplest one of three alternatives: > > 1) if dead entry detected, skip the rest of the hash chain (see below) > > 2) remember skb location at start of hash chain and rescan that chain > > 3) switch to using a full lock when scanning rather than RCU. > > It all depends on the amount of effort versus consistency of results. > > > > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> > > > > > > --- a/net/netfilter/nf_conntrack_netlink.c 2010-11-24 14:11:27.661682148 -0800 > > +++ b/net/netfilter/nf_conntrack_netlink.c 2010-11-24 14:22:28.431980247 -0800 > > @@ -651,8 +651,12 @@ restart: > > if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL) > > continue; > > ct = nf_ct_tuplehash_to_ctrack(h); > > + > > + /* if entry is being deleted then can not proceed > > + * past this point. */ > > if (!atomic_inc_not_zero(&ct->ct_general.use)) > > - continue; > > + break; > > + > > /* Dump entries of a given L3 protocol number. > > * If it is not specified, ie. l3proto == 0, > > * then dump everything. */ > > -- > > Hmm... > > How restarting the loop can be a problem ? At this point in the loop, some entries have been placed in the netlink dump buffer. Restarting the loop will cause duplicate entries. > There must be a bug somewhere else that your patch try to avoid, not to > really fix. > > Normally, destroyed ct is removed eventually from the chain, so this > lookup should stop. Because hlist_nulls it is possible to walk into a dead entry, in that case the next pointer is no longer valid. -- -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC] netfilter: conntrack race between dump_table and destroy 2010-11-25 7:00 ` Stephen Hemminger @ 2010-11-25 7:13 ` Eric Dumazet 2010-11-26 21:51 ` [PATCH] netfilter: fix race in conntrack " Stephen Hemminger 0 siblings, 1 reply; 8+ messages in thread From: Eric Dumazet @ 2010-11-25 7:13 UTC (permalink / raw) To: Stephen Hemminger Cc: Patrick McHardy, Paul E. McKenney, netdev, netfilter-devel Le mercredi 24 novembre 2010 à 23:00 -0800, Stephen Hemminger a écrit : > On Thu, 25 Nov 2010 07:34:33 +0100 > Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > Le mercredi 24 novembre 2010 à 22:27 -0800, Stephen Hemminger a écrit : > > > A customer reported a crash and the backtrace showed that > > > ctnetlink_dump_table was running while a conntrack entry was > > > being destroyed. It looks like the code for walking the table > > > with hlist_nulls_for_each_entry_rcu is not correctly handling the > > > case where it finds a deleted entry. > > > > > > According to RCU documentation, when using hlist_nulls the reader > > > must handle the case of seeing a deleted entry and not proceed > > > further down the linked list. For lookup the correct behavior would > > > be to restart the scan, but that would generate duplicate entries. > > > > > > This patch is the simplest one of three alternatives: > > > 1) if dead entry detected, skip the rest of the hash chain (see below) > > > 2) remember skb location at start of hash chain and rescan that chain > > > 3) switch to using a full lock when scanning rather than RCU. > > > It all depends on the amount of effort versus consistency of results. > > > > > > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> > > > > > > > > > --- a/net/netfilter/nf_conntrack_netlink.c 2010-11-24 14:11:27.661682148 -0800 > > > +++ b/net/netfilter/nf_conntrack_netlink.c 2010-11-24 14:22:28.431980247 -0800 > > > @@ -651,8 +651,12 @@ restart: > > > if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL) > > > continue; > > > ct = nf_ct_tuplehash_to_ctrack(h); > > > + > > > + /* if entry is being deleted then can not proceed > > > + * past this point. */ > > > if (!atomic_inc_not_zero(&ct->ct_general.use)) > > > - continue; > > > + break; > > > + > > > /* Dump entries of a given L3 protocol number. > > > * If it is not specified, ie. l3proto == 0, > > > * then dump everything. */ > > > -- > > > > Hmm... > > > > How restarting the loop can be a problem ? > > At this point in the loop, some entries have been placed in the netlink > dump buffer. Restarting the loop will cause duplicate entries. > Then this is another problem. We cannot use RCU at all here. Or we miss valid entries in the chain. > > There must be a bug somewhere else that your patch try to avoid, not to > > really fix. > > > > Normally, destroyed ct is removed eventually from the chain, so this > > lookup should stop. > > Because hlist_nulls it is possible to walk into a dead entry, in that > case the next pointer is no longer valid. > RCU should be used where needed, in fast path only, to find one entry, not to find _all_ entries. For example, we cannot use it for UDP multicast delivery for the same reasons : If we find a deleted or moved socket, we must restart the loop and forget all accumulated sockets. If netlink dumps each entry in the final destination container, then we cannot restart loop, and cannot use RCU for chain lookup. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] netfilter: fix race in conntrack between dump_table and destroy 2010-11-25 7:13 ` Eric Dumazet @ 2010-11-26 21:51 ` Stephen Hemminger 2010-11-27 6:32 ` Eric Dumazet ` (2 more replies) 0 siblings, 3 replies; 8+ messages in thread From: Stephen Hemminger @ 2010-11-26 21:51 UTC (permalink / raw) To: Eric Dumazet; +Cc: Patrick McHardy, Paul E. McKenney, netdev, netfilter-devel The netlink interface to dump the connection tracking table has a race when entries are deleted at the same time. A customer reported a crash and the backtrace showed thatctnetlink_dump_table was running while a conntrack entry wasbeing destroyed. (see https://bugzilla.vyatta.com/show_bug.cgi?id=6402). According to RCU documentation, when using hlist_nulls the reader must handle the case of seeing a deleted entry and not proceed further down the linked list. The old code would continue which caused the scan to walk into the free list. This patch uses locking (rather than RCU) for this operation which is guaranteed safe, and no longer requires getting reference while doing dump operation. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> --- a/net/netfilter/nf_conntrack_netlink.c 2010-11-25 21:49:11.401158365 -0800 +++ b/net/netfilter/nf_conntrack_netlink.c 2010-11-25 22:18:08.164421697 -0800 @@ -642,25 +642,23 @@ ctnetlink_dump_table(struct sk_buff *skb struct nfgenmsg *nfmsg = nlmsg_data(cb->nlh); u_int8_t l3proto = nfmsg->nfgen_family; - rcu_read_lock(); + spin_lock_bh(&nf_conntrack_lock); last = (struct nf_conn *)cb->args[1]; for (; cb->args[0] < net->ct.htable_size; cb->args[0]++) { restart: - hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[cb->args[0]], + hlist_nulls_for_each_entry(h, n, &net->ct.hash[cb->args[0]], hnnode) { if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL) continue; ct = nf_ct_tuplehash_to_ctrack(h); - if (!atomic_inc_not_zero(&ct->ct_general.use)) - continue; /* Dump entries of a given L3 protocol number. * If it is not specified, ie. l3proto == 0, * then dump everything. */ if (l3proto && nf_ct_l3num(ct) != l3proto) - goto releasect; + continue; if (cb->args[1]) { if (ct != last) - goto releasect; + continue; cb->args[1] = 0; } if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid, @@ -678,8 +676,6 @@ restart: if (acct) memset(acct, 0, sizeof(struct nf_conn_counter[IP_CT_DIR_MAX])); } -releasect: - nf_ct_put(ct); } if (cb->args[1]) { cb->args[1] = 0; @@ -687,7 +683,7 @@ releasect: } } out: - rcu_read_unlock(); + spin_unlock_bh(&nf_conntrack_lock); if (last) nf_ct_put(last); ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] netfilter: fix race in conntrack between dump_table and destroy 2010-11-26 21:51 ` [PATCH] netfilter: fix race in conntrack " Stephen Hemminger @ 2010-11-27 6:32 ` Eric Dumazet 2010-11-30 17:28 ` Stephen Hemminger 2011-01-09 21:32 ` Pablo Neira Ayuso 2 siblings, 0 replies; 8+ messages in thread From: Eric Dumazet @ 2010-11-27 6:32 UTC (permalink / raw) To: Stephen Hemminger Cc: Patrick McHardy, Paul E. McKenney, netdev, netfilter-devel Le vendredi 26 novembre 2010 à 13:51 -0800, Stephen Hemminger a écrit : > The netlink interface to dump the connection tracking table has a race > when entries are deleted at the same time. A customer reported a crash > and the backtrace showed thatctnetlink_dump_table was running while a > conntrack entry wasbeing destroyed. > (see https://bugzilla.vyatta.com/show_bug.cgi?id=6402). > > According to RCU documentation, when using hlist_nulls the reader > must handle the case of seeing a deleted entry and not proceed > further down the linked list. The old code would continue > which caused the scan to walk into the free list. > > This patch uses locking (rather than RCU) for this operation which > is guaranteed safe, and no longer requires getting reference while > doing dump operation. > > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] netfilter: fix race in conntrack between dump_table and destroy 2010-11-26 21:51 ` [PATCH] netfilter: fix race in conntrack " Stephen Hemminger 2010-11-27 6:32 ` Eric Dumazet @ 2010-11-30 17:28 ` Stephen Hemminger 2011-01-09 21:32 ` Pablo Neira Ayuso 2 siblings, 0 replies; 8+ messages in thread From: Stephen Hemminger @ 2010-11-30 17:28 UTC (permalink / raw) To: David Miller Cc: Eric Dumazet, Patrick McHardy, Paul E. McKenney, netdev, netfilter-devel On Fri, 26 Nov 2010 13:51:01 -0800 Stephen Hemminger <shemminger@vyatta.com> wrote: > The netlink interface to dump the connection tracking table has a race > when entries are deleted at the same time. A customer reported a crash > and the backtrace showed thatctnetlink_dump_table was running while a > conntrack entry wasbeing destroyed. > (see https://bugzilla.vyatta.com/show_bug.cgi?id=6402). > > According to RCU documentation, when using hlist_nulls the reader > must handle the case of seeing a deleted entry and not proceed > further down the linked list. The old code would continue > which caused the scan to walk into the free list. > > This patch uses locking (rather than RCU) for this operation which > is guaranteed safe, and no longer requires getting reference while > doing dump operation. > > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> This should go in net-2.6 and stable for 2.6.32, 2.6.35, and 2.6.36 -- ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] netfilter: fix race in conntrack between dump_table and destroy 2010-11-26 21:51 ` [PATCH] netfilter: fix race in conntrack " Stephen Hemminger 2010-11-27 6:32 ` Eric Dumazet 2010-11-30 17:28 ` Stephen Hemminger @ 2011-01-09 21:32 ` Pablo Neira Ayuso 2 siblings, 0 replies; 8+ messages in thread From: Pablo Neira Ayuso @ 2011-01-09 21:32 UTC (permalink / raw) To: Stephen Hemminger Cc: Eric Dumazet, Patrick McHardy, Paul E. McKenney, netdev, netfilter-devel On 26/11/10 22:51, Stephen Hemminger wrote: > The netlink interface to dump the connection tracking table has a race > when entries are deleted at the same time. A customer reported a crash > and the backtrace showed thatctnetlink_dump_table was running while a > conntrack entry wasbeing destroyed. > (see https://bugzilla.vyatta.com/show_bug.cgi?id=6402). > > According to RCU documentation, when using hlist_nulls the reader > must handle the case of seeing a deleted entry and not proceed > further down the linked list. The old code would continue > which caused the scan to walk into the free list. > > This patch uses locking (rather than RCU) for this operation which > is guaranteed safe, and no longer requires getting reference while > doing dump operation. I have put this in my tree: http://1984.lsi.us.es/git/?p=net-2.6/.git;a=summary I'll pass it to David for -stable inclusion. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2011-01-09 21:32 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-11-25 6:27 [RFC] netfilter: conntrack race between dump_table and destroy Stephen Hemminger 2010-11-25 6:34 ` Eric Dumazet 2010-11-25 7:00 ` Stephen Hemminger 2010-11-25 7:13 ` Eric Dumazet 2010-11-26 21:51 ` [PATCH] netfilter: fix race in conntrack " Stephen Hemminger 2010-11-27 6:32 ` Eric Dumazet 2010-11-30 17:28 ` Stephen Hemminger 2011-01-09 21:32 ` Pablo Neira Ayuso
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).