Re: [RFC] netfilter: conntrack race between dump_table and destroy

From: Eric Dumazet <eric.dumazet@gmail.com>
To: Stephen Hemminger <shemminger@vyatta.com>
Cc: Patrick McHardy <kaber@trash.net>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	netdev@vger.kernel.org, netfilter-devel@vger.kernel.org
Subject: Re: [RFC] netfilter: conntrack race between dump_table and destroy
Date: Thu, 25 Nov 2010 08:13:34 +0100
Message-ID: <1290669214.2798.109.camel@edumazet-laptop>
In-Reply-To: <20101124230004.1dc28e5a@nehalam>

On Wednesday, 24 November 2010 at 23:00 -0800, Stephen Hemminger wrote:
> On Thu, 25 Nov 2010 07:34:33 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > On Wednesday, 24 November 2010 at 22:27 -0800, Stephen Hemminger wrote:
> > > A customer reported a crash and the backtrace showed that
> > > ctnetlink_dump_table was running while a conntrack entry was
> > > being destroyed.  It looks like the code for walking the table
> > > with hlist_nulls_for_each_entry_rcu is not correctly handling the
> > > case where it finds a deleted entry.
> > > 
> > > According to RCU documentation, when using hlist_nulls the reader
> > > must handle the case of seeing a deleted entry and not proceed
> > > further down the linked list.  For lookup the correct behavior would
> > > be to restart the scan, but that would generate duplicate entries.
> > > 
> > > This patch is the simplest one of three alternatives:
> > >   1) if dead entry detected, skip the rest of the hash chain (see below)
> > >   2) remember skb location at start of hash chain and rescan that chain
> > >   3) switch to using a full lock when scanning rather than RCU.
> > > It all depends on the amount of effort versus consistency of results.
> > > 
> > > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> > > 
> > > 
> > > --- a/net/netfilter/nf_conntrack_netlink.c	2010-11-24 14:11:27.661682148 -0800
> > > +++ b/net/netfilter/nf_conntrack_netlink.c	2010-11-24 14:22:28.431980247 -0800
> > > @@ -651,8 +651,12 @@ restart:
> > >  			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
> > >  				continue;
> > >  			ct = nf_ct_tuplehash_to_ctrack(h);
> > > +
> > > +			/* if entry is being deleted then can not proceed
> > > +			 * past this point. */
> > >  			if (!atomic_inc_not_zero(&ct->ct_general.use))
> > > -				continue;
> > > +				break;
> > > +
> > >  			/* Dump entries of a given L3 protocol number.
> > >  			 * If it is not specified, ie. l3proto == 0,
> > >  			 * then dump everything. */
> > > --
> > 
> > Hmm...
> > 
> > How can restarting the loop be a problem?
> 
> At this point in the loop, some entries have been placed in the netlink
> dump buffer. Restarting the loop will cause duplicate entries.
> 

Then this is another problem.

If the loop cannot be restarted, we cannot use RCU at all here.

Otherwise we miss valid entries in the chain.

> > There must be a bug somewhere else that your patch tries to avoid
> > rather than really fix.
> > 
> > Normally, destroyed ct is removed eventually from the chain, so this
> > lookup should stop.
> 
> Because of hlist_nulls, it is possible to walk into a dead entry; in that
> case the next pointer is no longer valid.
> 

RCU should be used only where needed, in the fast path, to find one entry,
not to find _all_ entries.
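
To illustrate why a single-entry lookup tolerates a restart, here is a minimal user-space sketch of a "nulls"-terminated chain. It is a simplified model, not the kernel's hlist_nulls code: the names `struct node`, `make_nulls`, `walk_chain` and the `WALK_*` results are all hypothetical. The end-of-chain marker records which bucket the chain belongs to, so a reader that ends on a marker for a different bucket knows it was carried onto another chain and has to restart.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a link is either a pointer to the next node, or an
 * odd value ((bucket << 1) | 1) that ends the chain and names its bucket. */
struct node {
	uintptr_t next;
	int key;
};

static inline bool is_nulls(uintptr_t link)
{
	return (link & 1) != 0;
}

static inline unsigned long nulls_value(uintptr_t link)
{
	return (unsigned long)(link >> 1);
}

static inline uintptr_t make_nulls(unsigned long bucket)
{
	return ((uintptr_t)bucket << 1) | 1;
}

enum walk_result { WALK_FOUND, WALK_NOT_FOUND, WALK_RESTART };

/* Walk one chain looking for a single key. If the walk ends on a marker
 * that names another bucket, the result cannot be trusted and the caller
 * must restart; for a single-entry lookup that costs nothing but time. */
static enum walk_result walk_chain(uintptr_t head, unsigned long bucket,
				   int key, const struct node **found)
{
	uintptr_t link = head;

	while (!is_nulls(link)) {
		const struct node *n = (const struct node *)link;

		if (n->key == key) {
			*found = n;
			return WALK_FOUND;
		}
		link = n->next;
	}
	return nulls_value(link) == bucket ? WALK_NOT_FOUND : WALK_RESTART;
}
```

A caller would loop on `WALK_RESTART`. Because nothing has been handed to anyone before the walk finishes, restarting is harmless in this case.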

For example, we cannot use it for UDP multicast delivery for the same
reason: if we find a deleted or moved socket, we must restart the loop
and forget all accumulated sockets.
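
The "restart and forget" rule can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct entry`, `collect_chain` and the retry bound are hypothetical, a `refcnt` of 0 stands in for an entry being destroyed, and the model is single-threaded, so a dead entry never gets unlinked the way it eventually would in the kernel.

```c
#include <stddef.h>

/* Hypothetical, simplified chain entry (not a kernel structure). */
struct entry {
	const struct entry *next;	/* NULL ends the chain in this model */
	int id;
	int refcnt;			/* 0 models an entry being destroyed */
};

/* Collect the ids of every entry on one chain into a scratch array.
 * Returns the number collected, or -1 if every attempt hit a dead entry.
 * Each attempt starts again from n = 0, so partial results from an
 * aborted walk are never reported. */
static int collect_chain(const struct entry *head, int *out, size_t cap,
			 unsigned int max_retries)
{
	for (unsigned int attempt = 0; attempt <= max_retries; attempt++) {
		size_t n = 0;		/* forget everything accumulated so far */
		int dead = 0;

		for (const struct entry *e = head; e; e = e->next) {
			if (e->refcnt == 0) {
				dead = 1;	/* e->next cannot be trusted */
				break;
			}
			if (n < cap)
				out[n++] = e->id;
		}
		if (!dead)
			return (int)n;
	}
	return -1;
}
```

This only works because the results sit in a scratch array that can be thrown away. A dump that writes each entry straight into its final buffer has nothing to discard, which is the difficulty with the netlink case.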

If netlink dumps each entry directly into the final destination container,
then we cannot restart the loop, and cannot use RCU for the chain lookup.
