netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: netfilter-devel@vger.kernel.org
Subject: Re: [PATCH 1/1] netfilter: ipset: Fix set:list type crash when flush/dump set in parallel
Date: Thu, 10 Mar 2016 19:27:32 +0100	[thread overview]
Message-ID: <20160310182732.GA21139@salvia> (raw)
In-Reply-To: <alpine.DEB.2.10.1603082004430.19980@blackhole.kfki.hu>

On Tue, Mar 08, 2016 at 08:27:30PM +0100, Jozsef Kadlecsik wrote:
> Hi Pablo,
> 
> On Sat, 5 Mar 2016, Pablo Neira Ayuso wrote:
> 
> > On Mon, Feb 29, 2016 at 01:47:59PM +0100, Jozsef Kadlecsik wrote:
> > > On Mon, 29 Feb 2016, Pablo Neira Ayuso wrote:
> > > 
> > > > On Wed, Feb 24, 2016 at 09:19:26PM +0100, Jozsef Kadlecsik wrote:
> > > > > Flushing/listing entries was not RCU safe, so parallel flush/dump
> > > > > could lead to kernel crash. Bug reported by Deniz Eren.
> > > > > 
> > > > > Fixes netfilter bugzilla id #1050.
> > > > > 
> > > > > Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
> > > > > ---
> > > > >  net/netfilter/ipset/ip_set_core.c     |  3 ++
> > > > >  net/netfilter/ipset/ip_set_list_set.c | 55 ++++++++++++++++-------------------
> > > > >  2 files changed, 28 insertions(+), 30 deletions(-)
> > > > > 
> > > > > diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
> > > > > index 95db43f..7e6568c 100644
> > > > > --- a/net/netfilter/ipset/ip_set_core.c
> > > > > +++ b/net/netfilter/ipset/ip_set_core.c
> > > > > @@ -985,6 +985,9 @@ static int ip_set_destroy(struct net *net, struct sock *ctnl,
> > > > >  	if (unlikely(protocol_failed(attr)))
> > > > >  		return -IPSET_ERR_PROTOCOL;
> > > > >  
> > > > > +	/* Must wait for flush to be really finished in list:set */
> > > > > +	rcu_barrier();
> > > > 
> > > > Jozsef, are you sure you need this rcu_barrier()?
> > > > 
> > > > This is waiting for rcu callback completion (ie. decrement the set
> > > > reference counter and releasing the extension and object itself).
> > > 
> > > Yes, that's exactly what it's waiting for, in the case of sets which are 
> > > elements in set:list type of sets.
> > > 
> > > > The rcu read side should be safe when accessing old copies from the
> > > > dumping path when using call_rcu().
> > > 
> > > Three competing actions play role here: flush/delete, destroy, dumping. 
> > > The rcu_barrier() above waits for the flush/delete actions. The rcu read 
> > > side has access to the set id only and must lookup the set by id in order 
> > > to get the name, which is protected by a reference counter. The function 
> > > ip_set_name_byindex() intentionally checks that the reference counter 
> > > cannot be equal to zero and I believe it's an important internal sanity 
> > > checking and I don't want to remove it. The problem was that the reference 
> > > counter of the set was first decremented at flush/delete and so we could 
> > > get a BUG_ON() crash in ip_set_name_byindex() when an ongoing dumping was 
> > > processed. So the reference counter must be released last for a successful 
> > > parallel dumping. If rcu_barrier() was not added to the destroy path, a 
> > > "destroy" immediately following a "flush" can lead to the error "set is in 
> > > use, cannot delete". I tested it and couldn't find a simpler way.
> > 
> > ip_set_destroy() calls ip_set_destroy_set() which releases the set
> > object and its content without waiting for the rcu grace period. I
> > think readers (dump path) may still be walking on the set by when
> > you're releasing this object.
> > 
> > Our nfnetlink dump path is lockless (rcu read side protected), so we
> > unpublish the set object and release objects via rcu callback when
> > operating from the control plane.
> 
> Yes, but we are dealing with flush and then destroy, when the set we are 
> flushing is a list:set type of set where the elements are sets. It is not 
> possible to unpublish the flushed objects, because those are totally valid 
> sets.
> 
> Before I sent my patch I tested it without rcu_barrier() and the destroy 
> command failed (quite rightly) because 'Set is in use' - destroy has to 
> wait for flush to be completed anyway.

OK, thanks for explaining. I don't find any better way at this moment
to handle the interdependencies that the set:list type creates with
this elements that are actually sets, so this rcu_barrier() guarantees
the in-order set releasing that you need not to trigger the bug.
Sorry, I misunderstood the goal of this and I just wanted to make sure
this rcu_barrier() was not actually hiding a real problem.

I have pulled this into the nf-next tree, thanks for your patience.

      reply	other threads:[~2016-03-10 18:27 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-24 20:19 [PATCH 0/1] ipset patch for nf Jozsef Kadlecsik
2016-02-24 20:19 ` [PATCH 1/1] netfilter: ipset: Fix set:list type crash when flush/dump set in parallel Jozsef Kadlecsik
2016-02-29 12:22   ` Pablo Neira Ayuso
2016-02-29 12:47     ` Jozsef Kadlecsik
2016-03-05 12:32       ` Pablo Neira Ayuso
2016-03-08 19:27         ` Jozsef Kadlecsik
2016-03-10 18:27           ` Pablo Neira Ayuso [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160310182732.GA21139@salvia \
    --to=pablo@netfilter.org \
    --cc=kadlec@blackhole.kfki.hu \
    --cc=netfilter-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).