From: Florian Westphal <fw@strlen.de>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>, netfilter-devel@vger.kernel.org
Subject: Re: [PATCH nf] netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure
Date: Thu, 28 Sep 2023 15:49:23 +0200 [thread overview]
Message-ID: <20230928134923.GD27208@breakpoint.cc> (raw)
In-Reply-To: <ZRWBxJBxQ4z7QYVo@calendula>
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Thu, Sep 28, 2023 at 03:12:44PM +0200, Florian Westphal wrote:
> > nft_rbtree_gc_elem() walks back and removes the end interval element that
> > comes before the expired element.
> >
> > There is a small chance that we've cached this element as 'rbe_ge'.
> > If this happens, we hold and test a pointer that has been queued for
> > freeing.
> >
> > It also causes spurious insertion failures:
> >
> > $ cat nft-test.20230921-143934.826.dMBwHt/test-testcases-sets-0044interval_overlap_0.1/testout.log
> > Error: Could not process rule: File exists
> > add element t s { 0 - 2 }
> > ^^^^^^
> > Failed to insert 0 - 2 given:
> > table ip t {
> > set s {
> > type inet_service
> > flags interval,timeout
> > timeout 2s
> > gc-interval 2s
> > }
> > }
> >
> > The set (rbtree) is empty. The 'failure' doesn't happen on next attempt.
> >
> > Reason is that when we try to insert, the tree may hold an expired
> > element that collides with the range we're adding.
> > While we do evict/erase this element, we can trip over this check:
> >
> > if (rbe_ge && nft_rbtree_interval_end(rbe_ge) && nft_rbtree_interval_end(new))
> > return -ENOTEMPTY;
> >
> > rbe_ge was erased by the synchronous gc, we should not have done this
> > check. Next attempt won't find it, so retry results in successful
> > insertion.
> >
> > Restart in-kernel to avoid such spurious errors.
> >
> > Such restart are rare, unless userspace intentionally adds very large
> > numbers of elements with very short timeouts while setting a huge
> > gc interval.
> >
> > Even in this case, this cannot loop forever, on each retry an existing
> > element has been removed.
> >
> > As the caller is holding the transaction mutex, its impossible
> > for a second entity to add more expiring elements to the tree.
> >
> > After this it also becomes feasible to remove the async gc worker
> > and perform all garbage collection from the commit path.
> >
> > Signed-off-by: Florian Westphal <fw@strlen.de>
> > ---
> > Changes since v1:
> > - restart entire insertion process in case we remove
> > element that we held as lesser/greater overlap detection
> Thanks, I am still looking at finding a way to move this to .commit,
> if no better solution, then let's get this in for the next round.
I don't think its that bad. In most cases, no restart is required
as no expired elements in the interesting range will ever be found.
I think its fine really, no need to go for two trees or anything
like pipapo is doing.
I have a few patches that build on top of this, first one
gets rid of async worker and places it in commit.
prev parent reply other threads:[~2023-09-28 13:49 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-28 13:12 [PATCH nf] netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure Florian Westphal
2023-09-28 13:38 ` Pablo Neira Ayuso
2023-09-28 13:49 ` Florian Westphal [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230928134923.GD27208@breakpoint.cc \
--to=fw@strlen.de \
--cc=netfilter-devel@vger.kernel.org \
--cc=pablo@netfilter.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.