From: Florian Westphal <fw@strlen.de>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>, netfilter-devel@vger.kernel.org
Subject: Re: [PATCH nf] netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure
Date: Thu, 28 Sep 2023 15:49:23 +0200 [thread overview]
Message-ID: <20230928134923.GD27208@breakpoint.cc> (raw)
In-Reply-To: <ZRWBxJBxQ4z7QYVo@calendula>
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Thu, Sep 28, 2023 at 03:12:44PM +0200, Florian Westphal wrote:
> > nft_rbtree_gc_elem() walks back and removes the end interval element that
> > comes before the expired element.
> >
> > There is a small chance that we've cached this element as 'rbe_ge'.
> > If this happens, we hold and test a pointer that has been queued for
> > freeing.
> >
> > It also causes spurious insertion failures:
> >
> > $ cat nft-test.20230921-143934.826.dMBwHt/test-testcases-sets-0044interval_overlap_0.1/testout.log
> > Error: Could not process rule: File exists
> > add element t s { 0 - 2 }
> > ^^^^^^
> > Failed to insert 0 - 2 given:
> > table ip t {
> > set s {
> > type inet_service
> > flags interval,timeout
> > timeout 2s
> > gc-interval 2s
> > }
> > }
> >
> > The set (rbtree) is empty. The 'failure' doesn't happen on next attempt.
> >
> > Reason is that when we try to insert, the tree may hold an expired
> > element that collides with the range we're adding.
> > While we do evict/erase this element, we can trip over this check:
> >
> > if (rbe_ge && nft_rbtree_interval_end(rbe_ge) && nft_rbtree_interval_end(new))
> > return -ENOTEMPTY;
> >
> > rbe_ge was erased by the synchronous gc, we should not have done this
> > check. Next attempt won't find it, so retry results in successful
> > insertion.
> >
> > Restart in-kernel to avoid such spurious errors.
> >
> > Such restart are rare, unless userspace intentionally adds very large
> > numbers of elements with very short timeouts while setting a huge
> > gc interval.
> >
> > Even in this case, this cannot loop forever, on each retry an existing
> > element has been removed.
> >
> > As the caller is holding the transaction mutex, its impossible
> > for a second entity to add more expiring elements to the tree.
> >
> > After this it also becomes feasible to remove the async gc worker
> > and perform all garbage collection from the commit path.
> >
> > Signed-off-by: Florian Westphal <fw@strlen.de>
> > ---
> > Changes since v1:
> > - restart entire insertion process in case we remove
> > element that we held as lesser/greater overlap detection
> Thanks, I am still looking at finding a way to move this to .commit,
> if no better solution, then let's get this in for the next round.
I don't think its that bad. In most cases, no restart is required
as no expired elements in the interesting range will ever be found.
I think its fine really, no need to go for two trees or anything
like pipapo is doing.
I have a few patches that build on top of this, first one
gets rid of async worker and places it in commit.
prev parent reply other threads:[~2023-09-28 13:49 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-28 13:12 [PATCH nf] netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure Florian Westphal
2023-09-28 13:38 ` Pablo Neira Ayuso
2023-09-28 13:49 ` Florian Westphal [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230928134923.GD27208@breakpoint.cc \
--to=fw@strlen.de \
--cc=netfilter-devel@vger.kernel.org \
--cc=pablo@netfilter.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).