From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: netfilter-devel@vger.kernel.org
Subject: Re: [PATCH nf] nft_set_rbtree: Move clauses for expired nodes, last active node as leaf
Date: Mon, 23 May 2022 16:59:30 +0200 [thread overview]
Message-ID: <YouhUq09zfcflOnz@salvia> (raw)
In-Reply-To: <20220520174524.439b5fa2@elisabeth>
On Fri, May 20, 2022 at 05:45:24PM +0200, Stefano Brivio wrote:
> On Tue, 17 May 2022 14:57:09 +0200
> Stefano Brivio <sbrivio@redhat.com> wrote:
>
> > On Mon, 16 May 2022 20:16:53 +0200
> > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> >
> > > Hi Stefano,
> > >
> > > On Thu, May 12, 2022 at 08:34:21PM +0200, Stefano Brivio wrote:
> > > > In the overlap detection performed as part of the insertion operation,
> > > > we have to skip nodes that are not active in the current generation or
> > > > expired. This was done by adding several conditions in overlap decision
> > > > clauses, which made it hard to check for correctness, and debug the
> > > > actual issue this patch is fixing.
> > > >
> > > > Consolidate these conditions into a single clause, so that we can
> > > > proceed with debugging and fixing the following issue.
> > > >
> > > > Case b3. described in comments covers the insertion of a start
> > > > element after an existing end, as a leaf. If all the descendants of
> > > > a given element are inactive, functionally, for the purposes of
> > > > overlap detection, we still have to treat this as a leaf, but we don't.
> > > >
> > > > If, in the same transaction, we remove a start element to the right,
> > > > remove an end element to the left, and add a start element to the right
> > > > of an existing, active end element, we don't hit case b3. For example:
> > > >
> > > > - existing intervals: 1-2, 3-5, 6-7
> > > > - transaction: delete 3-5, insert 4-5
> > > >
> > > > node traversal might happen as follows:
> > > > - 2 (active end)
> > > > - 5 (inactive end)
> > > > - 3 (inactive start)
> > > >
> > > > when we add 4 as start element, we should simply consider 2 as
> > > > preceding end, and consider it as a leaf, because interval 3-5 has been
> > > > deleted. If we don't, we'll report an overlap because we forgot about
> > > > this.
> > > >
> > > > Add an additional flag which is set as we find an active end, and reset
> > > > it if we find an active start (to the left). If we finish the traversal
> > > > with this flag set, it means we need to functionally consider the
> > > > previous active end as a leaf, and allow insertion instead of reporting
> > > > overlap.
> > >
> > > I can still trigger EEXIST with deletion of existing interval. It
> > > became harder to reproduce after this patch.
> > >
> > > After hitting EEXIST, if I do:
> > >
> > > echo "flush ruleset" > test.nft
> > > nft list ruleset >> test.nft
> > >
> > > to dump the existing ruleset, then I run the delete element command
> > > again to remove the interval and it works. Before this patch I could
> > > reproduce it by reloading an existing ruleset dump.
> > >
> > > I'm running the script that I'm attaching manually, just one manual
> > > invocation after another.
> >
> > Ouch, sorry for that.
> >
> > It looks like there's another case where inactive elements still affect
> > overlap detection in an unexpected way... at least with the structure
> > of this patch it should be easier to find, I'm looking into that now.
>
> ...what a mess. I could figure that part out (it was a case symmetric
> to what this patch fixed, in this case resolving to case b5.) but
> there's then another case (found by triggering a specific tree topology
> with 0044interval_overlap_1) where we first add a start element, then
> fail to add the end element because the start element is completely
> "hidden" in the tree by inactive nodes.
>
> I tried to solve that with some backtracking, but that looks also
> fragile. If I clean up the tree before insertion, instead, that will
> only deal with expired nodes, not inactive nodes -- I can't erase
> non-expired, inactive nodes because the API expects to find them at
> some later point and call nft_rbtree_remove() on them.
>
> Now I'm trying another approach that looks more robust: instead of
> descending the tree to find overlaps, just going through it in the same
> way nft_rbtree_gc() does (linearly, node by node), marking the
> value-wise closest points from left and right _valid_ nodes, and
> applying the same reasoning. I need a bit more time for this, but it
> looks way simpler. Insertion itself would keep working as it does now.
>
> In hindsight, it looks like it was a terrible idea to try to fix this
> implementation. I really underestimated how bad this is. Functionally
> speaking, it's not a red-black tree because:
>
> - we can't use it as a sorted binary tree, given that some elements
> "don't matter" for some operations, or have another colour. We might
> try to think of it as some other structure and rebuild from there
> useful properties of sorted binary trees, but I'm not sure a
> "red-brown-black" tree would have any other use making it worth of
> any further research
>
> - end elements being represented as their value plus one also break
> assumptions of sorted trees (this is the issue I'm actually facing
> with backtracking)
>
> - left subtrees store keys greater than right subtrees, but this
> looks consistent so it's just added fun and could be fixed
> trivially (it's all reversed)
>
> By the way, I think we _should_ have similar issues in the lookup
> function. Given that it's possible to build a tree with a subtree of at
> least three levels with entirely non-valid nodes, I guess we can hide a
> valid interval from the lookup too. It's just harder to test.
Thanks for the detailed report.
Another possibility? Maintain two trees, one for the existing
generation (this is read-only) and another for the next generation
(insertion/removals are performed on it), then swap them when commit
happens? pipapo has similar requirements, currently it is relying on a
workqueue to make some postprocess after the commit phase. At the
expense of consuming more memory.
> In the perspective of getting rid of it, I think we need:
>
> 1. some "introductory" documentation for nft_set_pipapo -- I just
> got back to it (drawing some diagrams first...)
>
> 2. to understand if the performance gap in the few (maybe not
> reasonable) cases where nft_set_rbtree matches faster than
> nft_set_pipapo is acceptable. Summary:
> https://lore.kernel.org/netfilter-devel/be7d4e51603633e7b88e4b0dde54b25a3e5a018b.1583598508.git.sbrivio@redhat.com/
IIRC pipapo was not too far behind from rbtree for a few scenarios.
> 3. a solution for https://bugzilla.netfilter.org/show_bug.cgi?id=1583,
> it's an atomicity issue which has little to do with nft_set_pipapo
> structures themselves, but I couldn't figure out the exact issue
> yet. I'm struggling to find the time for it, so if somebody wants to
> give it a try, I'd be more than happy to reassign it...
OK, a different problem, related to pipapo.
> Anyway, I'll post a different patch for nft_set_rbtree soon.
Thanks.
next prev parent reply other threads:[~2022-05-23 14:59 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-12 18:34 [PATCH nf] nft_set_rbtree: Move clauses for expired nodes, last active node as leaf Stefano Brivio
2022-05-16 18:16 ` Pablo Neira Ayuso
2022-05-17 12:57 ` Stefano Brivio
2022-05-20 15:45 ` Stefano Brivio
2022-05-23 14:59 ` Pablo Neira Ayuso [this message]
2022-05-25 12:15 ` Stefano Brivio
2022-06-01 11:15 ` Pablo Neira Ayuso
2022-06-03 13:04 ` Stefano Brivio
2022-06-06 9:01 ` Pablo Neira Ayuso
2022-06-14 9:58 ` Stefano Brivio
2022-06-16 9:08 ` Pablo Neira Ayuso
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YouhUq09zfcflOnz@salvia \
--to=pablo@netfilter.org \
--cc=netfilter-devel@vger.kernel.org \
--cc=sbrivio@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).