Re: [PATCH nf] nft_set_rbtree: Move clauses for expired nodes, last active node as leaf

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: netfilter-devel@vger.kernel.org
Subject: Re: [PATCH nf] nft_set_rbtree: Move clauses for expired nodes, last active node as leaf
Date: Mon, 6 Jun 2022 11:01:21 +0200	[thread overview]
Message-ID: <Yp3CYfbdHH1lm945@salvia> (raw)
In-Reply-To: <20220603150445.3d797c87@elisabeth>

On Fri, Jun 03, 2022 at 03:04:45PM +0200, Stefano Brivio wrote:
> On Wed, 1 Jun 2022 13:15:08 +0200
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> 
> > On Wed, May 25, 2022 at 02:15:07PM +0200, Stefano Brivio wrote:
> > > On Mon, 23 May 2022 16:59:30 +0200
> > > Pablo Neira Ayuso <pablo@netfilter.org> wrote:  
> > [...]
> > > > Another possibility? Maintain two trees, one for the existing
> > > > generation (this is read-only) and another for the next generation
> > > > (insertion/removals are performed on it), then swap them when commit
> > > > happens?  
> > > 
> > > It sounded like a good idea and I actually started sketching it, but
> > > there's one fundamental issue: it doesn't help with overlap detection
> > > because we also want to check insertions that will be part of the live
> > > copy. If, within one transaction, you delete and create elements, the
> > > "dirty" copy is still dirty for the purposes of overlaps.  
> > 
> > Updates on the copy could be done without using the deactivate/active
> > logic since it is not exposed to the packet path. Then, you use the
> > copy (next generation of the datastructure) to check for overlaps? We
> > need keep pointer to two sets in the rbtree private data area, the
> > generation ID would point to the current set that is being used from
> > the packet path. The stale tree is released from the commit phase (see
> > below).
> 
> Oh, right, I guess that would work. But at a first glance it looks more
> complicated than the other idea:

Yes, my idea would trigger a larger rewrite.

> > > For the lookup, that might help. Is it worth it right now, though? At
> > > the moment I would go back and try to get overlap detection work
> > > properly, at least, with my previous idea.  
> > 
> > If your idea is still in planning phase, could you summarize again the
> > idea? You mentioned about using gc you mentioned, if it is more simple
> > than my proposal, then it should be good to go too.
> 
> ...hmm, no, forget about gc, that was flawed. I'm just "walking" the
> tree (going through elements as a list, instead of descending it),
> noting down closest left and right elements to what we're inserting,
> and check it with similar criteria to what we already have (but much
> simpler, because we don't have to infer anything from what might be in
> other leaves/nodes).
> 
> That is, if you have elements 3 (start), 5 (end), 7 (start), 8 (end),
> and you're inserting 6 as a start, we'll end up the tree walk with 5
> (end) on the left and 7 (start) on the right, so we know it's not
> overlapping.
> 
> If you insert 4 (as start or end), we know we have 3 and 5 around, so
> it overlaps.
> 
> It's essentially the same as it is now, just dropping a lot of corner
> cases and changing the way we go through the tree.
> 
> I kind of implemented it, I still need a bit to make it working.

That sounds an incremental fix, I prefer this too.

> > > > pipapo has similar requirements, currently it is relying on a
> > > > workqueue to make some postprocess after the commit phase. At the
> > > > expense of consuming more memory.  
> > > 
> > > Well, it keeps two copies already: all the insertions and removals are
> > > done on the dirty copy. The two problems we have there are:
> > > 
> > > - the clean copy might also contain inactive elements (because on a
> > >   "commit" operation the API doesn't guarantee that all inserted
> > >   elements are active), so we need to check for those during lookup,
> > >   which is quite heavy (in my tests that was 4% of the clock cycles
> > >   needed for lookup in a set with 30 000 "port, protocol" entries)
> > > 
> > > - on every _activate() call, we also need to commit the dirty copy onto
> > >   a clean one, instead of having one commit per transaction (because if
> > >   there's a newly activated item, we need to see it from the lookup),
> > >   so every transaction translates to a number of commit operations for
> > >   the back-end.
> > > 
> > >   That also makes things a bit complicated and it might very well be
> > >   related to point 3. below
> > > 
> > > ...there's no actual workqueue: garbage collection (for timed out
> > > entries) only happens on commit, I don't see a particular problem with
> > > it.
> > > 
> > > I think both issues would be solved if we had a more natural API, that
> > > is, having a single call to the back-end implementing a commit
> > > operation, instead of separately activating single entries. I don't
> > > know how complicated this change would be.  
> > 
> > It should be possible to add a ->commit operation to set->ops, then
> > call it at the end of the commit phase, ie. iterate over the list of
> > existing sets in the table and call set->ops->commit().
> 
> That sounds good, but when would we call it? Can it be independent of
> the userspace version? Could we then obsolete the "activate" operation
> (of course, by implementing commit() in all the sets)?

Call it from nf_tables_commit().

I don't see how we can obsolete "activate" operation, though, the
existing approach works at set element granularity.

next prev parent reply	other threads:[~2022-06-06  9:01 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-12 18:34 [PATCH nf] nft_set_rbtree: Move clauses for expired nodes, last active node as leaf Stefano Brivio
2022-05-16 18:16 ` Pablo Neira Ayuso
2022-05-17 12:57   ` Stefano Brivio
2022-05-20 15:45     ` Stefano Brivio
2022-05-23 14:59       ` Pablo Neira Ayuso
2022-05-25 12:15         ` Stefano Brivio
2022-06-01 11:15           ` Pablo Neira Ayuso
2022-06-03 13:04             ` Stefano Brivio
2022-06-06  9:01               ` Pablo Neira Ayuso [this message]
2022-06-14  9:58                 ` Stefano Brivio
2022-06-16  9:08                   ` Pablo Neira Ayuso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yp3CYfbdHH1lm945@salvia \
    --to=pablo@netfilter.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.