* [PATCH nf-next 0/3] netfilter: nf_tables: remove rbtree async garbage collection
@ 2023-10-13 12:18 Florian Westphal
2023-10-13 12:18 ` [PATCH nf-next 1/3] netfilter: nf_tables: de-constify set commit ops function argument Florian Westphal
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Florian Westphal @ 2023-10-13 12:18 UTC (permalink / raw)
To: netfilter-devel; +Cc: Florian Westphal
The 'rbtree' set backend does not support insertion/removal of elements
from the datapath (ruleset).
Elements can only be added from the control plane, so there is no
compelling reason for regular, async gc scans in the background.
Change rbtree to use the existing 'commit' callback to do a gc scan
instead. This is run as a last step in the commit phase, when all
checks have passed.
This makes rbtree less complex. It also avoids the need to use atomic
allocations during gc: the commit hook is allowed to sleep, and the
transaction mutex prevents any interference during the walk.
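For context, a rough sketch of the caller side (not part of this series;
the helper and list member names are assumptions for illustration, not
verbatim kernel code): the core is expected to walk the sets touched by
the transaction at the end of the commit phase and invoke each backend's
->commit hook while the transaction mutex is held, roughly like this:

static void nft_set_commit_update(struct list_head *set_update_list)
{
	struct nft_set *set, *next;

	list_for_each_entry_safe(set, next, set_update_list, pending_update) {
		list_del_init(&set->pending_update);

		if (!set->ops->commit)
			continue;

		/* Called with the transaction mutex held and allowed to
		 * sleep: GFP_KERNEL allocations are fine, and no other
		 * writer can modify the set while the hook walks it.
		 */
		set->ops->commit(set);
	}
}

Because this runs only after all transaction checks have passed, the
rbtree backend can piggyback its gc scan on this hook instead of
scheduling a delayed work item.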
Florian Westphal (3):
netfilter: nf_tables: de-constify set commit ops function argument
netfilter: nft_set_rbtree: rename gc deactivate+erase function
netfilter: nft_set_rbtree: prefer sync gc to async worker
include/net/netfilter/nf_tables.h | 2 +-
net/netfilter/nft_set_pipapo.c | 7 +-
net/netfilter/nft_set_rbtree.c | 135 ++++++++++++++++--------------
3 files changed, 75 insertions(+), 69 deletions(-)
--
2.41.0
* [PATCH nf-next 1/3] netfilter: nf_tables: de-constify set commit ops function argument
2023-10-13 12:18 [PATCH nf-next 0/3] netfilter: nf_tables: remove rbtree async garbage collection Florian Westphal
@ 2023-10-13 12:18 ` Florian Westphal
2023-10-13 12:18 ` [PATCH nf-next 2/3] netfilter: nft_set_rbtree: rename gc deactivate+erase function Florian Westphal
2023-10-13 12:18 ` [PATCH nf-next 3/3] netfilter: nft_set_rbtree: prefer sync gc to async worker Florian Westphal
2 siblings, 0 replies; 6+ messages in thread
From: Florian Westphal @ 2023-10-13 12:18 UTC (permalink / raw)
To: netfilter-devel; +Cc: Florian Westphal
The set backend using this already has to work around it via an ugly
cast; don't spread this pattern.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
include/net/netfilter/nf_tables.h | 2 +-
net/netfilter/nft_set_pipapo.c | 7 +++----
2 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 9fb16485d08f..8de040d2d2cf 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -462,7 +462,7 @@ struct nft_set_ops {
const struct nft_set *set,
const struct nft_set_elem *elem,
unsigned int flags);
- void (*commit)(const struct nft_set *set);
+ void (*commit)(struct nft_set *set);
void (*abort)(const struct nft_set *set);
u64 (*privsize)(const struct nlattr * const nla[],
const struct nft_set_desc *desc);
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index c0dcc40de358..75a9dee353e2 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -1549,12 +1549,11 @@ static void nft_pipapo_gc_deactivate(struct net *net, struct nft_set *set,
/**
* pipapo_gc() - Drop expired entries from set, destroy start and end elements
- * @_set: nftables API set representation
+ * @set: nftables API set representation
* @m: Matching data
*/
-static void pipapo_gc(const struct nft_set *_set, struct nft_pipapo_match *m)
+static void pipapo_gc(struct nft_set *set, struct nft_pipapo_match *m)
{
- struct nft_set *set = (struct nft_set *) _set;
struct nft_pipapo *priv = nft_set_priv(set);
struct net *net = read_pnet(&set->net);
int rules_f0, first_rule = 0;
@@ -1672,7 +1671,7 @@ static void pipapo_reclaim_match(struct rcu_head *rcu)
* We also need to create a new working copy for subsequent insertions and
* deletions.
*/
-static void nft_pipapo_commit(const struct nft_set *set)
+static void nft_pipapo_commit(struct nft_set *set)
{
struct nft_pipapo *priv = nft_set_priv(set);
struct nft_pipapo_match *new_clone, *old;
--
2.41.0
* [PATCH nf-next 2/3] netfilter: nft_set_rbtree: rename gc deactivate+erase function
2023-10-13 12:18 [PATCH nf-next 0/3] netfilter: nf_tables: remove rbtree async garbage collection Florian Westphal
2023-10-13 12:18 ` [PATCH nf-next 1/3] netfilter: nf_tables: de-constify set commit ops function argument Florian Westphal
@ 2023-10-13 12:18 ` Florian Westphal
2023-10-25 9:45 ` Pablo Neira Ayuso
2023-10-13 12:18 ` [PATCH nf-next 3/3] netfilter: nft_set_rbtree: prefer sync gc to async worker Florian Westphal
2 siblings, 1 reply; 6+ messages in thread
From: Florian Westphal @ 2023-10-13 12:18 UTC (permalink / raw)
To: netfilter-devel; +Cc: Florian Westphal
Next patch adds a caller that doesn't hold priv->lock for writing and
will need a similar function.
Rename the existing function to make it clear that it can only
be used for opportunistic gc during insertion.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nft_set_rbtree.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 2660ceab3759..e137a0fd4179 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -221,14 +221,15 @@ static void *nft_rbtree_get(const struct net *net, const struct nft_set *set,
return rbe;
}
-static void nft_rbtree_gc_remove(struct net *net, struct nft_set *set,
- struct nft_rbtree *priv,
- struct nft_rbtree_elem *rbe)
+static void nft_rbtree_gc_elem_remove(struct net *net, struct nft_set *set,
+ struct nft_rbtree *priv,
+ struct nft_rbtree_elem *rbe)
{
struct nft_set_elem elem = {
.priv = rbe,
};
+ lockdep_assert_held_write(&priv->lock);
nft_setelem_data_deactivate(net, set, &elem);
rb_erase(&rbe->node, &priv->root);
}
@@ -263,7 +264,7 @@ nft_rbtree_gc_elem(const struct nft_set *__set, struct nft_rbtree *priv,
rbe_prev = NULL;
if (prev) {
rbe_prev = rb_entry(prev, struct nft_rbtree_elem, node);
- nft_rbtree_gc_remove(net, set, priv, rbe_prev);
+ nft_rbtree_gc_elem_remove(net, set, priv, rbe_prev);
/* There is always room in this trans gc for this element,
* memory allocation never actually happens, hence, the warning
@@ -277,7 +278,7 @@ nft_rbtree_gc_elem(const struct nft_set *__set, struct nft_rbtree *priv,
nft_trans_gc_elem_add(gc, rbe_prev);
}
- nft_rbtree_gc_remove(net, set, priv, rbe);
+ nft_rbtree_gc_elem_remove(net, set, priv, rbe);
gc = nft_trans_gc_queue_sync(gc, GFP_ATOMIC);
if (WARN_ON_ONCE(!gc))
return ERR_PTR(-ENOMEM);
--
2.41.0
* [PATCH nf-next 3/3] netfilter: nft_set_rbtree: prefer sync gc to async worker
2023-10-13 12:18 [PATCH nf-next 0/3] netfilter: nf_tables: remove rbtree async garbage collection Florian Westphal
2023-10-13 12:18 ` [PATCH nf-next 1/3] netfilter: nf_tables: de-constify set commit ops function argument Florian Westphal
2023-10-13 12:18 ` [PATCH nf-next 2/3] netfilter: nft_set_rbtree: rename gc deactivate+erase function Florian Westphal
@ 2023-10-13 12:18 ` Florian Westphal
2023-10-25 9:46 ` Pablo Neira Ayuso
2 siblings, 1 reply; 6+ messages in thread
From: Florian Westphal @ 2023-10-13 12:18 UTC (permalink / raw)
To: netfilter-devel; +Cc: Florian Westphal
There is no need for asynchronous garbage collection; rbtree inserts
can only happen from the netlink control plane.
We already perform on-demand gc on insertion, in the area of the
tree where the insertion takes place, but we don't do a full tree
walk there for performance reasons.
Do a full gc walk at the end of the transaction instead and
remove the async worker.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nft_set_rbtree.c | 124 +++++++++++++++++----------------
1 file changed, 65 insertions(+), 59 deletions(-)
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index e137a0fd4179..5bb6d475e9e1 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -19,7 +19,7 @@ struct nft_rbtree {
struct rb_root root;
rwlock_t lock;
seqcount_rwlock_t count;
- struct delayed_work gc_work;
+ unsigned long last_gc;
};
struct nft_rbtree_elem {
@@ -48,8 +48,7 @@ static int nft_rbtree_cmp(const struct nft_set *set,
static bool nft_rbtree_elem_expired(const struct nft_rbtree_elem *rbe)
{
- return nft_set_elem_expired(&rbe->ext) ||
- nft_set_elem_is_dead(&rbe->ext);
+ return nft_set_elem_expired(&rbe->ext);
}
static bool __nft_rbtree_lookup(const struct net *net, const struct nft_set *set,
@@ -508,6 +507,15 @@ static int nft_rbtree_insert(const struct net *net, const struct nft_set *set,
return err;
}
+static void nft_rbtree_erase(struct nft_rbtree *priv, struct nft_rbtree_elem *rbe)
+{
+ write_lock_bh(&priv->lock);
+ write_seqcount_begin(&priv->count);
+ rb_erase(&rbe->node, &priv->root);
+ write_seqcount_end(&priv->count);
+ write_unlock_bh(&priv->lock);
+}
+
static void nft_rbtree_remove(const struct net *net,
const struct nft_set *set,
const struct nft_set_elem *elem)
@@ -515,11 +523,7 @@ static void nft_rbtree_remove(const struct net *net,
struct nft_rbtree *priv = nft_set_priv(set);
struct nft_rbtree_elem *rbe = elem->priv;
- write_lock_bh(&priv->lock);
- write_seqcount_begin(&priv->count);
- rb_erase(&rbe->node, &priv->root);
- write_seqcount_end(&priv->count);
- write_unlock_bh(&priv->lock);
+ nft_rbtree_erase(priv, rbe);
}
static void nft_rbtree_activate(const struct net *net,
@@ -611,45 +615,40 @@ static void nft_rbtree_walk(const struct nft_ctx *ctx,
read_unlock_bh(&priv->lock);
}
-static void nft_rbtree_gc(struct work_struct *work)
+static void nft_rbtree_gc_remove(struct net *net, struct nft_set *set,
+ struct nft_rbtree *priv,
+ struct nft_rbtree_elem *rbe)
{
+ struct nft_set_elem elem = {
+ .priv = rbe,
+ };
+
+ nft_setelem_data_deactivate(net, set, &elem);
+ nft_rbtree_erase(priv, rbe);
+}
+
+static void nft_rbtree_gc(struct nft_set *set)
+{
+ struct nft_rbtree *priv = nft_set_priv(set);
struct nft_rbtree_elem *rbe, *rbe_end = NULL;
struct nftables_pernet *nft_net;
- struct nft_rbtree *priv;
+ struct rb_node *node, *next;
struct nft_trans_gc *gc;
- struct rb_node *node;
- struct nft_set *set;
- unsigned int gc_seq;
struct net *net;
- priv = container_of(work, struct nft_rbtree, gc_work.work);
set = nft_set_container_of(priv);
net = read_pnet(&set->net);
nft_net = nft_pernet(net);
- gc_seq = READ_ONCE(nft_net->gc_seq);
- if (nft_set_gc_is_pending(set))
- goto done;
-
- gc = nft_trans_gc_alloc(set, gc_seq, GFP_KERNEL);
+ gc = nft_trans_gc_alloc(set, 0, GFP_KERNEL);
if (!gc)
- goto done;
-
- read_lock_bh(&priv->lock);
- for (node = rb_first(&priv->root); node != NULL; node = rb_next(node)) {
+ return;
- /* Ruleset has been updated, try later. */
- if (READ_ONCE(nft_net->gc_seq) != gc_seq) {
- nft_trans_gc_destroy(gc);
- gc = NULL;
- goto try_later;
- }
+ for (node = rb_first(&priv->root); node ; node = next) {
+ next = rb_next(node);
rbe = rb_entry(node, struct nft_rbtree_elem, node);
- if (nft_set_elem_is_dead(&rbe->ext))
- goto dead_elem;
-
/* elements are reversed in the rbtree for historical reasons,
* from highest to lowest value, that is why end element is
* always visited before the start element.
@@ -661,37 +660,34 @@ static void nft_rbtree_gc(struct work_struct *work)
if (!nft_set_elem_expired(&rbe->ext))
continue;
- nft_set_elem_dead(&rbe->ext);
-
- if (!rbe_end)
- continue;
-
- nft_set_elem_dead(&rbe_end->ext);
-
- gc = nft_trans_gc_queue_async(gc, gc_seq, GFP_ATOMIC);
+ gc = nft_trans_gc_queue_sync(gc, GFP_KERNEL);
if (!gc)
goto try_later;
- nft_trans_gc_elem_add(gc, rbe_end);
- rbe_end = NULL;
-dead_elem:
- gc = nft_trans_gc_queue_async(gc, gc_seq, GFP_ATOMIC);
+ /* end element needs to be removed first, it has
+ * no timeout extension.
+ */
+ if (rbe_end) {
+ nft_rbtree_gc_remove(net, set, priv, rbe_end);
+ nft_trans_gc_elem_add(gc, rbe_end);
+ rbe_end = NULL;
+ }
+
+ gc = nft_trans_gc_queue_sync(gc, GFP_KERNEL);
if (!gc)
goto try_later;
+ nft_rbtree_gc_remove(net, set, priv, rbe);
nft_trans_gc_elem_add(gc, rbe);
}
- gc = nft_trans_gc_catchall_async(gc, gc_seq);
-
try_later:
- read_unlock_bh(&priv->lock);
- if (gc)
- nft_trans_gc_queue_async_done(gc);
-done:
- queue_delayed_work(system_power_efficient_wq, &priv->gc_work,
- nft_set_gc_interval(set));
+ if (gc) {
+ gc = nft_trans_gc_catchall_sync(gc);
+ nft_trans_gc_queue_sync_done(gc);
+ priv->last_gc = jiffies;
+ }
}
static u64 nft_rbtree_privsize(const struct nlattr * const nla[],
@@ -710,11 +706,6 @@ static int nft_rbtree_init(const struct nft_set *set,
seqcount_rwlock_init(&priv->count, &priv->lock);
priv->root = RB_ROOT;
- INIT_DEFERRABLE_WORK(&priv->gc_work, nft_rbtree_gc);
- if (set->flags & NFT_SET_TIMEOUT)
- queue_delayed_work(system_power_efficient_wq, &priv->gc_work,
- nft_set_gc_interval(set));
-
return 0;
}
@@ -725,8 +716,6 @@ static void nft_rbtree_destroy(const struct nft_ctx *ctx,
struct nft_rbtree_elem *rbe;
struct rb_node *node;
- cancel_delayed_work_sync(&priv->gc_work);
- rcu_barrier();
while ((node = priv->root.rb_node) != NULL) {
rb_erase(node, &priv->root);
rbe = rb_entry(node, struct nft_rbtree_elem, node);
@@ -752,6 +741,21 @@ static bool nft_rbtree_estimate(const struct nft_set_desc *desc, u32 features,
return true;
}
+static void nft_rbtree_commit(struct nft_set *set)
+{
+ struct nft_rbtree *priv = nft_set_priv(set);
+
+ if (time_after_eq(jiffies, priv->last_gc + nft_set_gc_interval(set)))
+ nft_rbtree_gc(set);
+}
+
+static void nft_rbtree_gc_init(const struct nft_set *set)
+{
+ struct nft_rbtree *priv = nft_set_priv(set);
+
+ priv->last_gc = jiffies;
+}
+
const struct nft_set_type nft_set_rbtree_type = {
.features = NFT_SET_INTERVAL | NFT_SET_MAP | NFT_SET_OBJECT | NFT_SET_TIMEOUT,
.ops = {
@@ -765,6 +769,8 @@ const struct nft_set_type nft_set_rbtree_type = {
.deactivate = nft_rbtree_deactivate,
.flush = nft_rbtree_flush,
.activate = nft_rbtree_activate,
+ .commit = nft_rbtree_commit,
+ .gc_init = nft_rbtree_gc_init,
.lookup = nft_rbtree_lookup,
.walk = nft_rbtree_walk,
.get = nft_rbtree_get,
--
2.41.0
* Re: [PATCH nf-next 2/3] netfilter: nft_set_rbtree: rename gc deactivate+erase function
2023-10-13 12:18 ` [PATCH nf-next 2/3] netfilter: nft_set_rbtree: rename gc deactivate+erase function Florian Westphal
@ 2023-10-25 9:45 ` Pablo Neira Ayuso
0 siblings, 0 replies; 6+ messages in thread
From: Pablo Neira Ayuso @ 2023-10-25 9:45 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter-devel
On Fri, Oct 13, 2023 at 02:18:15PM +0200, Florian Westphal wrote:
> Next patch adds a caller that doesn't hold priv->lock for writing and
> will need a similar function.
>
> Rename the existing function to make it clear that it can only
> be used for opportunistic gc during insertion.
Applied to nf-next, thanks
* Re: [PATCH nf-next 3/3] netfilter: nft_set_rbtree: prefer sync gc to async worker
2023-10-13 12:18 ` [PATCH nf-next 3/3] netfilter: nft_set_rbtree: prefer sync gc to async worker Florian Westphal
@ 2023-10-25 9:46 ` Pablo Neira Ayuso
0 siblings, 0 replies; 6+ messages in thread
From: Pablo Neira Ayuso @ 2023-10-25 9:46 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter-devel
On Fri, Oct 13, 2023 at 02:18:16PM +0200, Florian Westphal wrote:
> There is no need for asynchronous garbage collection, rbtree inserts
> can only happen from the netlink control plane.
>
> We already perform on-demand gc on insertion, in the area of the
> tree where the insertion takes place, but we don't do a full tree
> walk there for performance reasons.
>
> Do a full gc walk at the end of the transaction instead and
> remove the async worker.
Also applied, thanks