netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH nf-next v3 0/5] netfilter: nf_tables: reduce set element transaction size
@ 2024-10-16 13:19 Florian Westphal
  2024-10-16 13:19 ` [PATCH nf-next v3 1/5] netfilter: nf_tables: add nft_trans_commit_list_add_elem helper Florian Westphal
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Florian Westphal @ 2024-10-16 13:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

v3:
I failed to realize that nft_audit leaks one implementation detail
to userspace: the length of the transaction log.

This is bad, but I do not know if we can change things to make
nft_audit NOT do that.  Hence add a new workaround patch that
inflates the length based on the number of set elements in the
container structure.

Also fix up notifications, for update case, notifications were
skipped but currently newsetelem notifications are done even if
existing set element is updated.

Most patches are unchanged.
"prefer nft_trans_elem_alloc helper" is already upstreamed so
its dropped from this batch.


v2: only change is in patch 3, and by extension, the last one:
During transaction abort, we need to handle an aggregate container to
contain both new set elements and updates.  The latter must be
skipped, else we remove element that already existed at start of the
transaction.

original cover letter:

When doing a flush on a set or mass adding/removing elements from a
set, each element needs to allocate 96 bytes to hold the transactional
state.

In such cases, virtually all the information in struct nft_trans_elem
is the same.

Change nft_trans_elem to a flex-array, i.e. a single nft_trans_elem
can hold multiple set element pointers.

The number of elements that can be stored in one nft_trans_elem is limited
by the slab allocator, this series limits the compaction to at most 62
elements as it caps the reallocation to 2048 bytes of memory.



Florian Westphal (5):
  netfilter: nf_tables: add nft_trans_commit_list_add_elem helper
  netfilter: nf_tables: prepare for multiple elements in nft_trans_elem
    structure
  netfiler: nf_tables: preemitve fix for audit failure
  netfilter: nf_tables: switch trans_elem to real flex array
  netfilter: nf_tables: allocate element update information dynamically

 include/net/netfilter/nf_tables.h |  25 +-
 net/netfilter/nf_tables_api.c     | 368 +++++++++++++++++++++++-------
 2 files changed, 304 insertions(+), 89 deletions(-)

-- 
2.45.2


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH nf-next v3 1/5] netfilter: nf_tables: add nft_trans_commit_list_add_elem helper
  2024-10-16 13:19 [PATCH nf-next v3 0/5] netfilter: nf_tables: reduce set element transaction size Florian Westphal
@ 2024-10-16 13:19 ` Florian Westphal
  2024-10-16 13:19 ` [PATCH nf-next v3 2/5] netfilter: nf_tables: prepare for multiple elements in nft_trans_elem structure Florian Westphal
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2024-10-16 13:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Add and use a wrapper to append trans_elem structures to the
transaction log.

Unlike the existing helper, pass a gfp_t to indicate if sleeping
is allowed.

This will be used by a followup patch to realloc nft_trans_elem
structures after they gain a flexible array member to reduce
number of such container structures on the transaction list.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 No changes in v3.

 net/netfilter/nf_tables_api.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 30331688301e..10f9403b45da 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -421,6 +421,17 @@ static void nft_trans_commit_list_add_tail(struct net *net, struct nft_trans *tr
 	}
 }
 
+static void nft_trans_commit_list_add_elem(struct net *net, struct nft_trans *trans,
+					   gfp_t gfp)
+{
+	WARN_ON_ONCE(trans->msg_type != NFT_MSG_NEWSETELEM &&
+		     trans->msg_type != NFT_MSG_DELSETELEM);
+
+	might_alloc(gfp);
+
+	nft_trans_commit_list_add_tail(net, trans);
+}
+
 static int nft_trans_table_add(struct nft_ctx *ctx, int msg_type)
 {
 	struct nft_trans *trans;
@@ -7183,7 +7194,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 					if (update_flags) {
 						nft_trans_elem_priv(trans) = elem_priv;
 						nft_trans_elem_update_flags(trans) = update_flags;
-						nft_trans_commit_list_add_tail(ctx->net, trans);
+						nft_trans_commit_list_add_elem(ctx->net, trans, GFP_KERNEL);
 						goto err_elem_free;
 					}
 				}
@@ -7207,7 +7218,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 	}
 
 	nft_trans_elem_priv(trans) = elem.priv;
-	nft_trans_commit_list_add_tail(ctx->net, trans);
+	nft_trans_commit_list_add_elem(ctx->net, trans, GFP_KERNEL);
 	return 0;
 
 err_set_full:
@@ -7424,7 +7435,7 @@ static int nft_del_setelem(struct nft_ctx *ctx, struct nft_set *set,
 	nft_setelem_data_deactivate(ctx->net, set, elem.priv);
 
 	nft_trans_elem_priv(trans) = elem.priv;
-	nft_trans_commit_list_add_tail(ctx->net, trans);
+	nft_trans_commit_list_add_elem(ctx->net, trans, GFP_KERNEL);
 	return 0;
 
 fail_ops:
@@ -7460,7 +7471,7 @@ static int nft_setelem_flush(const struct nft_ctx *ctx,
 	nft_setelem_data_deactivate(ctx->net, set, elem_priv);
 	nft_trans_elem_set(trans) = set;
 	nft_trans_elem_priv(trans) = elem_priv;
-	nft_trans_commit_list_add_tail(ctx->net, trans);
+	nft_trans_commit_list_add_elem(ctx->net, trans, GFP_ATOMIC);
 
 	return 0;
 }
@@ -7477,7 +7488,7 @@ static int __nft_set_catchall_flush(const struct nft_ctx *ctx,
 
 	nft_setelem_data_deactivate(ctx->net, set, elem_priv);
 	nft_trans_elem_priv(trans) = elem_priv;
-	nft_trans_commit_list_add_tail(ctx->net, trans);
+	nft_trans_commit_list_add_elem(ctx->net, trans, GFP_KERNEL);
 
 	return 0;
 }
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH nf-next v3 2/5] netfilter: nf_tables: prepare for multiple elements in nft_trans_elem structure
  2024-10-16 13:19 [PATCH nf-next v3 0/5] netfilter: nf_tables: reduce set element transaction size Florian Westphal
  2024-10-16 13:19 ` [PATCH nf-next v3 1/5] netfilter: nf_tables: add nft_trans_commit_list_add_elem helper Florian Westphal
@ 2024-10-16 13:19 ` Florian Westphal
  2024-10-16 13:19 ` [PATCH nf-next v3 3/5] netfiler: nf_tables: preemitve fix for audit failure Florian Westphal
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2024-10-16 13:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Add helpers to release the individual elements contained in the
trans_elem container structure.

No functional change intended.

Followup patch will add 'nelems' member and will turn 'priv' into
a flexible array.

These helpers can then loop over all elements.
Care needs to be taken to handle a mix of new elements and existing
elements that are being updated (e.g. timeout refresh).

Before this patch, NEWSETELEM transaction with update is released
early so nft_trans_set_elem_destroy() won't get called, so we need
to skip elements marked as update.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 Pull one hunk from next patch into this one to prevent
 a kasan splat when running tests on this patch.
 Also emit notiifcations also for element updates, this
 got broken in earlier iterations.

 include/net/netfilter/nf_tables.h |  21 ++-
 net/netfilter/nf_tables_api.c     | 228 +++++++++++++++++++++---------
 2 files changed, 173 insertions(+), 76 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 91ae20cb7648..2a2631edab2b 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -1754,28 +1754,25 @@ enum nft_trans_elem_flags {
 	NFT_TRANS_UPD_EXPIRATION	= (1 << 1),
 };
 
-struct nft_trans_elem {
-	struct nft_trans		nft_trans;
-	struct nft_set			*set;
-	struct nft_elem_priv		*elem_priv;
+struct nft_trans_one_elem {
+	struct nft_elem_priv		*priv;
 	u64				timeout;
 	u64				expiration;
 	u8				update_flags;
+};
+
+struct nft_trans_elem {
+	struct nft_trans		nft_trans;
+	struct nft_set			*set;
 	bool				bound;
+	unsigned int			nelems;
+	struct nft_trans_one_elem	elems[] __counted_by(nelems);
 };
 
 #define nft_trans_container_elem(t)			\
 	container_of(t, struct nft_trans_elem, nft_trans)
 #define nft_trans_elem_set(trans)			\
 	nft_trans_container_elem(trans)->set
-#define nft_trans_elem_priv(trans)			\
-	nft_trans_container_elem(trans)->elem_priv
-#define nft_trans_elem_update_flags(trans)		\
-	nft_trans_container_elem(trans)->update_flags
-#define nft_trans_elem_timeout(trans)			\
-	nft_trans_container_elem(trans)->timeout
-#define nft_trans_elem_expiration(trans)		\
-	nft_trans_container_elem(trans)->expiration
 #define nft_trans_elem_set_bound(trans)			\
 	nft_trans_container_elem(trans)->bound
 
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 10f9403b45da..d88adbd96ebe 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -6424,13 +6424,17 @@ static struct nft_trans *nft_trans_elem_alloc(const struct nft_ctx *ctx,
 					      int msg_type,
 					      struct nft_set *set)
 {
+	struct nft_trans_elem *te;
 	struct nft_trans *trans;
 
-	trans = nft_trans_alloc(ctx, msg_type, sizeof(struct nft_trans_elem));
+	trans = nft_trans_alloc(ctx, msg_type, struct_size(te, elems, 1));
 	if (trans == NULL)
 		return NULL;
 
-	nft_trans_elem_set(trans) = set;
+	te = nft_trans_container_elem(trans);
+	te->nelems = 1;
+	te->set = set;
+
 	return trans;
 }
 
@@ -6552,28 +6556,51 @@ static void nft_set_elem_expr_destroy(const struct nft_ctx *ctx,
 }
 
 /* Drop references and destroy. Called from gc, dynset and abort path. */
-void nft_set_elem_destroy(const struct nft_set *set,
-			  const struct nft_elem_priv *elem_priv,
-			  bool destroy_expr)
+static void __nft_set_elem_destroy(const struct nft_ctx *ctx,
+				   const struct nft_set *set,
+				   const struct nft_elem_priv *elem_priv,
+				   bool destroy_expr)
 {
 	struct nft_set_ext *ext = nft_set_elem_ext(set, elem_priv);
-	struct nft_ctx ctx = {
-		.net	= read_pnet(&set->net),
-		.family	= set->table->family,
-	};
 
 	nft_data_release(nft_set_ext_key(ext), NFT_DATA_VALUE);
 	if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA))
 		nft_data_release(nft_set_ext_data(ext), set->dtype);
 	if (destroy_expr && nft_set_ext_exists(ext, NFT_SET_EXT_EXPRESSIONS))
-		nft_set_elem_expr_destroy(&ctx, nft_set_ext_expr(ext));
+		nft_set_elem_expr_destroy(ctx, nft_set_ext_expr(ext));
 	if (nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF))
 		nft_use_dec(&(*nft_set_ext_obj(ext))->use);
 
 	kfree(elem_priv);
 }
+
+/* Drop references and destroy. Called from gc and dynset. */
+void nft_set_elem_destroy(const struct nft_set *set,
+			  const struct nft_elem_priv *elem_priv,
+			  bool destroy_expr)
+{
+	struct nft_ctx ctx = {
+		.net	= read_pnet(&set->net),
+		.family	= set->table->family,
+	};
+
+	__nft_set_elem_destroy(&ctx, set, elem_priv, destroy_expr);
+}
 EXPORT_SYMBOL_GPL(nft_set_elem_destroy);
 
+/* Drop references and destroy. Called from abort path. */
+static void nft_trans_set_elem_destroy(const struct nft_ctx *ctx, struct nft_trans_elem *te)
+{
+	int i;
+
+	for (i = 0; i < te->nelems; i++) {
+		if (te->elems[i].update_flags)
+			continue;
+
+		__nft_set_elem_destroy(ctx, te->set, te->elems[i].priv, true);
+	}
+}
+
 /* Destroy element. References have been already dropped in the preparation
  * path via nft_setelem_data_deactivate().
  */
@@ -6589,6 +6616,15 @@ void nf_tables_set_elem_destroy(const struct nft_ctx *ctx,
 	kfree(elem_priv);
 }
 
+static void nft_trans_elems_destroy(const struct nft_ctx *ctx,
+				    const struct nft_trans_elem *te)
+{
+	int i;
+
+	for (i = 0; i < te->nelems; i++)
+		nf_tables_set_elem_destroy(ctx, te->set, te->elems[i].priv);
+}
+
 int nft_set_elem_expr_clone(const struct nft_ctx *ctx, struct nft_set *set,
 			    struct nft_expr *expr_array[])
 {
@@ -6745,6 +6781,36 @@ static void nft_setelem_activate(struct net *net, struct nft_set *set,
 	}
 }
 
+static void nft_trans_elem_update(const struct nft_set *set,
+				  const struct nft_trans_one_elem *elem)
+{
+	const struct nft_set_ext *ext = nft_set_elem_ext(set, elem->priv);
+
+	if (elem->update_flags & NFT_TRANS_UPD_TIMEOUT)
+		WRITE_ONCE(nft_set_ext_timeout(ext)->timeout, elem->timeout);
+
+	if (elem->update_flags & NFT_TRANS_UPD_EXPIRATION)
+		WRITE_ONCE(nft_set_ext_timeout(ext)->expiration, get_jiffies_64() + elem->expiration);
+}
+
+static void nft_trans_elems_add(const struct nft_ctx *ctx,
+				struct nft_trans_elem *te)
+{
+	int i;
+
+	for (i = 0; i < te->nelems; i++) {
+		const struct nft_trans_one_elem *elem = &te->elems[i];
+
+		if (elem->update_flags)
+			nft_trans_elem_update(te->set, elem);
+		else
+			nft_setelem_activate(ctx->net, te->set, elem->priv);
+
+		nf_tables_setelem_notify(ctx, te->set, elem->priv,
+					 NFT_MSG_NEWSETELEM);
+	}
+}
+
 static int nft_setelem_catchall_deactivate(const struct net *net,
 					   struct nft_set *set,
 					   struct nft_set_elem *elem)
@@ -6827,6 +6893,24 @@ static void nft_setelem_remove(const struct net *net,
 		set->ops->remove(net, set, elem_priv);
 }
 
+static void nft_trans_elems_remove(const struct nft_ctx *ctx,
+				   const struct nft_trans_elem *te)
+{
+	int i;
+
+	for (i = 0; i < te->nelems; i++) {
+		nf_tables_setelem_notify(ctx, te->set,
+					 te->elems[i].priv,
+					 te->nft_trans.msg_type);
+
+		nft_setelem_remove(ctx->net, te->set, te->elems[i].priv);
+		if (!nft_setelem_is_catchall(te->set, te->elems[i].priv)) {
+			atomic_dec(&te->set->nelems);
+			te->set->ndeact--;
+		}
+	}
+}
+
 static bool nft_setelem_valid_key_end(const struct nft_set *set,
 				      struct nlattr **nla, u32 flags)
 {
@@ -7178,22 +7262,26 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 			else if (!(nlmsg_flags & NLM_F_EXCL)) {
 				err = 0;
 				if (nft_set_ext_exists(ext2, NFT_SET_EXT_TIMEOUT)) {
+					struct nft_trans_one_elem *update;
+
+					update = &nft_trans_container_elem(trans)->elems[0];
+
 					update_flags = 0;
 					if (timeout != nft_set_ext_timeout(ext2)->timeout) {
-						nft_trans_elem_timeout(trans) = timeout;
+						update->timeout = timeout;
 						if (expiration == 0)
 							expiration = timeout;
 
 						update_flags |= NFT_TRANS_UPD_TIMEOUT;
 					}
 					if (expiration) {
-						nft_trans_elem_expiration(trans) = expiration;
+						update->expiration = expiration;
 						update_flags |= NFT_TRANS_UPD_EXPIRATION;
 					}
 
 					if (update_flags) {
-						nft_trans_elem_priv(trans) = elem_priv;
-						nft_trans_elem_update_flags(trans) = update_flags;
+						update->priv = elem_priv;
+						update->update_flags = update_flags;
 						nft_trans_commit_list_add_elem(ctx->net, trans, GFP_KERNEL);
 						goto err_elem_free;
 					}
@@ -7217,7 +7305,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 		}
 	}
 
-	nft_trans_elem_priv(trans) = elem.priv;
+	nft_trans_container_elem(trans)->elems[0].priv = elem.priv;
 	nft_trans_commit_list_add_elem(ctx->net, trans, GFP_KERNEL);
 	return 0;
 
@@ -7355,6 +7443,50 @@ void nft_setelem_data_deactivate(const struct net *net,
 		nft_use_dec(&(*nft_set_ext_obj(ext))->use);
 }
 
+/* similar to nft_trans_elems_remove, but called from abort path to undo newsetelem.
+ * No notifications and no ndeact changes.
+ *
+ * Returns true if set had been added to (i.e., elements need to be removed again).
+ */
+static bool nft_trans_elems_new_abort(const struct nft_ctx *ctx,
+				      const struct nft_trans_elem *te)
+{
+	bool removed = false;
+	int i;
+
+	for (i = 0; i < te->nelems; i++) {
+		if (te->elems[i].update_flags)
+			continue;
+
+		if (!te->set->ops->abort || nft_setelem_is_catchall(te->set, te->elems[i].priv))
+			nft_setelem_remove(ctx->net, te->set, te->elems[i].priv);
+
+		if (!nft_setelem_is_catchall(te->set, te->elems[i].priv))
+			atomic_dec(&te->set->nelems);
+
+		removed = true;
+	}
+
+	return removed;
+}
+
+/* Called from abort path to undo DELSETELEM/DESTROYSETELEM. */
+static void nft_trans_elems_destroy_abort(const struct nft_ctx *ctx,
+					  const struct nft_trans_elem *te)
+{
+	int i;
+
+	for (i = 0; i < te->nelems; i++) {
+		if (!nft_setelem_active_next(ctx->net, te->set, te->elems[i].priv)) {
+			nft_setelem_data_activate(ctx->net, te->set, te->elems[i].priv);
+			nft_setelem_activate(ctx->net, te->set, te->elems[i].priv);
+		}
+
+		if (!nft_setelem_is_catchall(te->set, te->elems[i].priv))
+			te->set->ndeact--;
+	}
+}
+
 static int nft_del_setelem(struct nft_ctx *ctx, struct nft_set *set,
 			   const struct nlattr *attr)
 {
@@ -7434,7 +7566,7 @@ static int nft_del_setelem(struct nft_ctx *ctx, struct nft_set *set,
 
 	nft_setelem_data_deactivate(ctx->net, set, elem.priv);
 
-	nft_trans_elem_priv(trans) = elem.priv;
+	nft_trans_container_elem(trans)->elems[0].priv = elem.priv;
 	nft_trans_commit_list_add_elem(ctx->net, trans, GFP_KERNEL);
 	return 0;
 
@@ -7461,7 +7593,8 @@ static int nft_setelem_flush(const struct nft_ctx *ctx,
 		return 0;
 
 	trans = nft_trans_alloc_gfp(ctx, NFT_MSG_DELSETELEM,
-				    sizeof(struct nft_trans_elem), GFP_ATOMIC);
+				    struct_size_t(struct nft_trans_elem, elems, 1),
+				    GFP_ATOMIC);
 	if (!trans)
 		return -ENOMEM;
 
@@ -7470,7 +7603,8 @@ static int nft_setelem_flush(const struct nft_ctx *ctx,
 
 	nft_setelem_data_deactivate(ctx->net, set, elem_priv);
 	nft_trans_elem_set(trans) = set;
-	nft_trans_elem_priv(trans) = elem_priv;
+	nft_trans_container_elem(trans)->nelems = 1;
+	nft_trans_container_elem(trans)->elems[0].priv = elem_priv;
 	nft_trans_commit_list_add_elem(ctx->net, trans, GFP_ATOMIC);
 
 	return 0;
@@ -7487,7 +7621,7 @@ static int __nft_set_catchall_flush(const struct nft_ctx *ctx,
 		return -ENOMEM;
 
 	nft_setelem_data_deactivate(ctx->net, set, elem_priv);
-	nft_trans_elem_priv(trans) = elem_priv;
+	nft_trans_container_elem(trans)->elems[0].priv = elem_priv;
 	nft_trans_commit_list_add_elem(ctx->net, trans, GFP_KERNEL);
 
 	return 0;
@@ -9666,9 +9800,7 @@ static void nft_commit_release(struct nft_trans *trans)
 		break;
 	case NFT_MSG_DELSETELEM:
 	case NFT_MSG_DESTROYSETELEM:
-		nf_tables_set_elem_destroy(&ctx,
-					   nft_trans_elem_set(trans),
-					   nft_trans_elem_priv(trans));
+		nft_trans_elems_destroy(&ctx, nft_trans_container_elem(trans));
 		break;
 	case NFT_MSG_DELOBJ:
 	case NFT_MSG_DESTROYOBJ:
@@ -10521,25 +10653,8 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
 		case NFT_MSG_NEWSETELEM:
 			te = nft_trans_container_elem(trans);
 
-			if (te->update_flags) {
-				const struct nft_set_ext *ext =
-					nft_set_elem_ext(te->set, te->elem_priv);
+			nft_trans_elems_add(&ctx, te);
 
-				if (te->update_flags & NFT_TRANS_UPD_TIMEOUT) {
-					WRITE_ONCE(nft_set_ext_timeout(ext)->timeout,
-						   te->timeout);
-				}
-				if (te->update_flags & NFT_TRANS_UPD_EXPIRATION) {
-					WRITE_ONCE(nft_set_ext_timeout(ext)->expiration,
-						   get_jiffies_64() + te->expiration);
-				}
-			} else {
-				nft_setelem_activate(net, te->set, te->elem_priv);
-			}
-
-			nf_tables_setelem_notify(&ctx, te->set,
-						 te->elem_priv,
-						 NFT_MSG_NEWSETELEM);
 			if (te->set->ops->commit &&
 			    list_empty(&te->set->pending_update)) {
 				list_add_tail(&te->set->pending_update,
@@ -10551,14 +10666,8 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
 		case NFT_MSG_DESTROYSETELEM:
 			te = nft_trans_container_elem(trans);
 
-			nf_tables_setelem_notify(&ctx, te->set,
-						 te->elem_priv,
-						 trans->msg_type);
-			nft_setelem_remove(net, te->set, te->elem_priv);
-			if (!nft_setelem_is_catchall(te->set, te->elem_priv)) {
-				atomic_dec(&te->set->nelems);
-				te->set->ndeact--;
-			}
+			nft_trans_elems_remove(&ctx, te);
+
 			if (te->set->ops->commit &&
 			    list_empty(&te->set->pending_update)) {
 				list_add_tail(&te->set->pending_update,
@@ -10678,8 +10787,7 @@ static void nf_tables_abort_release(struct nft_trans *trans)
 		nft_set_destroy(&ctx, nft_trans_set(trans));
 		break;
 	case NFT_MSG_NEWSETELEM:
-		nft_set_elem_destroy(nft_trans_elem_set(trans),
-				     nft_trans_elem_priv(trans), true);
+		nft_trans_set_elem_destroy(&ctx, nft_trans_container_elem(trans));
 		break;
 	case NFT_MSG_NEWOBJ:
 		nft_obj_destroy(&ctx, nft_trans_obj(trans));
@@ -10836,18 +10944,15 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
 			nft_trans_destroy(trans);
 			break;
 		case NFT_MSG_NEWSETELEM:
-			if (nft_trans_elem_update_flags(trans) ||
-			    nft_trans_elem_set_bound(trans)) {
+			if (nft_trans_elem_set_bound(trans)) {
 				nft_trans_destroy(trans);
 				break;
 			}
 			te = nft_trans_container_elem(trans);
-			if (!te->set->ops->abort ||
-			    nft_setelem_is_catchall(te->set, te->elem_priv))
-				nft_setelem_remove(net, te->set, te->elem_priv);
-
-			if (!nft_setelem_is_catchall(te->set, te->elem_priv))
-				atomic_dec(&te->set->nelems);
+			if (!nft_trans_elems_new_abort(&ctx, te)) {
+				nft_trans_destroy(trans);
+				break;
+			}
 
 			if (te->set->ops->abort &&
 			    list_empty(&te->set->pending_update)) {
@@ -10859,12 +10964,7 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
 		case NFT_MSG_DESTROYSETELEM:
 			te = nft_trans_container_elem(trans);
 
-			if (!nft_setelem_active_next(net, te->set, te->elem_priv)) {
-				nft_setelem_data_activate(net, te->set, te->elem_priv);
-				nft_setelem_activate(net, te->set, te->elem_priv);
-			}
-			if (!nft_setelem_is_catchall(te->set, te->elem_priv))
-				te->set->ndeact--;
+			nft_trans_elems_destroy_abort(&ctx, te);
 
 			if (te->set->ops->abort &&
 			    list_empty(&te->set->pending_update)) {
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH nf-next v3 3/5] netfiler: nf_tables: preemitve fix for audit failure
  2024-10-16 13:19 [PATCH nf-next v3 0/5] netfilter: nf_tables: reduce set element transaction size Florian Westphal
  2024-10-16 13:19 ` [PATCH nf-next v3 1/5] netfilter: nf_tables: add nft_trans_commit_list_add_elem helper Florian Westphal
  2024-10-16 13:19 ` [PATCH nf-next v3 2/5] netfilter: nf_tables: prepare for multiple elements in nft_trans_elem structure Florian Westphal
@ 2024-10-16 13:19 ` Florian Westphal
  2024-10-16 13:19 ` [PATCH nf-next v3 4/5] netfilter: nf_tables: switch trans_elem to real flex array Florian Westphal
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2024-10-16 13:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Nftables audit log unfortunately leaks implementation detail, the
transaction log size, to userspace.

Without this, nft_audit.sh selftest fails once subsequenct NEW/DELELEM
transactions can be compressed.

Thus increment the audit counter by the number of elements to keep
the output identical.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 v3: this patch is new to prevent nft_audit.sh from breaking
 after next patch.

 net/netfilter/nf_tables_api.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index d88adbd96ebe..bcb069057a53 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -10373,9 +10373,26 @@ static void nf_tables_commit_audit_free(struct list_head *adl)
 	}
 }
 
+/* Silly, but existing test audit test cases require a count
+ * value derived from the (INTERNAL!) transaction log length.
+ *
+ * Thus, compaction of NEW/DELSETELEM breaks such tests.
+ */
+static unsigned int nf_tables_commit_audit_entrycount(const struct nft_trans *trans)
+{
+	switch (trans->msg_type) {
+	case NFT_MSG_NEWSETELEM:
+	case NFT_MSG_DELSETELEM:
+		return nft_trans_container_elem(trans)->nelems;
+	}
+
+	return 1;
+}
+
 static void nf_tables_commit_audit_collect(struct list_head *adl,
-					   struct nft_table *table, u32 op)
+					   const struct nft_trans *trans, u32 op)
 {
+	const struct nft_table *table = trans->table;
 	struct nft_audit_data *adp;
 
 	list_for_each_entry(adp, adl, list) {
@@ -10385,7 +10402,7 @@ static void nf_tables_commit_audit_collect(struct list_head *adl,
 	WARN_ONCE(1, "table=%s not expected in commit list", table->name);
 	return;
 found:
-	adp->entries++;
+	adp->entries += nf_tables_commit_audit_entrycount(trans);
 	if (!adp->op || adp->op > op)
 		adp->op = op;
 }
@@ -10544,7 +10561,7 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
 
 		nft_ctx_update(&ctx, trans);
 
-		nf_tables_commit_audit_collect(&adl, table, trans->msg_type);
+		nf_tables_commit_audit_collect(&adl, trans, trans->msg_type);
 		switch (trans->msg_type) {
 		case NFT_MSG_NEWTABLE:
 			if (nft_trans_table_update(trans)) {
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH nf-next v3 4/5] netfilter: nf_tables: switch trans_elem to real flex array
  2024-10-16 13:19 [PATCH nf-next v3 0/5] netfilter: nf_tables: reduce set element transaction size Florian Westphal
                   ` (2 preceding siblings ...)
  2024-10-16 13:19 ` [PATCH nf-next v3 3/5] netfiler: nf_tables: preemitve fix for audit failure Florian Westphal
@ 2024-10-16 13:19 ` Florian Westphal
  2024-10-16 13:19 ` [PATCH nf-next v3 5/5] netfilter: nf_tables: allocate element update information dynamically Florian Westphal
  2024-10-16 14:36 ` [PATCH nf-next v3 0/5] netfilter: nf_tables: reduce set element transaction size Pablo Neira Ayuso
  5 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2024-10-16 13:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

When queueing a set element add or removal operation to the transaction
log, check if the previous operation already asks for a the identical
operation on the same set.

If so, store the element reference in the preceding operation.
This significantlty reduces memory consumption when many set add/delete
operations appear in a single transaction.

Example: 10k elements require 937kb of memory (10k allocations from
kmalloc-96 slab).

Assuming we can compact 4 elements in the same set, 468 kbytes
are needed (64 bytes for base struct, nft_trans_elemn, 32 bytes
for nft_trans_one_elem structure, so 2500 allocations from kmalloc-192
slab).

For large batch updates we can compact up to 62 elements
into one single nft_trans_elem structure (~65% mem reduction):
(64 bytes for base struct, nft_trans_elem, 32 byte for nft_trans_one_elem
 struct).

We can halve size of nft_trans_one_elem struct by moving
timeout/expire/update_flags into a dynamically allocated structure,
this allows to store 124 elements in a 2k slab nft_trans_elem struct.
This is done in a followup patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 no changes in this version.

 net/netfilter/nf_tables_api.c | 71 +++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index bcb069057a53..5e038051c112 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -25,6 +25,7 @@
 
 #define NFT_MODULE_AUTOLOAD_LIMIT (MODULE_NAME_LEN - sizeof("nft-expr-255-"))
 #define NFT_SET_MAX_ANONLEN 16
+#define NFT_MAX_SET_NELEMS ((2048 - sizeof(struct nft_trans_elem)) / sizeof(struct nft_trans_one_elem))
 
 unsigned int nf_tables_net_id __read_mostly;
 
@@ -391,6 +392,69 @@ static void nf_tables_unregister_hook(struct net *net,
 	return __nf_tables_unregister_hook(net, table, chain, false);
 }
 
+static bool nft_trans_collapse_set_elem_allowed(const struct nft_trans_elem *a, const struct nft_trans_elem *b)
+{
+	return a->set == b->set && a->bound == b->bound && a->nelems < NFT_MAX_SET_NELEMS;
+}
+
+static bool nft_trans_collapse_set_elem(struct nftables_pernet *nft_net,
+					struct nft_trans_elem *tail,
+					struct nft_trans_elem *trans,
+					gfp_t gfp)
+{
+	unsigned int nelems, old_nelems = tail->nelems;
+	struct nft_trans_elem *new_trans;
+
+	if (!nft_trans_collapse_set_elem_allowed(tail, trans))
+		return false;
+
+	if (WARN_ON_ONCE(trans->nelems != 1))
+		return false;
+
+	if (check_add_overflow(old_nelems, trans->nelems, &nelems))
+		return false;
+
+	/* krealloc might free tail which invalidates list pointers */
+	list_del_init(&tail->nft_trans.list);
+
+	new_trans = krealloc(tail, struct_size(tail, elems, nelems), gfp);
+	if (!new_trans) {
+		list_add_tail(&tail->nft_trans.list, &nft_net->commit_list);
+		return false;
+	}
+
+	INIT_LIST_HEAD(&new_trans->nft_trans.list);
+	new_trans->nelems = nelems;
+	new_trans->elems[old_nelems] = trans->elems[0];
+	list_add_tail(&new_trans->nft_trans.list, &nft_net->commit_list);
+
+	return true;
+}
+
+static bool nft_trans_try_collapse(struct nftables_pernet *nft_net,
+				   struct nft_trans *trans, gfp_t gfp)
+{
+	struct nft_trans *tail;
+
+	if (list_empty(&nft_net->commit_list))
+		return false;
+
+	tail = list_last_entry(&nft_net->commit_list, struct nft_trans, list);
+
+	if (tail->msg_type != trans->msg_type)
+		return false;
+
+	switch (trans->msg_type) {
+	case NFT_MSG_NEWSETELEM:
+	case NFT_MSG_DELSETELEM:
+		return nft_trans_collapse_set_elem(nft_net,
+						   nft_trans_container_elem(tail),
+						   nft_trans_container_elem(trans), gfp);
+	}
+
+	return false;
+}
+
 static void nft_trans_commit_list_add_tail(struct net *net, struct nft_trans *trans)
 {
 	struct nftables_pernet *nft_net = nft_pernet(net);
@@ -424,11 +488,18 @@ static void nft_trans_commit_list_add_tail(struct net *net, struct nft_trans *tr
 static void nft_trans_commit_list_add_elem(struct net *net, struct nft_trans *trans,
 					   gfp_t gfp)
 {
+	struct nftables_pernet *nft_net = nft_pernet(net);
+
 	WARN_ON_ONCE(trans->msg_type != NFT_MSG_NEWSETELEM &&
 		     trans->msg_type != NFT_MSG_DELSETELEM);
 
 	might_alloc(gfp);
 
+	if (nft_trans_try_collapse(nft_net, trans, gfp)) {
+		kfree(trans);
+		return;
+	}
+
 	nft_trans_commit_list_add_tail(net, trans);
 }
 
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH nf-next v3 5/5] netfilter: nf_tables: allocate element update information dynamically
  2024-10-16 13:19 [PATCH nf-next v3 0/5] netfilter: nf_tables: reduce set element transaction size Florian Westphal
                   ` (3 preceding siblings ...)
  2024-10-16 13:19 ` [PATCH nf-next v3 4/5] netfilter: nf_tables: switch trans_elem to real flex array Florian Westphal
@ 2024-10-16 13:19 ` Florian Westphal
  2024-10-16 14:36 ` [PATCH nf-next v3 0/5] netfilter: nf_tables: reduce set element transaction size Pablo Neira Ayuso
  5 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2024-10-16 13:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Move the timeout/expire/flag members from nft_trans_one_elem struct into
a dybamically allocated structure, only needed when timeout update was
requested.

This halves size of nft_trans_one_elem struct and allows to compact up to
124 elements in one transaction container rather than 62.

This halves memory requirements for a large flush or insert transaction,
where ->update remains NULL.

Care has to be taken to release the extra data in all spots, including
abort path.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 v3: adjust for changes in earlier patch, nft_trans_elems_add() can
 now kfree() the update pointer unconditionally.

 include/net/netfilter/nf_tables.h | 10 ++++--
 net/netfilter/nf_tables_api.c     | 57 +++++++++++++++++++------------
 2 files changed, 43 insertions(+), 24 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 2a2631edab2b..1caf142900fe 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -1754,11 +1754,15 @@ enum nft_trans_elem_flags {
 	NFT_TRANS_UPD_EXPIRATION	= (1 << 1),
 };
 
-struct nft_trans_one_elem {
-	struct nft_elem_priv		*priv;
+struct nft_elem_update {
 	u64				timeout;
 	u64				expiration;
-	u8				update_flags;
+	u8				flags;
+};
+
+struct nft_trans_one_elem {
+	struct nft_elem_priv		*priv;
+	struct nft_elem_update		*update;
 };
 
 struct nft_trans_elem {
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 5e038051c112..8662df68b03f 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -6665,7 +6665,8 @@ static void nft_trans_set_elem_destroy(const struct nft_ctx *ctx, struct nft_tra
 	int i;
 
 	for (i = 0; i < te->nelems; i++) {
-		if (te->elems[i].update_flags)
+		/* skip update request, see nft_trans_elems_new_abort() */
+		if (!te->elems[i].priv)
 			continue;
 
 		__nft_set_elem_destroy(ctx, te->set, te->elems[i].priv, true);
@@ -6856,12 +6857,13 @@ static void nft_trans_elem_update(const struct nft_set *set,
 				  const struct nft_trans_one_elem *elem)
 {
 	const struct nft_set_ext *ext = nft_set_elem_ext(set, elem->priv);
+	const struct nft_elem_update *update = elem->update;
 
-	if (elem->update_flags & NFT_TRANS_UPD_TIMEOUT)
-		WRITE_ONCE(nft_set_ext_timeout(ext)->timeout, elem->timeout);
+	if (update->flags & NFT_TRANS_UPD_TIMEOUT)
+		WRITE_ONCE(nft_set_ext_timeout(ext)->timeout, update->timeout);
 
-	if (elem->update_flags & NFT_TRANS_UPD_EXPIRATION)
-		WRITE_ONCE(nft_set_ext_timeout(ext)->expiration, get_jiffies_64() + elem->expiration);
+	if (update->flags & NFT_TRANS_UPD_EXPIRATION)
+		WRITE_ONCE(nft_set_ext_timeout(ext)->expiration, get_jiffies_64() + update->expiration);
 }
 
 static void nft_trans_elems_add(const struct nft_ctx *ctx,
@@ -6870,15 +6872,16 @@ static void nft_trans_elems_add(const struct nft_ctx *ctx,
 	int i;
 
 	for (i = 0; i < te->nelems; i++) {
-		const struct nft_trans_one_elem *elem = &te->elems[i];
+		struct nft_trans_one_elem *elem = &te->elems[i];
 
-		if (elem->update_flags)
+		if (elem->update)
 			nft_trans_elem_update(te->set, elem);
 		else
 			nft_setelem_activate(ctx->net, te->set, elem->priv);
 
 		nf_tables_setelem_notify(ctx, te->set, elem->priv,
 					 NFT_MSG_NEWSETELEM);
+		kfree(elem->update);
 	}
 }
 
@@ -6970,6 +6973,8 @@ static void nft_trans_elems_remove(const struct nft_ctx *ctx,
 	int i;
 
 	for (i = 0; i < te->nelems; i++) {
+		WARN_ON_ONCE(te->elems[i].update);
+
 		nf_tables_setelem_notify(ctx, te->set,
 					 te->elems[i].priv,
 					 te->nft_trans.msg_type);
@@ -7018,7 +7023,6 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 	struct nft_data_desc desc;
 	enum nft_registers dreg;
 	struct nft_trans *trans;
-	u8 update_flags;
 	u64 expiration;
 	u64 timeout;
 	int err, i;
@@ -7333,26 +7337,32 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 			else if (!(nlmsg_flags & NLM_F_EXCL)) {
 				err = 0;
 				if (nft_set_ext_exists(ext2, NFT_SET_EXT_TIMEOUT)) {
-					struct nft_trans_one_elem *update;
-
-					update = &nft_trans_container_elem(trans)->elems[0];
+					struct nft_elem_update update = { };
 
-					update_flags = 0;
 					if (timeout != nft_set_ext_timeout(ext2)->timeout) {
-						update->timeout = timeout;
+						update.timeout = timeout;
 						if (expiration == 0)
 							expiration = timeout;
 
-						update_flags |= NFT_TRANS_UPD_TIMEOUT;
+						update.flags |= NFT_TRANS_UPD_TIMEOUT;
 					}
 					if (expiration) {
-						update->expiration = expiration;
-						update_flags |= NFT_TRANS_UPD_EXPIRATION;
+						update.expiration = expiration;
+						update.flags |= NFT_TRANS_UPD_EXPIRATION;
 					}
 
-					if (update_flags) {
-						update->priv = elem_priv;
-						update->update_flags = update_flags;
+					if (update.flags) {
+						struct nft_trans_one_elem *ue;
+
+						ue = &nft_trans_container_elem(trans)->elems[0];
+
+						ue->update = kmemdup(&update, sizeof(update), GFP_KERNEL);
+						if (!ue->update) {
+							err = -ENOMEM;
+							goto err_element_clash;
+						}
+
+						ue->priv = elem_priv;
 						nft_trans_commit_list_add_elem(ctx->net, trans, GFP_KERNEL);
 						goto err_elem_free;
 					}
@@ -7520,14 +7530,19 @@ void nft_setelem_data_deactivate(const struct net *net,
  * Returns true if set had been added to (i.e., elements need to be removed again).
  */
 static bool nft_trans_elems_new_abort(const struct nft_ctx *ctx,
-				      const struct nft_trans_elem *te)
+				      struct nft_trans_elem *te)
 {
 	bool removed = false;
 	int i;
 
 	for (i = 0; i < te->nelems; i++) {
-		if (te->elems[i].update_flags)
+		if (te->elems[i].update) {
+			kfree(te->elems[i].update);
+			te->elems[i].update = NULL;
+			/* Update request, so do not release this element */
+			te->elems[i].priv = NULL;
 			continue;
+		}
 
 		if (!te->set->ops->abort || nft_setelem_is_catchall(te->set, te->elems[i].priv))
 			nft_setelem_remove(ctx->net, te->set, te->elems[i].priv);
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH nf-next v3 0/5] netfilter: nf_tables: reduce set element transaction size
  2024-10-16 13:19 [PATCH nf-next v3 0/5] netfilter: nf_tables: reduce set element transaction size Florian Westphal
                   ` (4 preceding siblings ...)
  2024-10-16 13:19 ` [PATCH nf-next v3 5/5] netfilter: nf_tables: allocate element update information dynamically Florian Westphal
@ 2024-10-16 14:36 ` Pablo Neira Ayuso
  2024-10-16 16:10   ` Florian Westphal
  5 siblings, 1 reply; 11+ messages in thread
From: Pablo Neira Ayuso @ 2024-10-16 14:36 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Wed, Oct 16, 2024 at 03:19:07PM +0200, Florian Westphal wrote:
> v3:
> I failed to realize that nft_audit leaks one implementation detail
> to userspace: the length of the transaction log.
> 
> This is bad, but I do not know if we can change things to make
> nft_audit NOT do that.  Hence add a new workaround patch that
> inflates the length based on the number of set elements in the
> container structure.

It actually shows the number of entries that have been updated, right?

Before this series, there was a 1:1 mapping between transaction and
objects so it was easier to infer it from the number of transaction
objects.

> Also fix up notifications, for update case, notifications were
> skipped but currently newsetelem notifications are done even if
> existing set element is updated.
> 
> Most patches are unchanged.
> "prefer nft_trans_elem_alloc helper" is already upstreamed so
> its dropped from this batch.
> 
> 
> v2: only change is in patch 3, and by extension, the last one:
> During transaction abort, we need to handle an aggregate container to
> contain both new set elements and updates.  The latter must be
> skipped, else we remove element that already existed at start of the
> transaction.
> 
> original cover letter:
> 
> When doing a flush on a set or mass adding/removing elements from a
> set, each element needs to allocate 96 bytes to hold the transactional
> state.
> 
> In such cases, virtually all the information in struct nft_trans_elem
> is the same.
> 
> Change nft_trans_elem to a flex-array, i.e. a single nft_trans_elem
> can hold multiple set element pointers.
> 
> The number of elements that can be stored in one nft_trans_elem is limited
> by the slab allocator, this series limits the compaction to at most 62
> elements as it caps the reallocation to 2048 bytes of memory.
> 
> 
> 
> Florian Westphal (5):
>   netfilter: nf_tables: add nft_trans_commit_list_add_elem helper
>   netfilter: nf_tables: prepare for multiple elements in nft_trans_elem
>     structure
>   netfiler: nf_tables: preemitve fix for audit failure
>   netfilter: nf_tables: switch trans_elem to real flex array
>   netfilter: nf_tables: allocate element update information dynamically
> 
>  include/net/netfilter/nf_tables.h |  25 +-
>  net/netfilter/nf_tables_api.c     | 368 +++++++++++++++++++++++-------
>  2 files changed, 304 insertions(+), 89 deletions(-)
> 
> -- 
> 2.45.2
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH nf-next v3 0/5] netfilter: nf_tables: reduce set element transaction size
  2024-10-16 14:36 ` [PATCH nf-next v3 0/5] netfilter: nf_tables: reduce set element transaction size Pablo Neira Ayuso
@ 2024-10-16 16:10   ` Florian Westphal
  2024-10-17 16:23     ` Pablo Neira Ayuso
  0 siblings, 1 reply; 11+ messages in thread
From: Florian Westphal @ 2024-10-16 16:10 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > This is bad, but I do not know if we can change things to make
> > nft_audit NOT do that.  Hence add a new workaround patch that
> > inflates the length based on the number of set elements in the
> > container structure.
> 
> It actually shows the number of entries that have been updated, right?
> 
> Before this series, there was a 1:1 mapping between transaction and
> objects so it was easier to infer it from the number of transaction
> objects.

Yes, but... for element add (but not create), we used to not do anything
(no-op), so we did not allocate a new transaction and pretend request
did not exist.

Now we can enter update path, so we do allocate a transaction, hence,
audit record changes.

What if we add an internal special-case 'flush' op in the future?
It will break, and the workaround added in this series needs to be
extended.

Same for an other change that could elide a transaction request, or,
add expand something to multiple ones (as flush currently does).

Its doesn't *break* audit, but it changes the output.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH nf-next v3 0/5] netfilter: nf_tables: reduce set element transaction size
  2024-10-16 16:10   ` Florian Westphal
@ 2024-10-17 16:23     ` Pablo Neira Ayuso
  2024-10-17 19:33       ` Paul Moore
  0 siblings, 1 reply; 11+ messages in thread
From: Pablo Neira Ayuso @ 2024-10-17 16:23 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, paul, rgb, audit

Cc'ing audit ML.

On Wed, Oct 16, 2024 at 06:10:44PM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > This is bad, but I do not know if we can change things to make
> > > nft_audit NOT do that.  Hence add a new workaround patch that
> > > inflates the length based on the number of set elements in the
> > > container structure.
> > 
> > It actually shows the number of entries that have been updated, right?
> > 
> > Before this series, there was a 1:1 mapping between transaction and
> > objects so it was easier to infer it from the number of transaction
> > objects.
> 
> Yes, but... for element add (but not create), we used to not do anything
> (no-op), so we did not allocate a new transaction and pretend request
> did not exist.

You refer to element updates above, those used to be elided, yes. Now
they are shown. I think that is correct.

> Now we can enter update path, so we do allocate a transaction, hence,
> audit record changes.
>
> What if we add an internal special-case 'flush' op in the future?

You mean, if 'flush' does not get expanded to one delete transaction
for each element. Yes, that would require to look at ->nelems as in
this patch.

> It will break, and the workaround added in this series needs to be
> extended.
> 
> Same for an other change that could elide a transaction request, or,
> add expand something to multiple ones (as flush currently does).
> 
> Its doesn't *break* audit, but it changes the output.

My understanding is that audit is exposing the entries that have been
added/updated/deleted, which is already sparse: Note that nftables
audit works at 'table' granurality.

IIRC, one of the audit maintainers mentioned it should be possible to
add chain and set to the audit logs in the future, such change would
necessarily change the audit logs output.

Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH nf-next v3 0/5] netfilter: nf_tables: reduce set element transaction size
  2024-10-17 16:23     ` Pablo Neira Ayuso
@ 2024-10-17 19:33       ` Paul Moore
  2024-10-17 22:53         ` Pablo Neira Ayuso
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Moore @ 2024-10-17 19:33 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, rgb, audit

On Thu, Oct 17, 2024 at 12:24 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>
> Cc'ing audit ML.

Thank you.

> On Wed, Oct 16, 2024 at 06:10:44PM +0200, Florian Westphal wrote:
> > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > > This is bad, but I do not know if we can change things to make
> > > > nft_audit NOT do that.  Hence add a new workaround patch that
> > > > inflates the length based on the number of set elements in the
> > > > container structure.
> > >
> > > It actually shows the number of entries that have been updated, right?
> > >
> > > Before this series, there was a 1:1 mapping between transaction and
> > > objects so it was easier to infer it from the number of transaction
> > > objects.
> >
> > Yes, but... for element add (but not create), we used to not do anything
> > (no-op), so we did not allocate a new transaction and pretend request
> > did not exist.
>
> You refer to element updates above, those used to be elided, yes. Now
> they are shown. I think that is correct.
>
> > Now we can enter update path, so we do allocate a transaction, hence,
> > audit record changes.
> >
> > What if we add an internal special-case 'flush' op in the future?
>
> You mean, if 'flush' does not get expanded to one delete transaction
> for each element. Yes, that would require to look at ->nelems as in
> this patch.
>
> > It will break, and the workaround added in this series needs to be
> > extended.
> >
> > Same for an other change that could elide a transaction request, or,
> > add expand something to multiple ones (as flush currently does).
> >
> > Its doesn't *break* audit, but it changes the output.
>
> My understanding is that audit is exposing the entries that have been
> added/updated/deleted, which is already sparse: Note that nftables
> audit works at 'table' granurality.
>
> IIRC, one of the audit maintainers mentioned it should be possible to
> add chain and set to the audit logs in the future, such change would
> necessarily change the audit logs output.

For those of us joining the conversation late, can you provide a quick
summary of what you want to change in audit and why?

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH nf-next v3 0/5] netfilter: nf_tables: reduce set element transaction size
  2024-10-17 19:33       ` Paul Moore
@ 2024-10-17 22:53         ` Pablo Neira Ayuso
  0 siblings, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2024-10-17 22:53 UTC (permalink / raw)
  To: Paul Moore; +Cc: Florian Westphal, netfilter-devel, rgb, audit

On Thu, Oct 17, 2024 at 03:33:14PM -0400, Paul Moore wrote:
> > On Wed, Oct 16, 2024 at 06:10:44PM +0200, Florian Westphal wrote:
> For those of us joining the conversation late, can you provide a quick
> summary of what you want to change in audit and why?

Florian said:

"I failed to realize that nft_audit leaks one implementation detail
to userspace: the length of the transaction log."

        table=t1 family=2 entries=4 op=nft_register_set

He is referring to the 'entries' key.

So far, every object gets one transaction, but now batching several
objects in one transaction is possible.

We have been discussing what the expected semantics for this audit log
key is:

- If this is the transaction log length, then the internal update of the
  transaction logic results in a smaller number of 'entries' in the
  audit log.
- If 'entries' refers to the number of "affected" objects by this
  operation, then this means we have to carry a "workaround" in
  the kernel.

This is because:

1) Element updates are now supported, this currently handles it as a
   _REGISTER operation according to enum audit_nfcfgop. This changed
   the semantics of the add command, now it is "add if it does not exist,
   otherwise update what it already exists". Before, updates where simply
   elided (not counted by 'entries' key) because they were not supported.
   That is, 'entries' now tell how many set element has been added OR
   updated. I think this is fine, this is consistent with chain updates
   where 'entries' also report added OR updated chains. The difference
   is that old kernel do not count updates (because they are elided).

2) There is ongoing work to add more internal transaction batching, ie.
   use one single transaction for several elements. This requires a
   special case to bump the 'entries' key to add the elements that the
   transaction batch contains, see:

https://patchwork.ozlabs.org/project/netfilter-devel/patch/20241016131917.17193-4-fw@strlen.de/

   There is a nft_audit.sh selftest and Florian thinks this is a
   "workaround" patch only to make this test happy, because 'entries'
   refers to the transaction log length (which is now smaller given the
   internal transaction batching approach to accumulate several elements
   is used).

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-10-17 22:53 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-16 13:19 [PATCH nf-next v3 0/5] netfilter: nf_tables: reduce set element transaction size Florian Westphal
2024-10-16 13:19 ` [PATCH nf-next v3 1/5] netfilter: nf_tables: add nft_trans_commit_list_add_elem helper Florian Westphal
2024-10-16 13:19 ` [PATCH nf-next v3 2/5] netfilter: nf_tables: prepare for multiple elements in nft_trans_elem structure Florian Westphal
2024-10-16 13:19 ` [PATCH nf-next v3 3/5] netfiler: nf_tables: preemitve fix for audit failure Florian Westphal
2024-10-16 13:19 ` [PATCH nf-next v3 4/5] netfilter: nf_tables: switch trans_elem to real flex array Florian Westphal
2024-10-16 13:19 ` [PATCH nf-next v3 5/5] netfilter: nf_tables: allocate element update information dynamically Florian Westphal
2024-10-16 14:36 ` [PATCH nf-next v3 0/5] netfilter: nf_tables: reduce set element transaction size Pablo Neira Ayuso
2024-10-16 16:10   ` Florian Westphal
2024-10-17 16:23     ` Pablo Neira Ayuso
2024-10-17 19:33       ` Paul Moore
2024-10-17 22:53         ` Pablo Neira Ayuso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).