All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michael Bommarito <michael.bommarito@gmail.com>
To: Jakub Kicinski <kuba@kernel.org>,
	"David S . Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>,
	Kuniyuki Iwashima <kuniyu@google.com>,
	Kees Cook <kees@kernel.org>, Feng Yang <yangfeng@kylinos.cn>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Michael Bommarito <michael.bommarito@gmail.com>
Subject: [PATCH net-next v2] netlink: clean up failed initial dump-start state
Date: Thu, 23 Apr 2026 17:28:27 -0400	[thread overview]
Message-ID: <20260423212827.1177552-1-michael.bommarito@gmail.com> (raw)
In-Reply-To: <20260420162734.854587-1-michael.bommarito@gmail.com>

__netlink_dump_start() installs cb->skb, takes the module reference,
and sets cb_running before calling netlink_dump(sk, true). If that
first call returns via errout_skb the callback state is left behind:
cb_running stays set, module_put() and consume_skb(cb->skb) are
deferred until recvmsg() drives the dump back through the success
path, or netlink_release() on close runs the catch-all cleanup. On
sustained alloc failure neither fires.

Factor the teardown into netlink_dump_cleanup(nlk, drop) shared by
the dump success path, the lock_taken=true errout_skb path, and
netlink_release(). The @drop flag preserves the existing split:
consume_skb() on normal completion, kfree_skb() on abort.

Validation on a UML guest: an unprivileged task opens NETLINK_ROUTE,
preloads sk_rmem_alloc, then issues RTM_GETLINK | NLM_F_DUMP. Stock
kernel leaves cb_running stuck at 1 until recvmsg() or close()
drives it. Patched kernel clears cb_running immediately on the
lock_taken=true failure; the recvmsg continuation path is unchanged.

At scale: 3500 wedged sockets in a 256M guest show about 3.8-3.9
MiB of extra unreclaimable slab (~1.1 KiB/sock) on stock vs zero on
patched. RLIMIT_NOFILE bounds the test before OOM, so this is a
local availability cleanup rather than an exhaustion primitive.

Fixes: 16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.")
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
v2 (per Jakub's review <20260420103715.347fbd4a@kernel.org>):
  * commit message names both paths that do clear the state
    (recvmsg-driven retry on drain, netlink_release() on close)
    and notes that neither fires on sustained alloc failure
  * moved the UML validation into the commit message
  * extracted netlink_dump_cleanup(nlk, bool drop); shared with
    netlink_release() and the success path. The bool preserves
    the existing kfree_skb / consume_skb split.

v1: https://lore.kernel.org/netdev/20260420162734.854587-1-michael.bommarito@gmail.com/

 net/netlink/af_netlink.c | 47 ++++++++++++++++++++++++++--------------
 1 file changed, 31 insertions(+), 16 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 4d609d5cf406..ab21a6218631 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -131,6 +131,7 @@ static const char *const nlk_cb_mutex_key_strings[MAX_LINKS + 1] = {
 };
 
 static int netlink_dump(struct sock *sk, bool lock_taken);
+static void netlink_dump_cleanup(struct netlink_sock *nlk, bool drop);
 
 /* nl_table locking explained:
  * Lookup and traversal are protected with an RCU read-side lock. Insertion
@@ -763,13 +764,8 @@ static int netlink_release(struct socket *sock)
 	}
 
 	/* Terminate any outstanding dump */
-	if (nlk->cb_running) {
-		if (nlk->cb.done)
-			nlk->cb.done(&nlk->cb);
-		module_put(nlk->cb.module);
-		kfree_skb(nlk->cb.skb);
-		WRITE_ONCE(nlk->cb_running, false);
-	}
+	if (nlk->cb_running)
+		netlink_dump_cleanup(nlk, true);
 
 	module_put(nlk->module);
 
@@ -2250,6 +2246,26 @@ static int netlink_dump_done(struct netlink_sock *nlk, struct sk_buff *skb,
 	return 0;
 }
 
+/* Must be called with nl_cb_mutex NOT held. @drop=true frees the skb
+ * via kfree_skb() so drop-monitor sees the teardown; @drop=false uses
+ * consume_skb() for the normal-completion path.
+ */
+static void netlink_dump_cleanup(struct netlink_sock *nlk, bool drop)
+{
+	struct module *module = nlk->cb.module;
+	struct sk_buff *skb = nlk->cb.skb;
+
+	if (nlk->cb.done)
+		nlk->cb.done(&nlk->cb);
+
+	WRITE_ONCE(nlk->cb_running, false);
+	module_put(module);
+	if (drop)
+		kfree_skb(skb);
+	else
+		consume_skb(skb);
+}
+
 static int netlink_dump(struct sock *sk, bool lock_taken)
 {
 	struct netlink_sock *nlk = nlk_sk(sk);
@@ -2258,7 +2274,6 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
 	struct sk_buff *skb = NULL;
 	unsigned int rmem, rcvbuf;
 	size_t max_recvmsg_len;
-	struct module *module;
 	int err = -ENOBUFS;
 	int alloc_min_size;
 	int alloc_size;
@@ -2366,19 +2381,19 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
 	else
 		__netlink_sendskb(sk, skb);
 
-	if (cb->done)
-		cb->done(cb);
-
-	WRITE_ONCE(nlk->cb_running, false);
-	module = cb->module;
-	skb = cb->skb;
 	mutex_unlock(&nlk->nl_cb_mutex);
-	module_put(module);
-	consume_skb(skb);
+	netlink_dump_cleanup(nlk, false);
 	return 0;
 
 errout_skb:
 	mutex_unlock(&nlk->nl_cb_mutex);
+	/* The recvmsg() retry path (lock_taken=false) keeps cb_running so
+	 * the next recvmsg() can drive the dump forward once receive room
+	 * is available; only the initial __netlink_dump_start() failure
+	 * owns the teardown.
+	 */
+	if (lock_taken)
+		netlink_dump_cleanup(nlk, true);
 	kfree_skb(skb);
 	return err;
 }
-- 
2.53.0


  parent reply	other threads:[~2026-04-23 21:28 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-20 16:27 [PATCH net-next] netlink: clean up failed initial dump-start state Michael Bommarito
2026-04-20 17:37 ` Jakub Kicinski
2026-04-20 17:56   ` Michael Bommarito
2026-04-23 21:28 ` Michael Bommarito [this message]
2026-04-24  1:50   ` [PATCH net-next v2] " Jakub Kicinski
2026-04-24 11:48     ` Michael Bommarito
2026-04-24  7:36   ` [syzbot ci] " syzbot ci

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260423212827.1177552-1-michael.bommarito@gmail.com \
    --to=michael.bommarito@gmail.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=kees@kernel.org \
    --cc=kuba@kernel.org \
    --cc=kuniyu@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=yangfeng@kylinos.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.