All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michael Bommarito <michael.bommarito@gmail.com>
To: "David S . Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	netdev@vger.kernel.org
Cc: Simon Horman <horms@kernel.org>,
	Kuniyuki Iwashima <kuniyu@google.com>,
	Kees Cook <kees@kernel.org>, Feng Yang <yangfeng@kylinos.cn>,
	linux-kernel@vger.kernel.org
Subject: [PATCH net-next] netlink: clean up failed initial dump-start state
Date: Mon, 20 Apr 2026 12:27:34 -0400	[thread overview]
Message-ID: <20260420162734.854587-1-michael.bommarito@gmail.com> (raw)

When __netlink_dump_start() has already installed cb->skb, taken the
module reference and set cb_running, a failure from the first
netlink_dump(sk, true) call returns via errout_skb without unwinding the
callback lifetime. That leaves cb_running set and defers module_put()
and consume_skb(cb->skb) until userspace drains the socket or closes it.

Share the normal callback teardown in a helper and use it on successful
completion and on the initial lock_taken=true failure path. Keep the
lock_taken=false continuation path unchanged, because recvmsg()-driven
retries legitimately preserve cb_running when they run out of receive
room.

Fixes: 16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.")
Assisted-by: Claude:claude-opus-4-6
Assisted-by: Codex:gpt-5-4
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
Validation inside a UML guest on current mainline:

  - An unprivileged local task (uid=65534, no CAP_NET_ADMIN) opens a
    plain NETLINK_ROUTE socket, preloads sk_rmem_alloc with echoed
    NLMSG_ERROR replies from an unsupported rtnetlink type, then issues
    RTM_GETLINK | NLM_F_DUMP | NLM_F_ACK.
  - Stock kernel: the initial __netlink_dump_start() hits the rmem gate
    and returns via errout_skb with cb_running stuck at 1 until
    recvmsg() or close() drives forward progress.
  - Patched kernel: the same probe leaves cb_running clear immediately
    on the lock_taken=true failure, and the larger-rcvbuf continuation
    path (legitimate dump in progress) is unchanged.

A scaling pass on 3500 such wedged sockets in a 256M UML guest shows
about 3.8-3.9 MiB of extra unreclaimable slab (/proc/meminfo
SUnreclaim) beyond the visible queued rmem on the vulnerable kernel,
roughly 1.1 KiB/socket. Real accumulation, but the test hits
RLIMIT_NOFILE long before the guest approaches OOM, so this still
looks like a local availability cleanup rather than an exhaustion
primitive.

No Cc: stable@ on the theory that the bug self-heals on
recvmsg()/close and the accumulation is mild. Happy to add it and
route to net if you'd rather see it backported.

 net/netlink/af_netlink.c | 30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 4d609d5cf406..7019c17e6879 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2250,6 +2250,20 @@ static int netlink_dump_done(struct netlink_sock *nlk, struct sk_buff *skb,
 	return 0;
 }
 
+static void netlink_dump_cleanup(struct netlink_sock *nlk)
+{
+	struct module *module = nlk->cb.module;
+	struct sk_buff *skb = nlk->cb.skb;
+
+	if (nlk->cb.done)
+		nlk->cb.done(&nlk->cb);
+
+	WRITE_ONCE(nlk->cb_running, false);
+	mutex_unlock(&nlk->nl_cb_mutex);
+	module_put(module);
+	consume_skb(skb);
+}
+
 static int netlink_dump(struct sock *sk, bool lock_taken)
 {
 	struct netlink_sock *nlk = nlk_sk(sk);
@@ -2258,7 +2272,6 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
 	struct sk_buff *skb = NULL;
 	unsigned int rmem, rcvbuf;
 	size_t max_recvmsg_len;
-	struct module *module;
 	int err = -ENOBUFS;
 	int alloc_min_size;
 	int alloc_size;
@@ -2366,19 +2379,14 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
 	else
 		__netlink_sendskb(sk, skb);
 
-	if (cb->done)
-		cb->done(cb);
-
-	WRITE_ONCE(nlk->cb_running, false);
-	module = cb->module;
-	skb = cb->skb;
-	mutex_unlock(&nlk->nl_cb_mutex);
-	module_put(module);
-	consume_skb(skb);
+	netlink_dump_cleanup(nlk);
 	return 0;
 
 errout_skb:
-	mutex_unlock(&nlk->nl_cb_mutex);
+	if (lock_taken)
+		netlink_dump_cleanup(nlk);
+	else
+		mutex_unlock(&nlk->nl_cb_mutex);
 	kfree_skb(skb);
 	return err;
 }
-- 
2.53.0


             reply	other threads:[~2026-04-20 16:27 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-20 16:27 Michael Bommarito [this message]
2026-04-20 17:37 ` [PATCH net-next] netlink: clean up failed initial dump-start state Jakub Kicinski
2026-04-20 17:56   ` Michael Bommarito
2026-04-23 21:28 ` [PATCH net-next v2] " Michael Bommarito
2026-04-24  1:50   ` Jakub Kicinski
2026-04-24 11:48     ` Michael Bommarito
2026-04-24  7:36   ` [syzbot ci] " syzbot ci

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260420162734.854587-1-michael.bommarito@gmail.com \
    --to=michael.bommarito@gmail.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=kees@kernel.org \
    --cc=kuba@kernel.org \
    --cc=kuniyu@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=yangfeng@kylinos.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.