* [PATCH net-next] netlink: clean up failed initial dump-start state
@ 2026-04-20 16:27 Michael Bommarito
2026-04-20 17:37 ` Jakub Kicinski
0 siblings, 1 reply; 3+ messages in thread
From: Michael Bommarito @ 2026-04-20 16:27 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev
Cc: Simon Horman, Kuniyuki Iwashima, Kees Cook, Feng Yang,
linux-kernel
When __netlink_dump_start() has already installed cb->skb, taken the
module reference and set cb_running, a failure from the first
netlink_dump(sk, true) call returns via errout_skb without unwinding the
callback lifetime. That leaves cb_running set and defers module_put()
and consume_skb(cb->skb) until userspace drains the socket or closes it.

Share the normal callback teardown in a helper and use it on successful
completion and on the initial lock_taken=true failure path. Keep the
lock_taken=false continuation path unchanged, because recvmsg()-driven
retries legitimately preserve cb_running when they run out of receive
room.

Fixes: 16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.")
Assisted-by: Claude:claude-opus-4-6
Assisted-by: Codex:gpt-5-4
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
Validation inside a UML guest on current mainline:

- An unprivileged local task (uid=65534, no CAP_NET_ADMIN) opens a
  plain NETLINK_ROUTE socket, preloads sk_rmem_alloc with echoed
  NLMSG_ERROR replies from an unsupported rtnetlink type, then issues
  RTM_GETLINK | NLM_F_DUMP | NLM_F_ACK.
- Stock kernel: the initial __netlink_dump_start() hits the rmem gate
  and returns via errout_skb with cb_running stuck at 1 until
  recvmsg() or close() drives forward progress.
- Patched kernel: the same probe leaves cb_running clear immediately
  on the lock_taken=true failure, and the larger-rcvbuf continuation
  path (legitimate dump in progress) is unchanged.
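The stock and patched behaviour listed above can be modelled in userspace. This is only a toy sketch, not kernel code: struct dump_model and the model_*() functions are hypothetical names invented for illustration, and the two counters merely stand in for the module reference and the reference held on cb->skb.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model of the dump callback state; not kernel code. */
struct dump_model {
	bool cb_running;   /* stands in for nlk->cb_running */
	int module_refs;   /* stands in for the module reference */
	int request_skbs;  /* stands in for the reference on cb->skb */
};

/* Models the state set up before the first netlink_dump(sk, true) call. */
static void model_dump_start(struct dump_model *m)
{
	m->cb_running = true;
	m->module_refs++;
	m->request_skbs++;
}

/* Models the shared teardown: clear the flag and drop both references. */
static void model_cleanup(struct dump_model *m)
{
	assert(m->cb_running);
	m->cb_running = false;
	m->module_refs--;
	m->request_skbs--;
}

/*
 * Models the errout_skb path. Only the patched variant unwinds, and only
 * for the initial (lock_taken == true) call; continuation failures keep
 * the dump state so a later recvmsg() can retry.
 */
static int model_dump_fail(struct dump_model *m, bool lock_taken, bool patched)
{
	if (patched && lock_taken)
		model_cleanup(m);
	return -ENOBUFS;
}
```

Calling model_dump_start() and then model_dump_fail(&m, true, false) leaves cb_running set with both references still held, which matches the stock result. With patched=true the same call releases everything on the first failure, and with lock_taken=false the state is left alone in both variants.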
A scaling pass on 3500 such wedged sockets in a 256M UML guest shows
about 3.8-3.9 MiB of extra unreclaimable slab (/proc/meminfo
SUnreclaim) beyond the visible queued rmem on the vulnerable kernel,
roughly 1.1 KiB/socket. Real accumulation, but the test hits
RLIMIT_NOFILE long before the guest approaches OOM, so this still
looks like a local availability cleanup rather than an exhaustion
primitive.
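The per-socket figure follows from dividing the extra slab by the socket count. A small sketch of that arithmetic, where per_socket_kib() is a hypothetical helper name and 1 MiB is taken as 1024 KiB:

```c
#include <assert.h>

/* Hypothetical helper: per-socket overhead in KiB for a total given in MiB. */
static double per_socket_kib(double extra_mib, unsigned int sockets)
{
	assert(sockets > 0);
	return extra_mib * 1024.0 / sockets;
}
```

With 3500 sockets, 3.8 MiB works out to about 1.11 KiB per socket and 3.9 MiB to about 1.14 KiB, consistent with the "roughly 1.1 KiB/socket" figure.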
No Cc: stable@ on the theory that the bug self-heals on
recvmsg()/close and the accumulation is mild. Happy to add it and
route to net if you'd rather see it backported.
net/netlink/af_netlink.c | 30 +++++++++++++++++++-----------
1 file changed, 19 insertions(+), 11 deletions(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 4d609d5cf406..7019c17e6879 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2250,6 +2250,20 @@ static int netlink_dump_done(struct netlink_sock *nlk, struct sk_buff *skb,
 	return 0;
 }
 
+static void netlink_dump_cleanup(struct netlink_sock *nlk)
+{
+	struct module *module = nlk->cb.module;
+	struct sk_buff *skb = nlk->cb.skb;
+
+	if (nlk->cb.done)
+		nlk->cb.done(&nlk->cb);
+
+	WRITE_ONCE(nlk->cb_running, false);
+	mutex_unlock(&nlk->nl_cb_mutex);
+	module_put(module);
+	consume_skb(skb);
+}
+
 static int netlink_dump(struct sock *sk, bool lock_taken)
 {
 	struct netlink_sock *nlk = nlk_sk(sk);
@@ -2258,7 +2272,6 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
 	struct sk_buff *skb = NULL;
 	unsigned int rmem, rcvbuf;
 	size_t max_recvmsg_len;
-	struct module *module;
 	int err = -ENOBUFS;
 	int alloc_min_size;
 	int alloc_size;
@@ -2366,19 +2379,14 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
 	else
 		__netlink_sendskb(sk, skb);
 
-	if (cb->done)
-		cb->done(cb);
-
-	WRITE_ONCE(nlk->cb_running, false);
-	module = cb->module;
-	skb = cb->skb;
-	mutex_unlock(&nlk->nl_cb_mutex);
-	module_put(module);
-	consume_skb(skb);
+	netlink_dump_cleanup(nlk);
 	return 0;
 
 errout_skb:
-	mutex_unlock(&nlk->nl_cb_mutex);
+	if (lock_taken)
+		netlink_dump_cleanup(nlk);
+	else
+		mutex_unlock(&nlk->nl_cb_mutex);
 	kfree_skb(skb);
 	return err;
 }
--
2.53.0
* Re: [PATCH net-next] netlink: clean up failed initial dump-start state
From: Jakub Kicinski @ 2026-04-20 17:37 UTC (permalink / raw)
To: Michael Bommarito
Cc: David S . Miller, Eric Dumazet, Paolo Abeni, netdev, Simon Horman,
Kuniyuki Iwashima, Kees Cook, Feng Yang, linux-kernel
On Mon, 20 Apr 2026 12:27:34 -0400 Michael Bommarito wrote:
> When __netlink_dump_start() has already installed cb->skb, taken the
> module reference and set cb_running, a failure from the first
> netlink_dump(sk, true) call returns via errout_skb without unwinding the
> callback lifetime. That leaves cb_running set and defers module_put()
> and consume_skb(cb->skb) until userspace drains the socket or closes it.
On a quick look I can't see which path clears the dump state in case we
keep failing to allocate an skb. Could you add more info on that?
> Share the normal callback teardown in a helper and use it on successful
> completion and on the initial lock_taken=true failure path. Keep the
> lock_taken=false continuation path unchanged, because recvmsg()-driven
> retries legitimately preserve cb_running when they run out of receive
> room.
>
> Fixes: 16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.")
> Assisted-by: Claude:claude-opus-4-6
> Assisted-by: Codex:gpt-5-4
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> ---
> Validation inside a UML guest on current mainline:
>
> - An unprivileged local task (uid=65534, no CAP_NET_ADMIN) opens a
> plain NETLINK_ROUTE socket, preloads sk_rmem_alloc with echoed
> NLMSG_ERROR replies from an unsupported rtnetlink type, then issues
> RTM_GETLINK | NLM_F_DUMP | NLM_F_ACK.
> - Stock kernel: the initial __netlink_dump_start() hits the rmem gate
> and returns via errout_skb with cb_running stuck at 1 until
> recvmsg() or close() drives forward progress.
> - Patched kernel: the same probe leaves cb_running clear immediately
> on the lock_taken=true failure, and the larger-rcvbuf continuation
> path (legitimate dump in progress) is unchanged.
>
> A scaling pass on 3500 such wedged sockets in a 256M UML guest shows
> about 3.8-3.9 MiB of extra unreclaimable slab (/proc/meminfo
> SUnreclaim) beyond the visible queued rmem on the vulnerable kernel,
> roughly 1.1 KiB/socket. Real accumulation, but the test hits
> RLIMIT_NOFILE long before the guest approaches OOM, so this still
> looks like a local availability cleanup rather than an exhaustion
> primitive.
This should be part of the commit message; it's useful for understanding
the problem. Actually more so than the current commit msg, TBH.
> No Cc: stable@ on the theory that the bug self-heals on
> recvmsg()/close and the accumulation is mild. Happy to add it and
> route to net if you'd rather see it backported.
>
> net/netlink/af_netlink.c | 30 +++++++++++++++++++-----------
> 1 file changed, 19 insertions(+), 11 deletions(-)
>
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 4d609d5cf406..7019c17e6879 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -2250,6 +2250,20 @@ static int netlink_dump_done(struct netlink_sock *nlk, struct sk_buff *skb,
>  	return 0;
>  }
>
> +static void netlink_dump_cleanup(struct netlink_sock *nlk)
> +{
> +	struct module *module = nlk->cb.module;
> +	struct sk_buff *skb = nlk->cb.skb;
> +
> +	if (nlk->cb.done)
> +		nlk->cb.done(&nlk->cb);
> +
> +	WRITE_ONCE(nlk->cb_running, false);
> +	mutex_unlock(&nlk->nl_cb_mutex);
> +	module_put(module);
> +	consume_skb(skb);
> +}
It's probably better to create a helper that shares the code with
the release path as well. And try not to switch the skb freeing
to consume_skb().
>  static int netlink_dump(struct sock *sk, bool lock_taken)
>  {
>  	struct netlink_sock *nlk = nlk_sk(sk);
> @@ -2258,7 +2272,6 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
>  	struct sk_buff *skb = NULL;
>  	unsigned int rmem, rcvbuf;
>  	size_t max_recvmsg_len;
> -	struct module *module;
>  	int err = -ENOBUFS;
>  	int alloc_min_size;
>  	int alloc_size;
> @@ -2366,19 +2379,14 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
>  	else
>  		__netlink_sendskb(sk, skb);
>
> -	if (cb->done)
> -		cb->done(cb);
> -
> -	WRITE_ONCE(nlk->cb_running, false);
> -	module = cb->module;
> -	skb = cb->skb;
> -	mutex_unlock(&nlk->nl_cb_mutex);
> -	module_put(module);
> -	consume_skb(skb);
> +	netlink_dump_cleanup(nlk);
>  	return 0;
>
>  errout_skb:
> -	mutex_unlock(&nlk->nl_cb_mutex);
> +	if (lock_taken)
> +		netlink_dump_cleanup(nlk);
> +	else
> +		mutex_unlock(&nlk->nl_cb_mutex);
>  	kfree_skb(skb);
>  	return err;
>  }
If you're planning to repost - please wait until tomorrow, we ask that
revisions are at least 24h apart so that people across the timezones
have a chance to chime in.
* Re: [PATCH net-next] netlink: clean up failed initial dump-start state
From: Michael Bommarito @ 2026-04-20 17:56 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S . Miller, Eric Dumazet, Paolo Abeni, netdev, Simon Horman,
Kuniyuki Iwashima, Kees Cook, Feng Yang, linux-kernel
On Mon, Apr 20, 2026 at 1:37 PM Jakub Kicinski <kuba@kernel.org> wrote:
> On a quick look I can't see which path clears the dump state in case we
> keep failing to allocate an skb. Could you add more info on that?
> ...
> This should be part of the commit message; it's useful for understanding
> the problem. Actually more so than the current commit msg, TBH.
> ...
> If you're planning to repost - please wait until tomorrow, we ask that
> revisions are at least 24h apart so that people across the timezones
> have a chance to chime in.
Thanks, good points. I'll set a reminder and follow up tomorrow with
your ideas if we don't hear from others.
Thanks,
Mike Bommarito