From: Jakub Kicinski <kuba@kernel.org>
To: Michael Bommarito <michael.bommarito@gmail.com>
Cc: "David S . Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>,
netdev@vger.kernel.org, Simon Horman <horms@kernel.org>,
Kuniyuki Iwashima <kuniyu@google.com>,
Kees Cook <kees@kernel.org>, Feng Yang <yangfeng@kylinos.cn>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next] netlink: clean up failed initial dump-start state
Date: Mon, 20 Apr 2026 10:37:15 -0700 [thread overview]
Message-ID: <20260420103715.347fbd4a@kernel.org> (raw)
In-Reply-To: <20260420162734.854587-1-michael.bommarito@gmail.com>
On Mon, 20 Apr 2026 12:27:34 -0400 Michael Bommarito wrote:
> When __netlink_dump_start() has already installed cb->skb, taken the
> module reference and set cb_running, a failure from the first
> netlink_dump(sk, true) call returns via errout_skb without unwinding the
> callback lifetime. That leaves cb_running set and defers module_put()
> and consume_skb(cb->skb) until userspace drains the socket or closes it.
On a quick look I can't see which path clears the dump state in case we
keep failing to allocate an skb. Could you add more info on that?
> Share the normal callback teardown in a helper and use it on successful
> completion and on the initial lock_taken=true failure path. Keep the
> lock_taken=false continuation path unchanged, because recvmsg()-driven
> retries legitimately preserve cb_running when they run out of receive
> room.
>
> Fixes: 16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.")
> Assisted-by: Claude:claude-opus-4-6
> Assisted-by: Codex:gpt-5-4
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> ---
> Validation inside a UML guest on current mainline:
>
> - An unprivileged local task (uid=65534, no CAP_NET_ADMIN) opens a
> plain NETLINK_ROUTE socket, preloads sk_rmem_alloc with echoed
> NLMSG_ERROR replies from an unsupported rtnetlink type, then issues
> RTM_GETLINK | NLM_F_DUMP | NLM_F_ACK.
> - Stock kernel: the initial __netlink_dump_start() hits the rmem gate
> and returns via errout_skb with cb_running stuck at 1 until
> recvmsg() or close() drives forward progress.
> - Patched kernel: the same probe leaves cb_running clear immediately
> on the lock_taken=true failure, and the larger-rcvbuf continuation
> path (legitimate dump in progress) is unchanged.
>
> A scaling pass on 3500 such wedged sockets in a 256M UML guest shows
> about 3.8-3.9 MiB of extra unreclaimable slab (/proc/meminfo
> SUnreclaim) beyond the visible queued rmem on the vulnerable kernel,
> roughly 1.1 KiB/socket. Real accumulation, but the test hits
> RLIMIT_NOFILE long before the guest approaches OOM, so this still
> looks like a local availability cleanup rather than an exhaustion
> primitive.
This should be part of the commit message, it's useful to understanding
the problem. Actually more than the current commit msg TBH.
> No Cc: stable@ on the theory that the bug self-heals on
> recvmsg()/close and the accumulation is mild. Happy to add it and
> route to net if you'd rather see it backported.
>
> net/netlink/af_netlink.c | 30 +++++++++++++++++++-----------
> 1 file changed, 19 insertions(+), 11 deletions(-)
>
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 4d609d5cf406..7019c17e6879 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -2250,6 +2250,20 @@ static int netlink_dump_done(struct netlink_sock *nlk, struct sk_buff *skb,
> return 0;
> }
>
> +static void netlink_dump_cleanup(struct netlink_sock *nlk)
> +{
> + struct module *module = nlk->cb.module;
> + struct sk_buff *skb = nlk->cb.skb;
> +
> + if (nlk->cb.done)
> + nlk->cb.done(&nlk->cb);
> +
> + WRITE_ONCE(nlk->cb_running, false);
> + mutex_unlock(&nlk->nl_cb_mutex);
> + module_put(module);
> + consume_skb(skb);
> +}
It's probably better to create a helper that shares the code with
the release path as well. And try not to switch the skb freeing
to consume_skb().
> static int netlink_dump(struct sock *sk, bool lock_taken)
> {
> struct netlink_sock *nlk = nlk_sk(sk);
> @@ -2258,7 +2272,6 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
> struct sk_buff *skb = NULL;
> unsigned int rmem, rcvbuf;
> size_t max_recvmsg_len;
> - struct module *module;
> int err = -ENOBUFS;
> int alloc_min_size;
> int alloc_size;
> @@ -2366,19 +2379,14 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
> else
> __netlink_sendskb(sk, skb);
>
> - if (cb->done)
> - cb->done(cb);
> -
> - WRITE_ONCE(nlk->cb_running, false);
> - module = cb->module;
> - skb = cb->skb;
> - mutex_unlock(&nlk->nl_cb_mutex);
> - module_put(module);
> - consume_skb(skb);
> + netlink_dump_cleanup(nlk);
> return 0;
>
> errout_skb:
> - mutex_unlock(&nlk->nl_cb_mutex);
> + if (lock_taken)
> + netlink_dump_cleanup(nlk);
> + else
> + mutex_unlock(&nlk->nl_cb_mutex);
> kfree_skb(skb);
> return err;
> }
If you're planning to repost - please wait until tomorrow, we ask that
revisions are at least 24h apart so that people across the timezones
have a chance to chime in.
next prev parent reply other threads:[~2026-04-20 17:37 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-20 16:27 [PATCH net-next] netlink: clean up failed initial dump-start state Michael Bommarito
2026-04-20 17:37 ` Jakub Kicinski [this message]
2026-04-20 17:56 ` Michael Bommarito
2026-04-23 21:28 ` [PATCH net-next v2] " Michael Bommarito
2026-04-24 1:50 ` Jakub Kicinski
2026-04-24 11:48 ` Michael Bommarito
2026-04-24 7:36 ` [syzbot ci] " syzbot ci
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260420103715.347fbd4a@kernel.org \
--to=kuba@kernel.org \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=kees@kernel.org \
--cc=kuniyu@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=michael.bommarito@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=yangfeng@kylinos.cn \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.