public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: Michael Bommarito <michael.bommarito@gmail.com>
Cc: "David S . Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>,
	netdev@vger.kernel.org, Simon Horman <horms@kernel.org>,
	Kuniyuki Iwashima <kuniyu@google.com>,
	Kees Cook <kees@kernel.org>, Feng Yang <yangfeng@kylinos.cn>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next] netlink: clean up failed initial dump-start state
Date: Mon, 20 Apr 2026 10:37:15 -0700	[thread overview]
Message-ID: <20260420103715.347fbd4a@kernel.org> (raw)
In-Reply-To: <20260420162734.854587-1-michael.bommarito@gmail.com>

On Mon, 20 Apr 2026 12:27:34 -0400 Michael Bommarito wrote:
> When __netlink_dump_start() has already installed cb->skb, taken the
> module reference and set cb_running, a failure from the first
> netlink_dump(sk, true) call returns via errout_skb without unwinding the
> callback lifetime. That leaves cb_running set and defers module_put()
> and consume_skb(cb->skb) until userspace drains the socket or closes it.

On a quick look I can't see which path clears the dump state in case we
keep failing to allocate an skb. Could you add more info on that?

> Share the normal callback teardown in a helper and use it on successful
> completion and on the initial lock_taken=true failure path. Keep the
> lock_taken=false continuation path unchanged, because recvmsg()-driven
> retries legitimately preserve cb_running when they run out of receive
> room.
> 
> Fixes: 16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.")
> Assisted-by: Claude:claude-opus-4-6
> Assisted-by: Codex:gpt-5-4
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> ---
> Validation inside a UML guest on current mainline:
> 
>   - An unprivileged local task (uid=65534, no CAP_NET_ADMIN) opens a
>     plain NETLINK_ROUTE socket, preloads sk_rmem_alloc with echoed
>     NLMSG_ERROR replies from an unsupported rtnetlink type, then issues
>     RTM_GETLINK | NLM_F_DUMP | NLM_F_ACK.
>   - Stock kernel: the initial __netlink_dump_start() hits the rmem gate
>     and returns via errout_skb with cb_running stuck at 1 until
>     recvmsg() or close() drives forward progress.
>   - Patched kernel: the same probe leaves cb_running clear immediately
>     on the lock_taken=true failure, and the larger-rcvbuf continuation
>     path (legitimate dump in progress) is unchanged.
> 
> A scaling pass on 3500 such wedged sockets in a 256M UML guest shows
> about 3.8-3.9 MiB of extra unreclaimable slab (/proc/meminfo
> SUnreclaim) beyond the visible queued rmem on the vulnerable kernel,
> roughly 1.1 KiB/socket. Real accumulation, but the test hits
> RLIMIT_NOFILE long before the guest approaches OOM, so this still
> looks like a local availability cleanup rather than an exhaustion
> primitive.

This should be part of the commit message, it's useful to understanding
the problem. Actually more than the current commit msg TBH.

> No Cc: stable@ on the theory that the bug self-heals on
> recvmsg()/close and the accumulation is mild. Happy to add it and
> route to net if you'd rather see it backported.
> 
>  net/netlink/af_netlink.c | 30 +++++++++++++++++++-----------
>  1 file changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 4d609d5cf406..7019c17e6879 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -2250,6 +2250,20 @@ static int netlink_dump_done(struct netlink_sock *nlk, struct sk_buff *skb,
>  	return 0;
>  }
>  
> +static void netlink_dump_cleanup(struct netlink_sock *nlk)
> +{
> +	struct module *module = nlk->cb.module;
> +	struct sk_buff *skb = nlk->cb.skb;
> +
> +	if (nlk->cb.done)
> +		nlk->cb.done(&nlk->cb);
> +
> +	WRITE_ONCE(nlk->cb_running, false);
> +	mutex_unlock(&nlk->nl_cb_mutex);
> +	module_put(module);
> +	consume_skb(skb);
> +}

It's probably better to create a helper that shares the code with 
the release path as well. And try not to switch the skb freeing 
to consume_skb().

>  static int netlink_dump(struct sock *sk, bool lock_taken)
>  {
>  	struct netlink_sock *nlk = nlk_sk(sk);
> @@ -2258,7 +2272,6 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
>  	struct sk_buff *skb = NULL;
>  	unsigned int rmem, rcvbuf;
>  	size_t max_recvmsg_len;
> -	struct module *module;
>  	int err = -ENOBUFS;
>  	int alloc_min_size;
>  	int alloc_size;
> @@ -2366,19 +2379,14 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
>  	else
>  		__netlink_sendskb(sk, skb);
>  
> -	if (cb->done)
> -		cb->done(cb);
> -
> -	WRITE_ONCE(nlk->cb_running, false);
> -	module = cb->module;
> -	skb = cb->skb;
> -	mutex_unlock(&nlk->nl_cb_mutex);
> -	module_put(module);
> -	consume_skb(skb);
> +	netlink_dump_cleanup(nlk);
>  	return 0;
>  
>  errout_skb:
> -	mutex_unlock(&nlk->nl_cb_mutex);
> +	if (lock_taken)
> +		netlink_dump_cleanup(nlk);
> +	else
> +		mutex_unlock(&nlk->nl_cb_mutex);
>  	kfree_skb(skb);
>  	return err;
>  }

If you're planning to repost - please wait until tomorrow, we ask that
revisions are at least 24h apart so that people across the timezones
have a chance to chime in.

  reply	other threads:[~2026-04-20 17:37 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-20 16:27 [PATCH net-next] netlink: clean up failed initial dump-start state Michael Bommarito
2026-04-20 17:37 ` Jakub Kicinski [this message]
2026-04-20 17:56   ` Michael Bommarito

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260420103715.347fbd4a@kernel.org \
    --to=kuba@kernel.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=kees@kernel.org \
    --cc=kuniyu@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=michael.bommarito@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=yangfeng@kylinos.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox