From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7171A156F4A; Tue, 26 Aug 2025 13:38:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756215535; cv=none; b=YDfcMGuLAowWtxExgF+dukDHLFyqSk88R9kXHAzrJ449tOzNc8WQp/cLzXaM0JqAINKpljG21qe+I5bBcFakFVXvowEg8qA6zU2wovl1poji3DWeUYvtiNVabliCQJQyl79wG70H7KDVGBbIJx67Zpik/mrdjEN4ZjMVk+A4HzA= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756215535; c=relaxed/simple; bh=k8GUnIiEOU2hBB00nKuxEexoKMlmqYpOq/3lkxsW1EI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Aaeq0etOni+SKnXlSMCO6vdZC3jNHr/urfGYmBF+OafsasaXS7FNDsFg7D8ag9PjmnT4R9Wyc0o26be9UHzSit3KINkCKiZatTdvJ35Blq1n7S7c/2Cl+0rC2SKpLTcmII+3sLCaIhbR5PT3D1u2vSUDLKc/dprFH7FQPKDfqqU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linuxfoundation.org header.i=@linuxfoundation.org header.b=L/ndCUV3; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linuxfoundation.org header.i=@linuxfoundation.org header.b="L/ndCUV3" Received: by smtp.kernel.org (Postfix) with ESMTPSA id EDB19C4CEF1; Tue, 26 Aug 2025 13:38:54 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1756215535; bh=k8GUnIiEOU2hBB00nKuxEexoKMlmqYpOq/3lkxsW1EI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=L/ndCUV3Db94PHuNrmDstgGXLza/ife24QX7hRZRR1v+KOoCIhE9KtTkjbayoPTlb yP1xwq9+UgnyWzfh8zdPJUUOHu8n/9Xe0ALZ3ag/le3LvNkfUhBCIua6oD9euvzvDh NMgCXJkMJN7ZrlrIbw2Dm0sFL+WAKrQnaA2ehuAI= From: Greg Kroah-Hartman To: stable@vger.kernel.org Cc: Greg Kroah-Hartman , patches@lists.linux.dev, Jiayuan Chen , Alexei Starovoitov , Sasha Levin Subject: [PATCH 5.15 071/644] bpf, sockmap: Fix panic when calling skb_linearize Date: Tue, 26 Aug 2025 13:02:42 +0200 Message-ID: <20250826110948.256364349@linuxfoundation.org> X-Mailer: git-send-email 2.50.1 In-Reply-To: <20250826110946.507083938@linuxfoundation.org> References: <20250826110946.507083938@linuxfoundation.org> User-Agent: quilt/0.68 X-stable: review X-Patchwork-Hint: ignore Precedence: bulk X-Mailing-List: patches@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 5.15-stable review patch. If anyone has any objections, please let me know. ------------------ From: Jiayuan Chen commit 5ca2e29f6834c64c0e5a9ccf1278c21fb49b827e upstream. The panic can be reproduced by executing the command: ./bench sockmap -c 2 -p 1 -a --rx-verdict-ingress --rx-strp 100000 Then a kernel panic was captured: ''' [ 657.460555] kernel BUG at net/core/skbuff.c:2178! [ 657.462680] Tainted: [W]=WARN [ 657.463287] Workqueue: events sk_psock_backlog ... [ 657.469610] [ 657.469738] ? die+0x36/0x90 [ 657.469916] ? do_trap+0x1d0/0x270 [ 657.470118] ? pskb_expand_head+0x612/0xf40 [ 657.470376] ? pskb_expand_head+0x612/0xf40 [ 657.470620] ? do_error_trap+0xa3/0x170 [ 657.470846] ? pskb_expand_head+0x612/0xf40 [ 657.471092] ? handle_invalid_op+0x2c/0x40 [ 657.471335] ? pskb_expand_head+0x612/0xf40 [ 657.471579] ? exc_invalid_op+0x2d/0x40 [ 657.471805] ? asm_exc_invalid_op+0x1a/0x20 [ 657.472052] ? pskb_expand_head+0xd1/0xf40 [ 657.472292] ? pskb_expand_head+0x612/0xf40 [ 657.472540] ? lock_acquire+0x18f/0x4e0 [ 657.472766] ? find_held_lock+0x2d/0x110 [ 657.472999] ? __pfx_pskb_expand_head+0x10/0x10 [ 657.473263] ? __kmalloc_cache_noprof+0x5b/0x470 [ 657.473537] ? __pfx___lock_release.isra.0+0x10/0x10 [ 657.473826] __pskb_pull_tail+0xfd/0x1d20 [ 657.474062] ? __kasan_slab_alloc+0x4e/0x90 [ 657.474707] sk_psock_skb_ingress_enqueue+0x3bf/0x510 [ 657.475392] ? __kasan_kmalloc+0xaa/0xb0 [ 657.476010] sk_psock_backlog+0x5cf/0xd70 [ 657.476637] process_one_work+0x858/0x1a20 ''' The panic originates from the assertion BUG_ON(skb_shared(skb)) in skb_linearize(). A previous commit(see Fixes tag) introduced skb_get() to avoid race conditions between skb operations in the backlog and skb release in the recvmsg path. However, this caused the panic to always occur when skb_linearize is executed. The "--rx-strp 100000" parameter forces the RX path to use the strparser module which aggregates data until it reaches 100KB before calling sockmap logic. The 100KB payload exceeds MAX_MSG_FRAGS, triggering skb_linearize. To fix this issue, just move skb_get into sk_psock_skb_ingress_enqueue. ''' sk_psock_backlog: sk_psock_handle_skb skb_get(skb) <== we move it into 'sk_psock_skb_ingress_enqueue' sk_psock_skb_ingress____________ ↓ | | → sk_psock_skb_ingress_self | sk_psock_skb_ingress_enqueue sk_psock_verdict_apply_________________↑ skb_linearize ''' Note that for verdict_apply path, the skb_get operation is unnecessary so we add 'take_ref' param to control it's behavior. Fixes: a454d84ee20b ("bpf, sockmap: Fix skb refcnt race after locking changes") Signed-off-by: Jiayuan Chen Link: https://lore.kernel.org/r/20250407142234.47591-4-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov [ adapted skb_linearize() fix to 5.15's sk_psock_skb_ingress_enqueue implementation without the s_data parameter ] Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/core/skmsg.c | 50 +++++++++++++++++++++++++++----------------------- 1 file changed, 27 insertions(+), 23 deletions(-) --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -525,26 +525,35 @@ static int sk_psock_skb_ingress_enqueue( u32 off, u32 len, struct sk_psock *psock, struct sock *sk, - struct sk_msg *msg) + struct sk_msg *msg, + bool take_ref) { int num_sge, copied; - /* skb linearize may fail with ENOMEM, but lets simply try again - * later if this happens. Under memory pressure we don't want to - * drop the skb. We need to linearize the skb so that the mapping - * in skb_to_sgvec can not error. + /* skb_to_sgvec will fail when the total number of fragments in + * frag_list and frags exceeds MAX_MSG_FRAGS. For example, the + * caller may aggregate multiple skbs. */ - if (skb_linearize(skb)) - return -EAGAIN; num_sge = skb_to_sgvec(skb, msg->sg.data, off, len); - if (unlikely(num_sge < 0)) - return num_sge; + if (num_sge < 0) { + /* skb linearize may fail with ENOMEM, but lets simply try again + * later if this happens. Under memory pressure we don't want to + * drop the skb. We need to linearize the skb so that the mapping + * in skb_to_sgvec can not error. + * Note that skb_linearize requires the skb not to be shared. + */ + if (skb_linearize(skb)) + return -EAGAIN; + num_sge = skb_to_sgvec(skb, msg->sg.data, off, len); + if (unlikely(num_sge < 0)) + return num_sge; + } copied = len; msg->sg.start = 0; msg->sg.size = copied; msg->sg.end = num_sge; - msg->skb = skb; + msg->skb = take_ref ? skb_get(skb) : skb; sk_psock_queue_msg(psock, msg); sk_psock_data_ready(sk, psock); @@ -552,7 +561,7 @@ static int sk_psock_skb_ingress_enqueue( } static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb, - u32 off, u32 len); + u32 off, u32 len, bool take_ref); static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb, u32 off, u32 len) @@ -566,7 +575,7 @@ static int sk_psock_skb_ingress(struct s * correctly. */ if (unlikely(skb->sk == sk)) - return sk_psock_skb_ingress_self(psock, skb, off, len); + return sk_psock_skb_ingress_self(psock, skb, off, len, true); msg = sk_psock_create_ingress_msg(sk, skb); if (!msg) return -EAGAIN; @@ -578,7 +587,7 @@ static int sk_psock_skb_ingress(struct s * into user buffers. */ skb_set_owner_r(skb, sk); - err = sk_psock_skb_ingress_enqueue(skb, off, len, psock, sk, msg); + err = sk_psock_skb_ingress_enqueue(skb, off, len, psock, sk, msg, true); if (err < 0) kfree(msg); return err; @@ -589,7 +598,7 @@ static int sk_psock_skb_ingress(struct s * because the skb is already accounted for here. */ static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb, - u32 off, u32 len) + u32 off, u32 len, bool take_ref) { struct sk_msg *msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_ATOMIC); struct sock *sk = psock->sk; @@ -599,7 +608,7 @@ static int sk_psock_skb_ingress_self(str return -EAGAIN; sk_msg_init(msg); skb_set_owner_r(skb, sk); - err = sk_psock_skb_ingress_enqueue(skb, off, len, psock, sk, msg); + err = sk_psock_skb_ingress_enqueue(skb, off, len, psock, sk, msg, take_ref); if (err < 0) kfree(msg); return err; @@ -608,18 +617,13 @@ static int sk_psock_skb_ingress_self(str static int sk_psock_handle_skb(struct sk_psock *psock, struct sk_buff *skb, u32 off, u32 len, bool ingress) { - int err = 0; - if (!ingress) { if (!sock_writeable(psock->sk)) return -EAGAIN; return skb_send_sock(psock->sk, skb, off, len); } - skb_get(skb); - err = sk_psock_skb_ingress(psock, skb, off, len); - if (err < 0) - kfree_skb(skb); - return err; + + return sk_psock_skb_ingress(psock, skb, off, len); } static void sk_psock_skb_state(struct sk_psock *psock, @@ -1016,7 +1020,7 @@ static int sk_psock_verdict_apply(struct off = stm->offset; len = stm->full_len; } - err = sk_psock_skb_ingress_self(psock, skb, off, len); + err = sk_psock_skb_ingress_self(psock, skb, off, len, false); } if (err < 0) { spin_lock_bh(&psock->ingress_lock);