From: Paolo Abeni <pabeni@redhat.com>
To: zhengguoyong <zhenggy@chinatelecom.cn>,
john.fastabend@gmail.com, jakub@cloudflare.com,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [PATCH] bpf, sockmap: Update tp->rcv_nxt in sk_psock_skb_ingress
Date: Thu, 9 Oct 2025 09:59:07 +0200
Message-ID: <6ea5bc8e-5d77-4a9a-9a8d-72a8dc71ac38@redhat.com>
In-Reply-To: <3b78ca04-f4b9-4d12-998d-4e21a3a8397f@chinatelecom.cn>

On 10/9/25 5:07 AM, zhengguoyong wrote:
> When using sockmap to forward TCP traffic to the application
> layer of the peer socket, the peer socket's tcp_bpf_recvmsg_parser
> processing flow will synchronously update the tp->copied_seq field.
> This causes tp->rcv_nxt to become less than tp->copied_seq.
>
> Later, when this socket receives SKB packets from the protocol stack,
> in the call chain tcp_data_ready → tcp_epollin_ready, the function
> tcp_epollin_ready will return false, preventing the socket from being
> woken up to receive new packets.
>
> Therefore, it is necessary to synchronously update the tp->rcv_nxt
> information in sk_psock_skb_ingress.
>
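For readers following along, the sequence-number argument above can be modeled in user space. This is only a sketch: `struct tcp_seq_model` and the three helper functions are hypothetical names, not kernel APIs, and the readiness check keeps just the "bytes available" comparison while ignoring the memory-pressure and receive-window conditions of the real tcp_epollin_ready.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two struct tcp_sock fields discussed above. */
struct tcp_seq_model {
	uint32_t rcv_nxt;    /* next sequence number expected from the TCP stack */
	uint32_t copied_seq; /* first sequence number not yet read by the app */
};

/* Simplified readiness check: only the "bytes available" comparison. */
static bool epollin_ready_model(const struct tcp_seq_model *tp, int target)
{
	/* Signed difference, so rcv_nxt < copied_seq yields a negative value. */
	int32_t avail = (int32_t)(tp->rcv_nxt - tp->copied_seq);

	if (avail <= 0)
		return false;
	return avail >= target;
}

/* Reading n bytes delivered through sockmap: only copied_seq advances. */
static void sockmap_read_model(struct tcp_seq_model *tp, uint32_t n)
{
	tp->copied_seq += n;
}

/* n bytes arriving through the regular TCP receive path: rcv_nxt advances. */
static void stack_rx_model(struct tcp_seq_model *tp, uint32_t n)
{
	tp->rcv_nxt += n;
}
```

Under this model, reading 500 redirected bytes and then receiving 100 bytes from the stack leaves copied_seq ahead of rcv_nxt, so the check reports "not ready" even though new data was queued.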
> Signed-off-by: GuoYong Zheng <zhenggy@chinatelecom.cn>
> ---
> net/core/skmsg.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index 9becadd..e9d841c 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -576,6 +576,7 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb,
>  	struct sock *sk = psock->sk;
>  	struct sk_msg *msg;
>  	int err;
> +	u32 seq;
>
>  	/* If we are receiving on the same sock skb->sk is already assigned,
>  	 * skip memory accounting and owner transition seeing it already set
> @@ -595,8 +596,15 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb,
>  	 */
>  	skb_set_owner_r(skb, sk);
>  	err = sk_psock_skb_ingress_enqueue(skb, off, len, psock, sk, msg, true);
> -	if (err < 0)
> +	if (err < 0) {
>  		kfree(msg);
> +	} else {
> +		bh_lock_sock_nested(sk);
Apparently this is triggering a deadlock in our CI:
WARNING: inconsistent lock state
6.17.0-gb9bdadc5b6ca-dirty #8 Tainted: G OE
--------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
kworker/1:36/3777 [HC0[0]:SC0[0]:HE1:SE1] takes:
00000000b80163e8 (slock-AF_INET/1){+.?.}-{2:2}, at: sk_psock_backlog+0x656/0xf18
{IN-SOFTIRQ-W} state was registered at:
__lock_acquire+0x4dc/0xd58
lock_acquire.part.0+0x114/0x278
lock_acquire+0x9c/0x160
_raw_spin_lock_nested+0x58/0xa8
tcp_v4_rcv+0x23a0/0x32a8
ip_protocol_deliver_rcu+0x6c/0x418
ip_local_deliver_finish+0x364/0x5d0
ip_local_deliver+0x17a/0x3f8
ip_rcv+0xd6/0x318
__netif_receive_skb_one_core+0x11c/0x158
process_backlog+0x58c/0x1618
__napi_poll+0x86/0x488
net_rx_action+0x482/0xb08
handle_softirqs+0x3cc/0xc88
do_softirq+0x1fc/0x248
__local_bh_enable_ip+0x332/0x3a0
__dev_queue_xmit+0x90a/0x1738
neigh_resolve_output+0x4c4/0x848
ip_finish_output2+0x728/0x1bc8
ip_output+0x1ea/0x5d0
__ip_queue_xmit+0x71a/0x1088
__tcp_transmit_skb+0x118c/0x2470
tcp_connect+0x10ca/0x18b8
tcp_v4_connect+0x11cc/0x1788
__inet_stream_connect+0x324/0xc00
inet_stream_connect+0x70/0xb8
__sys_connect+0xea/0x148
__do_sys_socketcall+0x2b4/0x4c0
__do_syscall+0x138/0x3e0
system_call+0x6e/0x90
irq event stamp: 531707
hardirqs last enabled at (531707): [<0008bdf65ac55ad2>] __local_bh_enable_ip+0x23a/0x3a0
hardirqs last disabled at (531705): [<0008bdf65ac55b6a>] __local_bh_enable_ip+0x2d2/0x3a0
softirqs last enabled at (531706): [<0008bdf65c31f142>] sk_psock_skb_ingress_enqueue+0x2aa/0x468
softirqs last disabled at (531704): [<0008bdf65c31f112>] sk_psock_skb_ingress_enqueue+0x27a/0x468
other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(slock-AF_INET/1);
  <Interrupt>
    lock(slock-AF_INET/1);

 *** DEADLOCK ***
3 locks held by kworker/1:36/3777:
 #0: 0000000080042158 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x766/0x15e0
 #1: 0008bd765cbffba8 ((work_completion)(&(&psock->work)->work)){+.+.}-{0:0}, at: process_one_work+0x794/0x15e0
 #2: 000000008eb583c8 (&psock->work_mutex){+.+.}-{3:3}, at: sk_psock_backlog+0x198/0xf18
stack backtrace:
CPU: 1 UID: 0 PID: 3777 Comm: kworker/1:36 Tainted: G OE 6.17.0-gb9bdadc5b6ca-dirty #8 NONE
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: IBM 8561 LT1 400 (KVM/Linux)
Workqueue: events sk_psock_backlog
Call Trace:
[<0008bdf65aaca9de>] dump_stack_lvl+0x106/0x168
[<0008bdf65adb5990>] print_usage_bug.part.0+0x2e8/0x398
[<0008bdf65adb616a>] mark_lock_irq+0x72a/0x9c0
[<0008bdf65adb670e>] mark_lock+0x30e/0x7c0
[<0008bdf65adb6f38>] mark_usage+0xc8/0x178
[<0008bdf65adb819c>] __lock_acquire+0x4dc/0xd58
[<0008bdf65adb8b2c>] lock_acquire.part.0+0x114/0x278
[<0008bdf65adb8d2c>] lock_acquire+0x9c/0x160
[<0008bdf65cbc6498>] _raw_spin_lock_nested+0x58/0xa8
[<0008bdf65c323716>] sk_psock_backlog+0x656/0xf18
[<0008bdf65ac9b236>] process_one_work+0x83e/0x15e0
[<0008bdf65ac9c790>] worker_thread+0x7b8/0x1020
[<0008bdf65acbb0f8>] kthread+0x3c0/0x6e8
[<0008bdf65aad02f4>] __ret_from_fork+0xdc/0x800
[<0008bdf65cbc7fd2>] ret_from_fork+0xa/0x30
INFO: lockdep is turned off.
More details at:
https://github.com/kernel-patches/bpf/actions/runs/18367014116/job/52322106520
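To spell out what the report says: the socket spinlock is taken from softirq context in tcp_v4_rcv, and the patch now also takes it from the sk_psock_backlog worker with softirqs still enabled (SE1 in the splat), since bh_lock_sock_nested is a plain nested spin_lock on sk->sk_lock.slock and does not disable bottom halves. A softirq interrupting the worker on the same CPU could then spin on a lock that CPU already holds. The rule lockdep is enforcing can be sketched as a toy user-space model; `enum lock_usage`, `struct lock_class_model`, and `model_acquire` are hypothetical names for illustration and not lockdep's real data structures.

```c
#include <stdbool.h>

/* Hypothetical usage bits, loosely mirroring the state names in the splat. */
enum lock_usage {
	USED_IN_SOFTIRQ = 1 << 0, /* acquired while running in softirq context */
	SOFTIRQ_ON_HELD = 1 << 1, /* acquired in process context, softirqs enabled */
};

struct lock_class_model {
	unsigned int usage; /* accumulated usage bits for this lock class */
};

/*
 * Record one acquisition of the lock class. Returns false once the class
 * has been seen both in softirq context and in process context with
 * softirqs enabled, which is the inconsistent combination reported above.
 */
static bool model_acquire(struct lock_class_model *lc, bool in_softirq,
			  bool softirqs_enabled)
{
	const unsigned int bad = USED_IN_SOFTIRQ | SOFTIRQ_ON_HELD;

	if (in_softirq)
		lc->usage |= USED_IN_SOFTIRQ;
	else if (softirqs_enabled)
		lc->usage |= SOFTIRQ_ON_HELD;

	return (lc->usage & bad) != bad;
}
```

In this model, a process-context acquisition with softirqs disabled never sets the second bit, which is why process-context users of a lock shared with softirq code normally disable bottom halves around it. Whether that alone is the right fix for this patch is a separate question.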
/P
Thread overview:
2025-10-09 3:07 [PATCH] bpf, sockmap: Update tp->rcv_nxt in sk_psock_skb_ingress zhengguoyong
2025-10-09 7:07 ` Eric Dumazet
2025-10-10 8:18 ` zhengguoyong
2025-10-10 9:16 ` Eric Dumazet
2025-10-09 7:59 ` Paolo Abeni [this message]