From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out-188.mta1.migadu.com (out-188.mta1.migadu.com [95.215.58.188]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E04933A1D2 for ; Sat, 24 Jan 2026 06:15:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.188 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769235343; cv=none; b=oO0nE6iKyBBlPIGCOn3abRXBcMzpghmL/TmBaiVwESDFNoOadq9AK5NecjzqjnPXG+ubrM2Jb6Dsdqr2dtIANYVaSsLwGZXp4Tum7iRLwOPlorxA5hoTof1dQfsmtrPTqySrb+DFpnXutk5U224AWHIH7RPioGLS7tqsg06sjys= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769235343; c=relaxed/simple; bh=wgYELbTzeRKx53lNu3PGbO9UHZIelrY/ulnN/pnRapI=; h=From:To:Cc:Subject:Date:Message-ID:MIME-Version; b=rmhWhuflqbypN5+jkx/OCSXM2o1YEhdU6SfSgn2F/ZXt8Zl0f6YnTXUbTz75PTZKUqfFEx3+/EPzkiI/1wFASqy9ew13DaticTu4lT8gl81Vn0JdtmIsGa197jqbCBwFpoMA640445EyTBliAAXr+AX8IIdluzvbcRmrrsCSy20= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=j3O2M7Fm; arc=none smtp.client-ip=95.215.58.188 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="j3O2M7Fm" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1769235329; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding; bh=KvJCH4XXdbWBtSibWgDG2N47XCPK7n8tQQHZT5gdhDc=; b=j3O2M7Fmq7bzXme8+GFH0leHRc8FvKjY8ezR8inSZDOrqPRjX/Llvvjpbg/CP1Ij3RV9yp U1dZSlIvRaP1m2JF8CliUsC9J3ZWWm5zRFYXocRzz+ZD/cGjTh8LSGH29TRVpy31AaSvfE PLiDjg8z89xaZ+05vM1SUCkMz4g1iLw= From: Jiayuan Chen To: bpf@vger.kernel.org Cc: Jiayuan Chen , John Fastabend , Jakub Sitnicki , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Simon Horman , Neal Cardwell , Kuniyuki Iwashima , David Ahern , Andrii Nakryiko , Eduard Zingerman , Alexei Starovoitov , Daniel Borkmann , Martin KaFai Lau , Song Liu , Yonghong Song , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Shuah Khan , Stefano Garzarella , Michal Luczaj , Cong Wang , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [PATCH bpf-next v8 0/3] bpf: Fix FIONREAD and copied_seq issues Date: Sat, 24 Jan 2026 14:14:30 +0800 Message-ID: <20260124061502.24350-1-jiayuan.chen@linux.dev> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT syzkaller reported a bug [1] where a socket using sockmap, after being unloaded, exposed incorrect copied_seq calculation. The selftest I provided can be used to reproduce the issue reported by syzkaller. TCP recvmsg seq # bug 2: copied E92C873, seq E68D125, rcvnxt E7CEB7C, fl 40 WARNING: CPU: 1 PID: 5997 at net/ipv4/tcp.c:2724 tcp_recvmsg_locked+0xb2f/0x2910 net/ipv4/tcp.c:2724 Call Trace: receive_fallback_to_copy net/ipv4/tcp.c:1968 [inline] tcp_zerocopy_receive+0x131a/0x2120 net/ipv4/tcp.c:2200 do_tcp_getsockopt+0xe28/0x26c0 net/ipv4/tcp.c:4713 tcp_getsockopt+0xdf/0x100 net/ipv4/tcp.c:4812 do_sock_getsockopt+0x34d/0x440 net/socket.c:2421 __sys_getsockopt+0x12f/0x260 net/socket.c:2450 __do_sys_getsockopt net/socket.c:2457 [inline] __se_sys_getsockopt net/socket.c:2454 [inline] __x64_sys_getsockopt+0xbd/0x160 net/socket.c:2454 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f A sockmap socket maintains its own receive queue (ingress_msg) which may contain data from either its own protocol stack or forwarded from other sockets. FD1:read() -- FD1->copied_seq++ | [read data] | [enqueue data] v [sockmap] -> ingress to self -> ingress_msg queue FD1 native stack ------> ^ -- FD1->rcv_nxt++ -> redirect to other | [enqueue data] | | | ingress to FD1 v ^ ... | [sockmap] FD2 native stack The issue occurs when reading from ingress_msg: we update tp->copied_seq by default, but if the data comes from other sockets (not the socket's own protocol stack), tcp->rcv_nxt remains unchanged. Later, when converting back to a native socket, reads may fail as copied_seq could be significantly larger than rcv_nxt. Additionally, FIONREAD calculation based on copied_seq and rcv_nxt is insufficient for sockmap sockets, requiring separate field tracking. [1] https://syzkaller.appspot.com/bug?extid=06dbd397158ec0ea4983 --- v7 -> v8: Address Jakub Sitnicki's feedback: - Remove sk_receive_queue check in tcp_bpf_ioctl, only report ingress_msg data length for FIONREAD - Minor nits fixes - Add Reviewed-by tag from John Fastabend https://lore.kernel.org/bpf/20260113025121.197535-1-jiayuan.chen@linux.dev/ v5 -> v7: Some modifications suggested by Jakub Sitnicki, and added Reviewed-by tag. https://lore.kernel.org/bpf/20260106051458.279151-1-jiayuan.chen@linux.dev/ v1 -> v5: Use skmsg.sk instead of extending BPF_F_XXX macro and fix CI failure reported by CI v1: https://lore.kernel.org/bpf/20251117110736.293040-1-jiayuan.chen@linux.dev/ Jiayuan Chen (3): bpf, sockmap: Fix incorrect copied_seq calculation bpf, sockmap: Fix FIONREAD for sockmap bpf, selftest: Add tests for FIONREAD and copied_seq include/linux/skmsg.h | 70 ++++- net/core/skmsg.c | 30 +- net/ipv4/tcp_bpf.c | 25 +- net/ipv4/udp_bpf.c | 23 +- .../selftests/bpf/prog_tests/sockmap_basic.c | 277 +++++++++++++++++- .../bpf/progs/test_sockmap_pass_prog.c | 14 + 6 files changed, 422 insertions(+), 17 deletions(-) -- 2.43.0