linux-kselftest.vger.kernel.org archive mirror
* [PATCH bpf-next v1 0/4] bpf, sockmap: Fix data loss and panic issues
@ 2025-04-07 14:21 Jiayuan Chen
From: Jiayuan Chen @ 2025-04-07 14:21 UTC
  To: bpf
  Cc: mrpre, Jiayuan Chen, John Fastabend, Jakub Sitnicki,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Mykola Lysenko, Shuah Khan, linux-kernel, netdev, linux-kselftest

I was writing a benchmark based on sockmap + TCP and discovered several
issues:

1. When EAGAIN occurs, the direction of the skb is incorrect, causing data
   loss on retry.
2. When only part of the data is sent, the offset is not recorded, so
   duplicate data is sent on retry.
3. An unexpected BUG_ON() check in skb_linearize is triggered.
4. The memory of psock->ingress_skb is not limited by the socket buffer
   and memcg.

Issues 1, 2, and 3 are described in each patch's commit message.
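
To illustrate the idea behind issue 2, here is a minimal userspace sketch,
not the kernel code. It assumes a hypothetical fake_send() that accepts only
part of a buffer before returning -EAGAIN, and a hypothetical send_state
that remembers how many bytes already went out. All names are made up for
illustration; the real fix lives in net/core/skmsg.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a socket with limited buffer space. */
struct fake_sock {
	char   buf[64];  /* bytes the peer has received */
	size_t used;     /* bytes stored in buf */
	size_t room;     /* bytes accepted before the next -EAGAIN */
};

/* Hypothetical saved state for one pending buffer (not a kernel struct). */
struct send_state {
	size_t off;      /* bytes of the buffer already sent */
};

/* Accept up to sk->room bytes; return bytes taken, or -EAGAIN when full. */
static long fake_send(struct fake_sock *sk, const char *data, size_t len)
{
	size_t n = len < sk->room ? len : sk->room;

	if (n > sizeof(sk->buf) - sk->used)
		n = sizeof(sk->buf) - sk->used;
	if (n == 0)
		return -EAGAIN;
	memcpy(sk->buf + sk->used, data, n);
	sk->used += n;
	sk->room -= n;
	return (long)n;
}

/*
 * Send data[st->off..len). On -EAGAIN the progress stays in st->off, so a
 * later retry resumes there instead of resending from byte 0.
 * Returns 0 when everything was sent, -EAGAIN if a retry is needed.
 */
static int send_with_resume(struct fake_sock *sk, struct send_state *st,
			    const char *data, size_t len)
{
	assert(st->off <= len);
	while (st->off < len) {
		long ret = fake_send(sk, data + st->off, len - st->off);

		if (ret < 0)
			return (int)ret;
		st->off += (size_t)ret;
	}
	return 0;
}
```

If the caller dropped st->off between attempts, the second call would start
again at byte 0 and the receiver would see the first bytes twice.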

Regarding issue 4, this patchset does not cover it as it is difficult to
handle in practice, and I am still working on it.

Here is a brief description of the issue:
When sockmap is used for skb/stream redirection and the receiving end does
not perform read operations, all data is buffered in ingress_skb.

For example:
'''
// set memory limit to 5000M
cgcreate -g memory:myGroup
cgset -r memory.max="5000M" myGroup

// start benchmark and prevent the consumer from reading
cgexec -g "memory:myGroup" ./bench sockmap -c 2 -p 1 -a --rx-verdict-ingress --delay-consumer=-1 -d 100
Iter   0 ( 29.179us): Send Speed 2668.548 MB/s (20360.406 calls/s), ... Rcv Speed    0.000 MB/s (   0.000 calls/s)
Iter   1 ( -7.237us): Send Speed 2694.467 MB/s (20557.149 calls/s), ... Rcv Speed    0.000 MB/s (   0.000 calls/s)
Iter   2 ( -1.918us): Send Speed 2693.404 MB/s (20548.039 calls/s), ... Rcv Speed    0.000 MB/s (   0.000 calls/s)
Iter   3 ( -0.684us): Send Speed 2693.138 MB/s (20548.014 calls/s), ... Rcv Speed    0.000 MB/s (   0.000 calls/s)
Iter   4 (  7.879us): Send Speed 2698.620 MB/s (20588.838 calls/s), ... Rcv Speed    0.000 MB/s (   0.000 calls/s)
Iter   5 ( -3.224us): Send Speed 2696.553 MB/s (20573.066 calls/s), ... Rcv Speed    0.000 MB/s (   0.000 calls/s)
Iter   6 ( -5.409us): Send Speed 2699.705 MB/s (20597.111 calls/s), ... Rcv Speed    0.000 MB/s (   0.000 calls/s)
Iter   7 ( -0.439us): Send Speed 2699.691 MB/s (20597.009 calls/s), ... Rcv Speed    0.000 MB/s (   0.000 calls/s)
...

// memory usage is not limited
cat /proc/slabinfo | grep skb
skbuff_small_head   11824024 11824024    704   46    8 : tunables    0    0    0 : slabdata 257044 257044      0
skbuff_fclone_cache 11822080 11822080    512   32    4 : tunables    0    0    0 : slabdata 369440 369440      0
'''
Thus, a single socket in a large file upload/download scenario can consume
all of the system's memory.

We must charge the skb memory to psock->sk, and if we do not want to lose
skbs, we need to feed the error back to read_sock/read_skb when enqueueing
to psock->ingress_skb fails.
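
The charging idea can be sketched in userspace roughly as follows. This is
only an illustration under stated assumptions, not a proposed kernel
change: rx_account, rx_enqueue(), rx_dequeue() and rx_fill() are
hypothetical names, and the choice of -ENOMEM as the error reported to the
producer is an assumption as well.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical receive-side accounting, loosely modeled on a buffer limit. */
struct rx_account {
	size_t charged;  /* bytes currently queued and charged */
	size_t limit;    /* maximum bytes allowed in the queue */
	size_t queued;   /* number of queued packets */
};

/* Charge len bytes and enqueue, or refuse so the caller can back off. */
static int rx_enqueue(struct rx_account *acct, size_t len)
{
	if (len > acct->limit - acct->charged)
		return -ENOMEM;  /* assumed error code, for illustration */
	acct->charged += len;
	acct->queued++;
	return 0;
}

/* Uncharge len bytes when the consumer reads a packet. */
static void rx_dequeue(struct rx_account *acct, size_t len)
{
	assert(acct->queued > 0 && len <= acct->charged);
	acct->charged -= len;
	acct->queued--;
}

/*
 * Producer loop: stop at the first refused packet and report how many
 * were accepted, instead of queueing without bound.
 */
static size_t rx_fill(struct rx_account *acct, size_t pkt_len, size_t npkts)
{
	size_t accepted = 0;

	while (accepted < npkts && rx_enqueue(acct, pkt_len) == 0)
		accepted++;
	return accepted;
}
```

With a limit in place, a producer that gets an error from rx_enqueue() can
keep the packet and retry later rather than dropping it, which is the
feedback path described above.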

---
Another stability-related patch of mine also needs review, if maintainers
can spare some time from their busy schedules.
https://lore.kernel.org/bpf/20250317092257.68760-1-jiayuan.chen@linux.dev/T/#t


Jiayuan Chen (4):
  bpf, sockmap: Fix data lost during EAGAIN retries
  bpf, sockmap: fix duplicated data transmission
  bpf, sockmap: Fix panic when calling skb_linearize
  selftest/bpf/benchs: Add benchmark for sockmap usage

 net/core/skmsg.c                              |  48 +-
 tools/testing/selftests/bpf/Makefile          |   2 +
 tools/testing/selftests/bpf/bench.c           |   4 +
 .../selftests/bpf/benchs/bench_sockmap.c      | 599 ++++++++++++++++++
 .../selftests/bpf/progs/bench_sockmap_prog.c  |  65 ++
 5 files changed, 697 insertions(+), 21 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_sockmap.c
 create mode 100644 tools/testing/selftests/bpf/progs/bench_sockmap_prog.c

-- 
2.47.1



Thread overview: 7+ messages, newest 2025-04-10 5:50 UTC
2025-04-07 14:21 [PATCH bpf-next v1 0/4] bpf, sockmap: Fix data loss and panic issues Jiayuan Chen
2025-04-07 14:21 ` [PATCH bpf-next v1 1/4] bpf, sockmap: Fix data lost during EAGAIN retries Jiayuan Chen
2025-04-07 14:21 ` [PATCH bpf-next v1 2/4] bpf, sockmap: fix duplicated data transmission Jiayuan Chen
2025-04-07 14:21 ` [PATCH bpf-next v1 3/4] bpf, sockmap: Fix panic when calling skb_linearize Jiayuan Chen
2025-04-07 14:21 ` [PATCH bpf-next v1 4/4] selftest/bpf/benchs: Add benchmark for sockmap usage Jiayuan Chen
2025-04-10  3:10 ` [PATCH bpf-next v1 0/4] bpf, sockmap: Fix data loss and panic issues patchwork-bot+netdevbpf
2025-04-10  5:50   ` John Fastabend