BPF List
 help / color / mirror / Atom feed
* [PATCH bpf 0/8] Fixes to bpf_msg_push/pop_data and test_sockmap
@ 2024-10-20 11:03 zijianzhang
  2024-10-20 11:03 ` [PATCH bpf 1/8] selftests/bpf: Add txmsg_pass to pull/push/pop in test_sockmap zijianzhang
                   ` (8 more replies)
  0 siblings, 9 replies; 22+ messages in thread
From: zijianzhang @ 2024-10-20 11:03 UTC (permalink / raw)
  To: bpf
  Cc: martin.lau, daniel, john.fastabend, ast, andrii, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet, kuba,
	pabeni, mykolal, shuah, jakub, liujian56, zijianzhang, cong.wang

From: Zijian Zhang <zijianzhang@bytedance.com>

Several fixes to test_sockmap and added push/pop logic for msg_verify_data
Before the fixes, some of the tests in test_sockmap are problematic,
resulting in pseudo-correct result.

1. txmsg_pass is not set in some tests, as a result, no eBPF program is
attached to the sockmap.
2. In SENDPAGE, a wrong iov_length in test_send_large may result in some
test skippings and failures.
3. The calculation of total_bytes in msg_loop_rx is wrong, which may cause
msg_loop_rx end early and skip some data tests.

Besides, for msg_verify_data, I added push/pop checking logic to function
msg_verify_data and added more tests for different cases.

After that, I found that there are some bugs in bpf_msg_push_data,
bpf_msg_pop_data and sk_msg_reset_curr, and fix them. I guess the reason
why they have not been exposed is that because of the above problems, they
will not be triggered.

With the fixes, we can pass the sockmap test with data integrity test now.
However, the fixes to test_sockmap expose more problems in sockhash test
with SENDPAGE and ktls with SENDPAGE.

The problem I observed,
1. In sockhash test, a NULL pointer kernel BUG will be reported for nearly
every cork test. More inspections are needed for splice_to_socket.

BUG: kernel NULL pointer dereference, address: 0000000000000008
PGD 0 P4D 0 
Oops: Oops: 0000 [#3] PREEMPT SMP PTI
CPU: 3 UID: 0 PID: 2122 Comm: test_sockmap 6.12.0-rc2.bm.1-amd64+ #98
Tainted: [D]=DIE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
RIP: 0010:splice_to_socket+0x34a/0x480
Call Trace:
 <TASK>
 ? __die_body+0x1e/0x60
 ? page_fault_oops+0x159/0x4d0
 ? exc_page_fault+0x7e/0x180
 ? asm_exc_page_fault+0x26/0x30
 ? splice_to_socket+0x34a/0x480
? __memcg_slab_post_alloc_hook+0x205/0x3c0
? alloc_pipe_info+0xd6/0x1f0
? __kmalloc_noprof+0x37f/0x3b0
direct_splice_actor+0x40/0x100
splice_direct_to_actor+0xfd/0x290
? __pfx_direct_splice_actor+0x10/0x10
do_splice_direct_actor+0x82/0xb0
? __pfx_direct_file_splice_eof+0x10/0x10
do_splice_direct+0x13/0x20
? __pfx_direct_splice_actor+0x10/0x10
do_sendfile+0x33c/0x3f0
__x64_sys_sendfile64+0xa7/0xc0
do_syscall_64+0x62/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
 </TASK>
Modules linked in:
CR2: 0000000000000008
---[ end trace 0000000000000000 ]---

2. txmsg_pass are not set before, and some tests are skipped. Now after
the fixes, we have some failure cases now. More fixes are needed either
for the selftest or the ktls kernel code.

1/ 6 sockhash:ktls:txmsg test passthrough:OK
2/ 6 sockhash:ktls:txmsg test redirect:OK
3/ 1 sockhash:ktls:txmsg test redirect wait send mem:OK
4/ 6 sockhash:ktls:txmsg test drop:OK
5/ 6 sockhash:ktls:txmsg test ingress redirect:OK
6/ 7 sockhash:ktls:txmsg test skb:OK
7/12 sockhash:ktls:txmsg test apply:OK
8/12 sockhash:ktls:txmsg test cork:OK
9/ 3 sockhash:ktls:txmsg test hanging corks:OK
detected data corruption @iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
10/11 sockhash:ktls:txmsg test push_data:FAIL
detected data corruption @iov[0]:0 17 != 00, 00 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @iov[0]:0 17 != 00, 00 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
11/17 sockhash:ktls:txmsg test pull-data:FAIL
recv failed(): Invalid argument
rx thread exited with err 1.
recv failed(): Invalid argument
rx thread exited with err 1.
recv failed(): Bad message
rx thread exited with err 1.
detected data corruption @iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
12/ 9 sockhash:ktls:txmsg test pop-data:FAIL
recv failed(): Bad message
rx thread exited with err 1.
recv failed(): Bad message
rx thread exited with err 1.
13/ 6 sockhash:ktls:txmsg test push/pop data:FAIL
14/ 1 sockhash:ktls:txmsg test ingress parser:OK
15/ 0 sockhash:ktls:txmsg test ingress parser2:OK
Pass: 11 Fail: 17

Zijian Zhang (8):
  selftests/bpf: Add txmsg_pass to pull/push/pop in test_sockmap
  selftests/bpf: Fix SENDPAGE data logic in test_sockmap
  selftests/bpf: Fix total_bytes in msg_loop_rx in test_sockmap
  selftests/bpf: Add push/pop checking for msg_verify_data in
    test_sockmap
  selftests/bpf: Add more tests for test_txmsg_push_pop in test_sockmap
  bpf, sockmap: Several fixes to bpf_msg_push_data
  bpf, sockmap: Several fixes to bpf_msg_pop_data
  bpf, sockmap: Fix sk_msg_reset_curr

 net/core/filter.c                          |  89 +++++-----
 tools/testing/selftests/bpf/test_sockmap.c | 180 +++++++++++++++++++--
 2 files changed, 215 insertions(+), 54 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH bpf 1/8] selftests/bpf: Add txmsg_pass to pull/push/pop in test_sockmap
  2024-10-20 11:03 [PATCH bpf 0/8] Fixes to bpf_msg_push/pop_data and test_sockmap zijianzhang
@ 2024-10-20 11:03 ` zijianzhang
  2024-10-24  4:06   ` John Fastabend
  2024-10-20 11:03 ` [PATCH bpf 2/8] selftests/bpf: Fix SENDPAGE data logic " zijianzhang
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: zijianzhang @ 2024-10-20 11:03 UTC (permalink / raw)
  To: bpf
  Cc: martin.lau, daniel, john.fastabend, ast, andrii, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet, kuba,
	pabeni, mykolal, shuah, jakub, liujian56, zijianzhang, cong.wang

From: Zijian Zhang <zijianzhang@bytedance.com>

Add txmsg_pass to test_txmsg_pull/push/pop. If txmsg_pass is missing,
tx_prog will be NULL, and no program will be attached to the sockmap.
As a result, pull/push/pop are never invoked.

Fixes: 328aa08a081b ("bpf: Selftests, break down test_sockmap into subtests")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
---
 tools/testing/selftests/bpf/test_sockmap.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 075c93ed143e..0f065273fde3 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -1596,11 +1596,13 @@ static void test_txmsg_cork_hangs(int cgrp, struct sockmap_options *opt)
 static void test_txmsg_pull(int cgrp, struct sockmap_options *opt)
 {
 	/* Test basic start/end */
+	txmsg_pass = 1;
 	txmsg_start = 1;
 	txmsg_end = 2;
 	test_send(opt, cgrp);
 
 	/* Test >4k pull */
+	txmsg_pass = 1;
 	txmsg_start = 4096;
 	txmsg_end = 9182;
 	test_send_large(opt, cgrp);
@@ -1629,11 +1631,13 @@ static void test_txmsg_pull(int cgrp, struct sockmap_options *opt)
 static void test_txmsg_pop(int cgrp, struct sockmap_options *opt)
 {
 	/* Test basic pop */
+	txmsg_pass = 1;
 	txmsg_start_pop = 1;
 	txmsg_pop = 2;
 	test_send_many(opt, cgrp);
 
 	/* Test pop with >4k */
+	txmsg_pass = 1;
 	txmsg_start_pop = 4096;
 	txmsg_pop = 4096;
 	test_send_large(opt, cgrp);
@@ -1662,11 +1666,13 @@ static void test_txmsg_pop(int cgrp, struct sockmap_options *opt)
 static void test_txmsg_push(int cgrp, struct sockmap_options *opt)
 {
 	/* Test basic push */
+	txmsg_pass = 1;
 	txmsg_start_push = 1;
 	txmsg_end_push = 1;
 	test_send(opt, cgrp);
 
 	/* Test push 4kB >4k */
+	txmsg_pass = 1;
 	txmsg_start_push = 4096;
 	txmsg_end_push = 4096;
 	test_send_large(opt, cgrp);
@@ -1687,6 +1693,7 @@ static void test_txmsg_push(int cgrp, struct sockmap_options *opt)
 
 static void test_txmsg_push_pop(int cgrp, struct sockmap_options *opt)
 {
+	txmsg_pass = 1;
 	txmsg_start_push = 1;
 	txmsg_end_push = 10;
 	txmsg_start_pop = 5;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf 2/8] selftests/bpf: Fix SENDPAGE data logic in test_sockmap
  2024-10-20 11:03 [PATCH bpf 0/8] Fixes to bpf_msg_push/pop_data and test_sockmap zijianzhang
  2024-10-20 11:03 ` [PATCH bpf 1/8] selftests/bpf: Add txmsg_pass to pull/push/pop in test_sockmap zijianzhang
@ 2024-10-20 11:03 ` zijianzhang
  2024-10-24  4:47   ` John Fastabend
  2024-10-20 11:03 ` [PATCH bpf 3/8] selftests/bpf: Fix total_bytes in msg_loop_rx " zijianzhang
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: zijianzhang @ 2024-10-20 11:03 UTC (permalink / raw)
  To: bpf
  Cc: martin.lau, daniel, john.fastabend, ast, andrii, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet, kuba,
	pabeni, mykolal, shuah, jakub, liujian56, zijianzhang, cong.wang

From: Zijian Zhang <zijianzhang@bytedance.com>

In the SENDPAGE test, "opt->iov_length * cnt" size of data will be sent
in cnt times of sendfile.
1. In push/pop tests, they will be invoked cnt times, for the simplicity of
msg_verify_data, change chunk_sz to iov_length
2. Change iov_length in test_send_large from 1024 to 8192. We have pop test
where txmsg_start_pop is 4096. 4096 > 1024, an error will be returned.

Fixes: 328aa08a081b ("bpf: Selftests, break down test_sockmap into subtests")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
---
 tools/testing/selftests/bpf/test_sockmap.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 0f065273fde3..1d59bed90d80 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -420,16 +420,18 @@ static int msg_loop_sendpage(int fd, int iov_length, int cnt,
 {
 	bool drop = opt->drop_expected;
 	unsigned char k = 0;
+	int i, j, fp;
 	FILE *file;
-	int i, fp;
 
 	file = tmpfile();
 	if (!file) {
 		perror("create file for sendpage");
 		return 1;
 	}
-	for (i = 0; i < iov_length * cnt; i++, k++)
-		fwrite(&k, sizeof(char), 1, file);
+	for (i = 0; i < cnt; i++, k = 0) {
+		for (j = 0; j < iov_length; j++, k++)
+			fwrite(&k, sizeof(char), 1, file);
+	}
 	fflush(file);
 	fseek(file, 0, SEEK_SET);
 
@@ -623,7 +625,9 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
 		 * This is really only useful for testing edge cases in code
 		 * paths.
 		 */
-		total_bytes = (float)iov_count * (float)iov_length * (float)cnt;
+		total_bytes = (float)iov_length * (float)cnt;
+		if (!opt->sendpage)
+			total_bytes *= (float)iov_count;
 		if (txmsg_apply)
 			txmsg_pop_total = txmsg_pop * (total_bytes / txmsg_apply);
 		else
@@ -701,7 +705,7 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
 
 			if (data) {
 				int chunk_sz = opt->sendpage ?
-						iov_length * cnt :
+						iov_length :
 						iov_length * iov_count;
 
 				errno = msg_verify_data(&msg, recv, chunk_sz, &k, &bytes_cnt);
@@ -1466,8 +1470,8 @@ static void test_send_many(struct sockmap_options *opt, int cgrp)
 
 static void test_send_large(struct sockmap_options *opt, int cgrp)
 {
-	opt->iov_length = 256;
-	opt->iov_count = 1024;
+	opt->iov_length = 8192;
+	opt->iov_count = 32;
 	opt->rate = 2;
 	test_exec(cgrp, opt);
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf 3/8] selftests/bpf: Fix total_bytes in msg_loop_rx in test_sockmap
  2024-10-20 11:03 [PATCH bpf 0/8] Fixes to bpf_msg_push/pop_data and test_sockmap zijianzhang
  2024-10-20 11:03 ` [PATCH bpf 1/8] selftests/bpf: Add txmsg_pass to pull/push/pop in test_sockmap zijianzhang
  2024-10-20 11:03 ` [PATCH bpf 2/8] selftests/bpf: Fix SENDPAGE data logic " zijianzhang
@ 2024-10-20 11:03 ` zijianzhang
  2024-10-24  4:48   ` John Fastabend
  2024-10-20 11:03 ` [PATCH bpf 4/8] selftests/bpf: Add push/pop checking for msg_verify_data " zijianzhang
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: zijianzhang @ 2024-10-20 11:03 UTC (permalink / raw)
  To: bpf
  Cc: martin.lau, daniel, john.fastabend, ast, andrii, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet, kuba,
	pabeni, mykolal, shuah, jakub, liujian56, zijianzhang, cong.wang

From: Zijian Zhang <zijianzhang@bytedance.com>

total_bytes in msg_loop_rx should also take push into account, otherwise
total_bytes will be a smaller value, which makes the msg_loop_rx end early.

Besides, total_bytes has already taken pop, so we don't need to subtract
some bytes from iov_buf in sendmsg_test. The additional subtraction may
make total_bytes a negative number, and msg_loop_rx will just end without
checking anything.

Fixes: 18d4e900a450 ("bpf: Selftests, improve test_sockmap total bytes counter")
Fixes: d69672147faa ("selftests, bpf: Add one test for sockmap with strparser")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
---
 tools/testing/selftests/bpf/test_sockmap.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 1d59bed90d80..5f4558f1f004 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -606,8 +606,8 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
 		}
 		clock_gettime(CLOCK_MONOTONIC, &s->end);
 	} else {
+		float total_bytes, txmsg_pop_total, txmsg_push_total;
 		int slct, recvp = 0, recv, max_fd = fd;
-		float total_bytes, txmsg_pop_total;
 		int fd_flags = O_NONBLOCK;
 		struct timeval timeout;
 		unsigned char k = 0;
@@ -628,10 +628,14 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
 		total_bytes = (float)iov_length * (float)cnt;
 		if (!opt->sendpage)
 			total_bytes *= (float)iov_count;
-		if (txmsg_apply)
+		if (txmsg_apply) {
+			txmsg_push_total = txmsg_end_push * (total_bytes / txmsg_apply);
 			txmsg_pop_total = txmsg_pop * (total_bytes / txmsg_apply);
-		else
+		} else {
+			txmsg_push_total = txmsg_end_push * cnt;
 			txmsg_pop_total = txmsg_pop * cnt;
+		}
+		total_bytes += txmsg_push_total;
 		total_bytes -= txmsg_pop_total;
 		err = clock_gettime(CLOCK_MONOTONIC, &s->start);
 		if (err < 0)
@@ -800,8 +804,6 @@ static int sendmsg_test(struct sockmap_options *opt)
 
 	rxpid = fork();
 	if (rxpid == 0) {
-		if (txmsg_pop || txmsg_start_pop)
-			iov_buf -= (txmsg_pop - txmsg_start_pop + 1);
 		if (opt->drop_expected || txmsg_ktls_skb_drop)
 			_exit(0);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf 4/8] selftests/bpf: Add push/pop checking for msg_verify_data in test_sockmap
  2024-10-20 11:03 [PATCH bpf 0/8] Fixes to bpf_msg_push/pop_data and test_sockmap zijianzhang
                   ` (2 preceding siblings ...)
  2024-10-20 11:03 ` [PATCH bpf 3/8] selftests/bpf: Fix total_bytes in msg_loop_rx " zijianzhang
@ 2024-10-20 11:03 ` zijianzhang
  2024-10-24  5:20   ` John Fastabend
  2024-10-20 11:03 ` [PATCH bpf 5/8] selftests/bpf: Add more tests for test_txmsg_push_pop " zijianzhang
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: zijianzhang @ 2024-10-20 11:03 UTC (permalink / raw)
  To: bpf
  Cc: martin.lau, daniel, john.fastabend, ast, andrii, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet, kuba,
	pabeni, mykolal, shuah, jakub, liujian56, zijianzhang, cong.wang

From: Zijian Zhang <zijianzhang@bytedance.com>

Add push/pop checking for msg_verify_data in test_sockmap. In pop/push
and cork tests, the logic will be different,
1. It makes the layout of the received data difficult
2. It makes it hard to calculate the total_bytes in the recvmsg
Temporarily skip the data integrity test for these cases now, added a TODO

Fixes: ee9b352ce465 ("selftests/bpf: Fix msg_verify_data in test_sockmap")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
---
 tools/testing/selftests/bpf/test_sockmap.c | 106 ++++++++++++++++++++-
 1 file changed, 101 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 5f4558f1f004..61a747afcd05 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -88,6 +88,10 @@ int ktls;
 int peek_flag;
 int skb_use_parser;
 int txmsg_omit_skb_parser;
+int verify_push_start;
+int verify_push_len;
+int verify_pop_start;
+int verify_pop_len;
 
 static const struct option long_options[] = {
 	{"help",	no_argument,		NULL, 'h' },
@@ -514,12 +518,41 @@ static int msg_alloc_iov(struct msghdr *msg,
 	return -ENOMEM;
 }
 
-/* TODO: Add verification logic for push, pull and pop data */
+/* In push or pop test, we need to do some calculations for msg_verify_data */
+static void msg_verify_date_prep(void)
+{
+	int push_range_end = txmsg_start_push + txmsg_end_push - 1;
+	int pop_range_end = txmsg_start_pop + txmsg_pop - 1;
+
+	if (txmsg_end_push && txmsg_pop &&
+	    txmsg_start_push <= pop_range_end && txmsg_start_pop <= push_range_end) {
+		/* The push range and the pop range overlap */
+		int overlap_len;
+
+		verify_push_start = txmsg_start_push;
+		verify_pop_start = txmsg_start_pop;
+		if (txmsg_start_push < txmsg_start_pop)
+			overlap_len = min(push_range_end - txmsg_start_pop + 1, txmsg_pop);
+		else
+			overlap_len = min(pop_range_end - txmsg_start_push + 1, txmsg_end_push);
+		verify_push_len = max(txmsg_end_push - overlap_len, 0);
+		verify_pop_len = max(txmsg_pop - overlap_len, 0);
+	} else {
+		/* Otherwise */
+		verify_push_start = txmsg_start_push;
+		verify_pop_start = txmsg_start_pop;
+		verify_push_len = txmsg_end_push;
+		verify_pop_len = txmsg_pop;
+	}
+}
+
 static int msg_verify_data(struct msghdr *msg, int size, int chunk_sz,
-				 unsigned char *k_p, int *bytes_cnt_p)
+			   unsigned char *k_p, int *bytes_cnt_p,
+			   int *check_cnt_p, int *push_p)
 {
-	int i, j, bytes_cnt = *bytes_cnt_p;
+	int bytes_cnt = *bytes_cnt_p, check_cnt = *check_cnt_p, push = *push_p;
 	unsigned char k = *k_p;
+	int i, j;
 
 	for (i = 0, j = 0; i < msg->msg_iovlen && size; i++, j = 0) {
 		unsigned char *d = msg->msg_iov[i].iov_base;
@@ -538,6 +571,37 @@ static int msg_verify_data(struct msghdr *msg, int size, int chunk_sz,
 		}
 
 		for (; j < msg->msg_iov[i].iov_len && size; j++) {
+			if (push > 0 &&
+			    check_cnt == verify_push_start + verify_push_len - push) {
+				int skipped;
+revisit_push:
+				skipped = push;
+				if (j + push >= msg->msg_iov[i].iov_len)
+					skipped = msg->msg_iov[i].iov_len - j;
+				push -= skipped;
+				size -= skipped;
+				j += skipped - 1;
+				check_cnt += skipped;
+				continue;
+			}
+
+			if (verify_pop_len > 0 && check_cnt == verify_pop_start) {
+				bytes_cnt += verify_pop_len;
+				check_cnt += verify_pop_len;
+				k += verify_pop_len;
+
+				if (bytes_cnt == chunk_sz) {
+					k = 0;
+					bytes_cnt = 0;
+					check_cnt = 0;
+					push = verify_push_len;
+				}
+
+				if (push > 0 &&
+				    check_cnt == verify_push_start + verify_push_len - push)
+					goto revisit_push;
+			}
+
 			if (d[j] != k++) {
 				fprintf(stderr,
 					"detected data corruption @iov[%i]:%i %02x != %02x, %02x ?= %02x\n",
@@ -545,15 +609,20 @@ static int msg_verify_data(struct msghdr *msg, int size, int chunk_sz,
 				return -EDATAINTEGRITY;
 			}
 			bytes_cnt++;
+			check_cnt++;
 			if (bytes_cnt == chunk_sz) {
 				k = 0;
 				bytes_cnt = 0;
+				check_cnt = 0;
+				push = verify_push_len;
 			}
 			size--;
 		}
 	}
 	*k_p = k;
 	*bytes_cnt_p = bytes_cnt;
+	*check_cnt_p = check_cnt;
+	*push_p = push;
 	return 0;
 }
 
@@ -612,6 +681,8 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
 		struct timeval timeout;
 		unsigned char k = 0;
 		int bytes_cnt = 0;
+		int check_cnt = 0;
+		int push = 0;
 		fd_set w;
 
 		fcntl(fd, fd_flags);
@@ -637,6 +708,10 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
 		}
 		total_bytes += txmsg_push_total;
 		total_bytes -= txmsg_pop_total;
+		if (data) {
+			msg_verify_date_prep();
+			push = verify_push_len;
+		}
 		err = clock_gettime(CLOCK_MONOTONIC, &s->start);
 		if (err < 0)
 			perror("recv start time");
@@ -712,7 +787,8 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
 						iov_length :
 						iov_length * iov_count;
 
-				errno = msg_verify_data(&msg, recv, chunk_sz, &k, &bytes_cnt);
+				errno = msg_verify_data(&msg, recv, chunk_sz, &k, &bytes_cnt,
+							&check_cnt, &push);
 				if (errno) {
 					perror("data verify msg failed");
 					goto out_errno;
@@ -722,7 +798,9 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
 								recvp,
 								chunk_sz,
 								&k,
-								&bytes_cnt);
+								&bytes_cnt,
+								&check_cnt,
+								&push);
 					if (errno) {
 						perror("data verify msg_peek failed");
 						goto out_errno;
@@ -1636,6 +1714,8 @@ static void test_txmsg_pull(int cgrp, struct sockmap_options *opt)
 
 static void test_txmsg_pop(int cgrp, struct sockmap_options *opt)
 {
+	bool data = opt->data_test;
+
 	/* Test basic pop */
 	txmsg_pass = 1;
 	txmsg_start_pop = 1;
@@ -1654,6 +1734,12 @@ static void test_txmsg_pop(int cgrp, struct sockmap_options *opt)
 	txmsg_pop = 2;
 	test_send_many(opt, cgrp);
 
+	/* TODO: Test for pop + cork should be different,
+	 * - It makes the layout of the received data difficult
+	 * - It makes it hard to calculate the total_bytes in the recvmsg
+	 * Temporarily skip the data integrity test for this case now.
+	 */
+	opt->data_test = false;
 	/* Test pop + cork */
 	txmsg_redir = 0;
 	txmsg_cork = 512;
@@ -1667,10 +1753,13 @@ static void test_txmsg_pop(int cgrp, struct sockmap_options *opt)
 	txmsg_start_pop = 1;
 	txmsg_pop = 2;
 	test_send_many(opt, cgrp);
+	opt->data_test = data;
 }
 
 static void test_txmsg_push(int cgrp, struct sockmap_options *opt)
 {
+	bool data = opt->data_test;
+
 	/* Test basic push */
 	txmsg_pass = 1;
 	txmsg_start_push = 1;
@@ -1689,12 +1778,19 @@ static void test_txmsg_push(int cgrp, struct sockmap_options *opt)
 	txmsg_end_push = 2;
 	test_send_many(opt, cgrp);
 
+	/* TODO: Test for push + cork should be different,
+	 * - It makes the layout of the received data difficult
+	 * - It makes it hard to calculate the total_bytes in the recvmsg
+	 * Temporarily skip the data integrity test for this case now.
+	 */
+	opt->data_test = false;
 	/* Test push + cork */
 	txmsg_redir = 0;
 	txmsg_cork = 512;
 	txmsg_start_push = 1;
 	txmsg_end_push = 2;
 	test_send_many(opt, cgrp);
+	opt->data_test = data;
 }
 
 static void test_txmsg_push_pop(int cgrp, struct sockmap_options *opt)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf 5/8] selftests/bpf: Add more tests for test_txmsg_push_pop in test_sockmap
  2024-10-20 11:03 [PATCH bpf 0/8] Fixes to bpf_msg_push/pop_data and test_sockmap zijianzhang
                   ` (3 preceding siblings ...)
  2024-10-20 11:03 ` [PATCH bpf 4/8] selftests/bpf: Add push/pop checking for msg_verify_data " zijianzhang
@ 2024-10-20 11:03 ` zijianzhang
  2024-10-25  5:00   ` John Fastabend
  2024-10-20 11:03 ` [PATCH bpf 6/8] bpf, sockmap: Several fixes to bpf_msg_push_data zijianzhang
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: zijianzhang @ 2024-10-20 11:03 UTC (permalink / raw)
  To: bpf
  Cc: martin.lau, daniel, john.fastabend, ast, andrii, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet, kuba,
	pabeni, mykolal, shuah, jakub, liujian56, zijianzhang, cong.wang

From: Zijian Zhang <zijianzhang@bytedance.com>

Add more tests for test_txmsg_push_pop in test_sockmap for stricter tests.

Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
---
 tools/testing/selftests/bpf/test_sockmap.c | 37 ++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 61a747afcd05..e5c7ecbe57e3 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -1795,12 +1795,49 @@ static void test_txmsg_push(int cgrp, struct sockmap_options *opt)
 
 static void test_txmsg_push_pop(int cgrp, struct sockmap_options *opt)
 {
+	/* Test push/pop range overlapping */
 	txmsg_pass = 1;
 	txmsg_start_push = 1;
 	txmsg_end_push = 10;
 	txmsg_start_pop = 5;
 	txmsg_pop = 4;
 	test_send_large(opt, cgrp);
+
+	txmsg_pass = 1;
+	txmsg_start_push = 1;
+	txmsg_end_push = 10;
+	txmsg_start_pop = 5;
+	txmsg_pop = 16;
+	test_send_large(opt, cgrp);
+
+	txmsg_pass = 1;
+	txmsg_start_push = 5;
+	txmsg_end_push = 4;
+	txmsg_start_pop = 1;
+	txmsg_pop = 10;
+	test_send_large(opt, cgrp);
+
+	txmsg_pass = 1;
+	txmsg_start_push = 5;
+	txmsg_end_push = 16;
+	txmsg_start_pop = 1;
+	txmsg_pop = 10;
+	test_send_large(opt, cgrp);
+
+	/* Test push/pop range non-overlapping */
+	txmsg_pass = 1;
+	txmsg_start_push = 1;
+	txmsg_end_push = 10;
+	txmsg_start_pop = 16;
+	txmsg_pop = 4;
+	test_send_large(opt, cgrp);
+
+	txmsg_pass = 1;
+	txmsg_start_push = 16;
+	txmsg_end_push = 10;
+	txmsg_start_pop = 5;
+	txmsg_pop = 4;
+	test_send_large(opt, cgrp);
 }
 
 static void test_txmsg_apply(int cgrp, struct sockmap_options *opt)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf 6/8] bpf, sockmap: Several fixes to bpf_msg_push_data
  2024-10-20 11:03 [PATCH bpf 0/8] Fixes to bpf_msg_push/pop_data and test_sockmap zijianzhang
                   ` (4 preceding siblings ...)
  2024-10-20 11:03 ` [PATCH bpf 5/8] selftests/bpf: Add more tests for test_txmsg_push_pop " zijianzhang
@ 2024-10-20 11:03 ` zijianzhang
  2024-10-20 11:03 ` [PATCH bpf 7/8] bpf, sockmap: Several fixes to bpf_msg_pop_data zijianzhang
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: zijianzhang @ 2024-10-20 11:03 UTC (permalink / raw)
  To: bpf
  Cc: martin.lau, daniel, john.fastabend, ast, andrii, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet, kuba,
	pabeni, mykolal, shuah, jakub, liujian56, zijianzhang, cong.wang

From: Zijian Zhang <zijianzhang@bytedance.com>

Several fixes to bpf_msg_push_data,
1. test_sockmap has tests where bpf_msg_push_data be invoked to push some
data at the end of a message, but -EINVAL is returned for this case.
2. Add logic for corner case where msg->sg.size is zero
3. Before "if (!copy)", the logic for some corner cases related to
msg->sg.end is missing, thus add the logic to handle it.

Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
---
 net/core/filter.c | 53 +++++++++++++++++++++++++++++------------------
 1 file changed, 33 insertions(+), 20 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index a88e6924c4c0..4fae427aa5ca 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2793,7 +2793,7 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 		sk_msg_iter_var_next(i);
 	} while (i != msg->sg.end);
 
-	if (start >= offset + l)
+	if (start > offset + l)
 		return -EINVAL;
 
 	space = MAX_MSG_FRAGS - sk_msg_elem_used(msg);
@@ -2818,6 +2818,8 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 
 		raw = page_address(page);
 
+		if (i == msg->sg.end)
+			sk_msg_iter_var_prev(i);
 		psge = sk_msg_elem(msg, i);
 		front = start - offset;
 		back = psge->length - front;
@@ -2834,7 +2836,13 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 		}
 
 		put_page(sg_page(psge));
-	} else if (start - offset) {
+		new = i;
+		goto place_new;
+	}
+
+	if (start - offset) {
+		if (i == msg->sg.end)
+			sk_msg_iter_var_prev(i);
 		psge = sk_msg_elem(msg, i);
 		rsge = sk_msg_elem_cpy(msg, i);
 
@@ -2845,39 +2853,44 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
 		sk_msg_iter_var_next(i);
 		sg_unmark_end(psge);
 		sg_unmark_end(&rsge);
-		sk_msg_iter_next(msg, end);
 	}
 
 	/* Slot(s) to place newly allocated data */
+	sk_msg_iter_next(msg, end);
 	new = i;
+	sk_msg_iter_var_next(i);
+
+	if (i == msg->sg.end) {
+		if (!rsge.length)
+			goto place_new;
+		sk_msg_iter_next(msg, end);
+		goto place_new;
+	}
 
 	/* Shift one or two slots as needed */
-	if (!copy) {
-		sge = sk_msg_elem_cpy(msg, i);
+	sge = sk_msg_elem_cpy(msg, new);
+	sg_unmark_end(&sge);
 
+	nsge = sk_msg_elem_cpy(msg, i);
+	if (rsge.length) {
 		sk_msg_iter_var_next(i);
-		sg_unmark_end(&sge);
+		nnsge = sk_msg_elem_cpy(msg, i);
 		sk_msg_iter_next(msg, end);
+	}
 
-		nsge = sk_msg_elem_cpy(msg, i);
+	while (i != msg->sg.end) {
+		msg->sg.data[i] = sge;
+		sge = nsge;
+		sk_msg_iter_var_next(i);
 		if (rsge.length) {
-			sk_msg_iter_var_next(i);
+			nsge = nnsge;
 			nnsge = sk_msg_elem_cpy(msg, i);
-		}
-
-		while (i != msg->sg.end) {
-			msg->sg.data[i] = sge;
-			sge = nsge;
-			sk_msg_iter_var_next(i);
-			if (rsge.length) {
-				nsge = nnsge;
-				nnsge = sk_msg_elem_cpy(msg, i);
-			} else {
-				nsge = sk_msg_elem_cpy(msg, i);
-			}
+		} else {
+			nsge = sk_msg_elem_cpy(msg, i);
 		}
 	}
 
+place_new:
 	/* Place newly allocated data buffer */
 	sk_mem_charge(msg->sk, len);
 	msg->sg.size += len;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf 7/8] bpf, sockmap: Several fixes to bpf_msg_pop_data
  2024-10-20 11:03 [PATCH bpf 0/8] Fixes to bpf_msg_push/pop_data and test_sockmap zijianzhang
                   ` (5 preceding siblings ...)
  2024-10-20 11:03 ` [PATCH bpf 6/8] bpf, sockmap: Several fixes to bpf_msg_push_data zijianzhang
@ 2024-10-20 11:03 ` zijianzhang
  2024-10-25  5:20   ` John Fastabend
  2024-10-20 11:03 ` [PATCH bpf 8/8] bpf, sockmap: Fix sk_msg_reset_curr zijianzhang
  2024-10-24  4:06 ` [PATCH bpf 0/8] Fixes to bpf_msg_push/pop_data and test_sockmap John Fastabend
  8 siblings, 1 reply; 22+ messages in thread
From: zijianzhang @ 2024-10-20 11:03 UTC (permalink / raw)
  To: bpf
  Cc: martin.lau, daniel, john.fastabend, ast, andrii, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet, kuba,
	pabeni, mykolal, shuah, jakub, liujian56, zijianzhang, cong.wang

From: Zijian Zhang <zijianzhang@bytedance.com>

Several fixes to bpf_msg_pop_data,
1. In sk_msg_shift_left, we should put_page
2. if (len == 0), return early is better
3. pop the entire sk_msg (last == msg->sg.size) should be supported
4. Fix for the value of variable a and sge->length
5. In sk_msg_shift_left, after shifting, i has already pointed to the next
element. Addtional sk_msg_iter_var_next may result in BUG.

Fixes: 7246d8ed4dcc ("bpf: helper to pop data from messages")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
---
 net/core/filter.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 4fae427aa5ca..8e1a8a8d8d55 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2919,8 +2919,10 @@ static const struct bpf_func_proto bpf_msg_push_data_proto = {
 
 static void sk_msg_shift_left(struct sk_msg *msg, int i)
 {
+	struct scatterlist *sge = sk_msg_elem(msg, i);
 	int prev;
 
+	put_page(sg_page(sge));
 	do {
 		prev = i;
 		sk_msg_iter_var_next(i);
@@ -2957,6 +2959,9 @@ BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
 	if (unlikely(flags))
 		return -EINVAL;
 
+	if (unlikely(len == 0))
+		return 0;
+
 	/* First find the starting scatterlist element */
 	i = msg->sg.start;
 	do {
@@ -2969,7 +2974,7 @@ BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
 	} while (i != msg->sg.end);
 
 	/* Bounds checks: start and pop must be inside message */
-	if (start >= offset + l || last >= msg->sg.size)
+	if (start >= offset + l || last > msg->sg.size)
 		return -EINVAL;
 
 	space = MAX_MSG_FRAGS - sk_msg_elem_used(msg);
@@ -2998,12 +3003,12 @@ BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
 	 */
 	if (start != offset) {
 		struct scatterlist *nsge, *sge = sk_msg_elem(msg, i);
-		int a = start;
+		int a = start - offset;
 		int b = sge->length - pop - a;
 
 		sk_msg_iter_var_next(i);
 
-		if (pop < sge->length - a) {
+		if (b > 0) {
 			if (space) {
 				sge->length = a;
 				sk_msg_shift_right(msg, i);
@@ -3022,7 +3027,7 @@ BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
 				if (unlikely(!page))
 					return -ENOMEM;
 
-				sge->length = a;
+				sge->length = a + b;
 				orig = sg_page(sge);
 				from = sg_virt(sge);
 				to = page_address(page);
@@ -3032,7 +3037,7 @@ BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
 				put_page(orig);
 			}
 			pop = 0;
-		} else if (pop >= sge->length - a) {
+		} else {
 			pop -= (sge->length - a);
 			sge->length = a;
 		}
@@ -3066,7 +3071,6 @@ BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
 			pop -= sge->length;
 			sk_msg_shift_left(msg, i);
 		}
-		sk_msg_iter_var_next(i);
 	}
 
 	sk_mem_uncharge(msg->sk, len - pop);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf 8/8] bpf, sockmap: Fix sk_msg_reset_curr
  2024-10-20 11:03 [PATCH bpf 0/8] Fixes to bpf_msg_push/pop_data and test_sockmap zijianzhang
                   ` (6 preceding siblings ...)
  2024-10-20 11:03 ` [PATCH bpf 7/8] bpf, sockmap: Several fixes to bpf_msg_pop_data zijianzhang
@ 2024-10-20 11:03 ` zijianzhang
  2024-10-26  5:05   ` John Fastabend
  2024-10-24  4:06 ` [PATCH bpf 0/8] Fixes to bpf_msg_push/pop_data and test_sockmap John Fastabend
  8 siblings, 1 reply; 22+ messages in thread
From: zijianzhang @ 2024-10-20 11:03 UTC (permalink / raw)
  To: bpf
  Cc: martin.lau, daniel, john.fastabend, ast, andrii, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet, kuba,
	pabeni, mykolal, shuah, jakub, liujian56, zijianzhang, cong.wang

From: Zijian Zhang <zijianzhang@bytedance.com>

Found in the test_txmsg_pull in test_sockmap,
```
txmsg_cork = 512;
opt->iov_length = 3;
opt->iov_count = 1;
opt->rate = 512;
```
The first sendmsg will send an sk_msg with size 3, and bpf_msg_pull_data
will be invoked the first time. sk_msg_reset_curr will reset the copybreak
from 3 to 0, then the second sendmsg will write into copybreak starting at
0 which overwrites the first sendmsg. The same problem happens in push and
pop test. Thus, fix sk_msg_reset_curr to restore the correct copybreak.

Fixes: bb9aefde5bba ("bpf: sockmap, updating the sg structure should also update curr")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
---
 net/core/filter.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 8e1a8a8d8d55..b725d3a2fdb8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2619,18 +2619,16 @@ BPF_CALL_2(bpf_msg_cork_bytes, struct sk_msg *, msg, u32, bytes)
 
 static void sk_msg_reset_curr(struct sk_msg *msg)
 {
-	u32 i = msg->sg.start;
-	u32 len = 0;
-
-	do {
-		len += sk_msg_elem(msg, i)->length;
-		sk_msg_iter_var_next(i);
-		if (len >= msg->sg.size)
-			break;
-	} while (i != msg->sg.end);
+	if (!msg->sg.size) {
+		msg->sg.curr = msg->sg.start;
+		msg->sg.copybreak = 0;
+	} else {
+		u32 i = msg->sg.end;
 
-	msg->sg.curr = i;
-	msg->sg.copybreak = 0;
+		sk_msg_iter_var_prev(i);
+		msg->sg.curr = i;
+		msg->sg.copybreak = msg->sg.data[i].length;
+	}
 }
 
 static const struct bpf_func_proto bpf_msg_cork_bytes_proto = {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* RE: [PATCH bpf 0/8] Fixes to bpf_msg_push/pop_data and test_sockmap
  2024-10-20 11:03 [PATCH bpf 0/8] Fixes to bpf_msg_push/pop_data and test_sockmap zijianzhang
                   ` (7 preceding siblings ...)
  2024-10-20 11:03 ` [PATCH bpf 8/8] bpf, sockmap: Fix sk_msg_reset_curr zijianzhang
@ 2024-10-24  4:06 ` John Fastabend
  2024-10-24 14:43   ` Daniel Borkmann
  8 siblings, 1 reply; 22+ messages in thread
From: John Fastabend @ 2024-10-24  4:06 UTC (permalink / raw)
  To: zijianzhang, bpf
  Cc: martin.lau, daniel, john.fastabend, ast, andrii, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet, kuba,
	pabeni, mykolal, shuah, jakub, liujian56, zijianzhang, cong.wang

zijianzhang@ wrote:
> From: Zijian Zhang <zijianzhang@bytedance.com>
> 
> Several fixes to test_sockmap and added push/pop logic for msg_verify_data
> Before the fixes, some of the tests in test_sockmap are problematic,
> resulting in pseudo-correct result.
> 
> 1. txmsg_pass is not set in some tests, as a result, no eBPF program is
> attached to the sockmap.
> 2. In SENDPAGE, a wrong iov_length in test_send_large may result in some
> test skippings and failures.
> 3. The calculation of total_bytes in msg_loop_rx is wrong, which may cause
> msg_loop_rx end early and skip some data tests.
> 
> Besides, for msg_verify_data, I added push/pop checking logic to function
> msg_verify_data and added more tests for different cases.

Thanks! Yep I think push/pop are not widely used anywhere unfortunately.
There are some interesting uses for push/pop to add/edit headers, but
I've not gotten there yet clearly.

> 
> After that, I found that there are some bugs in bpf_msg_push_data,
> bpf_msg_pop_data and sk_msg_reset_curr, and fix them. I guess the reason
> why they have not been exposed is that because of the above problems, they
> will not be triggered.

Good. I'll review these quickly tonight/tomorrow and run some testing.
We don't currently have any longer running tests with push/pop.

> 
> With the fixes, we can pass the sockmap test with data integrity test now.
> However, the fixes to test_sockmap expose more problems in sockhash test
> with SENDPAGE and ktls with SENDPAGE.
> 
> The problem I observed,

Thanks for digging into these.

[...]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH bpf 1/8] selftests/bpf: Add txmsg_pass to pull/push/pop in test_sockmap
  2024-10-20 11:03 ` [PATCH bpf 1/8] selftests/bpf: Add txmsg_pass to pull/push/pop in test_sockmap zijianzhang
@ 2024-10-24  4:06   ` John Fastabend
  0 siblings, 0 replies; 22+ messages in thread
From: John Fastabend @ 2024-10-24  4:06 UTC (permalink / raw)
  To: zijianzhang, bpf
  Cc: martin.lau, daniel, john.fastabend, ast, andrii, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet, kuba,
	pabeni, mykolal, shuah, jakub, liujian56, zijianzhang, cong.wang

zijianzhang@ wrote:
> From: Zijian Zhang <zijianzhang@bytedance.com>
> 
> Add txmsg_pass to test_txmsg_pull/push/pop. If txmsg_pass is missing,
> tx_prog will be NULL, and no program will be attached to the sockmap.
> As a result, pull/push/pop are never invoked.
> 
> Fixes: 328aa08a081b ("bpf: Selftests, break down test_sockmap into subtests")
> Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
> ---
>  tools/testing/selftests/bpf/test_sockmap.c | 7 +++++++
>  1 file changed, 7 insertions(+)

Reviewed-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH bpf 2/8] selftests/bpf: Fix SENDPAGE data logic in test_sockmap
  2024-10-20 11:03 ` [PATCH bpf 2/8] selftests/bpf: Fix SENDPAGE data logic " zijianzhang
@ 2024-10-24  4:47   ` John Fastabend
  0 siblings, 0 replies; 22+ messages in thread
From: John Fastabend @ 2024-10-24  4:47 UTC (permalink / raw)
  To: zijianzhang, bpf
  Cc: martin.lau, daniel, john.fastabend, ast, andrii, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet, kuba,
	pabeni, mykolal, shuah, jakub, liujian56, zijianzhang, cong.wang

zijianzhang@ wrote:
> From: Zijian Zhang <zijianzhang@bytedance.com>
> 
> In the SENDPAGE test, "opt->iov_length * cnt" size of data will be sent
> in cnt times of sendfile.
> 1. In push/pop tests, they will be invoked cnt times, for the simplicity of
> msg_verify_data, change chunk_sz to iov_length
> 2. Change iov_length in test_send_large from 1024 to 8192. We have pop test
> where txmsg_start_pop is 4096. 4096 > 1024, an error will be returned.
> 
> Fixes: 328aa08a081b ("bpf: Selftests, break down test_sockmap into subtests")
> Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
> ---

Reviewed-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH bpf 3/8] selftests/bpf: Fix total_bytes in msg_loop_rx in test_sockmap
  2024-10-20 11:03 ` [PATCH bpf 3/8] selftests/bpf: Fix total_bytes in msg_loop_rx " zijianzhang
@ 2024-10-24  4:48   ` John Fastabend
  0 siblings, 0 replies; 22+ messages in thread
From: John Fastabend @ 2024-10-24  4:48 UTC (permalink / raw)
  To: zijianzhang, bpf
  Cc: martin.lau, daniel, john.fastabend, ast, andrii, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet, kuba,
	pabeni, mykolal, shuah, jakub, liujian56, zijianzhang, cong.wang

zijianzhang@ wrote:
> From: Zijian Zhang <zijianzhang@bytedance.com>
> 
> total_bytes in msg_loop_rx should also take push into account, otherwise
> total_bytes will be a smaller value, which makes the msg_loop_rx end early.
> 
> Besides, total_bytes has already taken pop, so we don't need to subtract
> some bytes from iov_buf in sendmsg_test. The additional subtraction may
> make total_bytes a negative number, and msg_loop_rx will just end without
> checking anything.
> 
> Fixes: 18d4e900a450 ("bpf: Selftests, improve test_sockmap total bytes counter")
> Fixes: d69672147faa ("selftests, bpf: Add one test for sockmap with strparser")
> Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
> ---

Reviewed-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH bpf 4/8] selftests/bpf: Add push/pop checking for msg_verify_data in test_sockmap
  2024-10-20 11:03 ` [PATCH bpf 4/8] selftests/bpf: Add push/pop checking for msg_verify_data " zijianzhang
@ 2024-10-24  5:20   ` John Fastabend
  0 siblings, 0 replies; 22+ messages in thread
From: John Fastabend @ 2024-10-24  5:20 UTC (permalink / raw)
  To: zijianzhang, bpf
  Cc: martin.lau, daniel, john.fastabend, ast, andrii, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet, kuba,
	pabeni, mykolal, shuah, jakub, liujian56, zijianzhang, cong.wang

zijianzhang@ wrote:
> From: Zijian Zhang <zijianzhang@bytedance.com>
> 
> Add push/pop checking for msg_verify_data in test_sockmap. In pop/push
> and cork tests, the logic will be different,
> 1. It makes the layout of the received data difficult
> 2. It makes it hard to calculate the total_bytes in the recvmsg
> Temporarily skip the data integrity test for these cases now, added a TODO
> 
> Fixes: ee9b352ce465 ("selftests/bpf: Fix msg_verify_data in test_sockmap")
> Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
> ---

Reviewed-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf 0/8] Fixes to bpf_msg_push/pop_data and test_sockmap
  2024-10-24  4:06 ` [PATCH bpf 0/8] Fixes to bpf_msg_push/pop_data and test_sockmap John Fastabend
@ 2024-10-24 14:43   ` Daniel Borkmann
  2024-10-24 17:56     ` Zijian Zhang
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Borkmann @ 2024-10-24 14:43 UTC (permalink / raw)
  To: John Fastabend, zijianzhang, bpf
  Cc: martin.lau, ast, andrii, eddyz87, song, yonghong.song, kpsingh,
	sdf, haoluo, jolsa, davem, edumazet, kuba, pabeni, mykolal, shuah,
	jakub, liujian56, cong.wang

Hi Zijian,

On 10/24/24 6:06 AM, John Fastabend wrote:
> zijianzhang@ wrote:
>> From: Zijian Zhang <zijianzhang@bytedance.com>
>>
>> Several fixes to test_sockmap and added push/pop logic for msg_verify_data
>> Before the fixes, some of the tests in test_sockmap are problematic,
>> resulting in pseudo-correct result.
>>
>> 1. txmsg_pass is not set in some tests, as a result, no eBPF program is
>> attached to the sockmap.
>> 2. In SENDPAGE, a wrong iov_length in test_send_large may result in some
>> test skippings and failures.
>> 3. The calculation of total_bytes in msg_loop_rx is wrong, which may cause
>> msg_loop_rx end early and skip some data tests.
>>
>> Besides, for msg_verify_data, I added push/pop checking logic to function
>> msg_verify_data and added more tests for different cases.
> 
> Thanks! Yep I think push/pop are not widely used anywhere unfortunately.
> There are some interesting uses for push/pop to add/edit headers, but
> I've not gotten there yet clearly.
> 
>> After that, I found that there are some bugs in bpf_msg_push_data,
>> bpf_msg_pop_data and sk_msg_reset_curr, and fix them. I guess the reason
>> why they have not been exposed is that because of the above problems, they
>> will not be triggered.
> 
> Good. I'll review these quickly tonight/tomorrow and run some testing.
> We don't currently have any longer running tests with push/pop.

Looks like the series needs a rebase to latest bpf tree.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf 0/8] Fixes to bpf_msg_push/pop_data and test_sockmap
  2024-10-24 14:43   ` Daniel Borkmann
@ 2024-10-24 17:56     ` Zijian Zhang
  2024-10-24 18:12       ` Daniel Borkmann
  0 siblings, 1 reply; 22+ messages in thread
From: Zijian Zhang @ 2024-10-24 17:56 UTC (permalink / raw)
  To: Daniel Borkmann, John Fastabend, bpf
  Cc: martin.lau, ast, andrii, eddyz87, song, yonghong.song, kpsingh,
	sdf, haoluo, jolsa, davem, edumazet, kuba, pabeni, mykolal, shuah,
	jakub, liujian56, cong.wang

On 10/24/24 7:43 AM, Daniel Borkmann wrote:
> Hi Zijian,
> 
> On 10/24/24 6:06 AM, John Fastabend wrote:
>> zijianzhang@ wrote:
>>> From: Zijian Zhang <zijianzhang@bytedance.com>
>>>
>>> Several fixes to test_sockmap and added push/pop logic for 
>>> msg_verify_data
>>> Before the fixes, some of the tests in test_sockmap are problematic,
>>> resulting in pseudo-correct result.
>>>
>>> 1. txmsg_pass is not set in some tests, as a result, no eBPF program is
>>> attached to the sockmap.
>>> 2. In SENDPAGE, a wrong iov_length in test_send_large may result in some
>>> test skippings and failures.
>>> 3. The calculation of total_bytes in msg_loop_rx is wrong, which may 
>>> cause
>>> msg_loop_rx end early and skip some data tests.
>>>
>>> Besides, for msg_verify_data, I added push/pop checking logic to 
>>> function
>>> msg_verify_data and added more tests for different cases.
>>
>> Thanks! Yep I think push/pop are not widely used anywhere unfortunately.
>> There are some interesting uses for push/pop to add/edit headers, but
>> I've not gotten there yet clearly.
>>

Thanks for the reviewing :)

>>> After that, I found that there are some bugs in bpf_msg_push_data,
>>> bpf_msg_pop_data and sk_msg_reset_curr, and fix them. I guess the reason
>>> why they have not been exposed is that because of the above problems, 
>>> they
>>> will not be triggered.
>>
>> Good. I'll review these quickly tonight/tomorrow and run some testing.
>> We don't currently have any longer running tests with push/pop.
> 
> Looks like the series needs a rebase to latest bpf tree.
> 
> Thanks,
> Daniel

This series depends on my previous fixes to test_sockmap("Two fixes for
test_sockmap"), and they were merged to bpf/bpf-next.git (net branch) a
week ago. Shall I wait for merging of them to the latest bpf, and then
rebase?

Thanks,
Zijian

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf 0/8] Fixes to bpf_msg_push/pop_data and test_sockmap
  2024-10-24 17:56     ` Zijian Zhang
@ 2024-10-24 18:12       ` Daniel Borkmann
  2024-10-24 20:48         ` Zijian Zhang
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Borkmann @ 2024-10-24 18:12 UTC (permalink / raw)
  To: Zijian Zhang, John Fastabend, bpf
  Cc: martin.lau, ast, andrii, eddyz87, song, yonghong.song, kpsingh,
	sdf, haoluo, jolsa, davem, edumazet, kuba, pabeni, mykolal, shuah,
	jakub, liujian56, cong.wang

On 10/24/24 7:56 PM, Zijian Zhang wrote:
> On 10/24/24 7:43 AM, Daniel Borkmann wrote:
>> On 10/24/24 6:06 AM, John Fastabend wrote:
>>> zijianzhang@ wrote:
>>>> From: Zijian Zhang <zijianzhang@bytedance.com>
>>>>
>>>> Several fixes to test_sockmap and added push/pop logic for msg_verify_data
>>>> Before the fixes, some of the tests in test_sockmap are problematic,
>>>> resulting in pseudo-correct result.
>>>>
>>>> 1. txmsg_pass is not set in some tests, as a result, no eBPF program is
>>>> attached to the sockmap.
>>>> 2. In SENDPAGE, a wrong iov_length in test_send_large may result in some
>>>> test skippings and failures.
>>>> 3. The calculation of total_bytes in msg_loop_rx is wrong, which may cause
>>>> msg_loop_rx end early and skip some data tests.
>>>>
>>>> Besides, for msg_verify_data, I added push/pop checking logic to function
>>>> msg_verify_data and added more tests for different cases.
>>>
>>> Thanks! Yep I think push/pop are not widely used anywhere unfortunately.
>>> There are some interesting uses for push/pop to add/edit headers, but
>>> I've not gotten there yet clearly.
> 
> Thanks for the reviewing :)
> 
>>>> After that, I found that there are some bugs in bpf_msg_push_data,
>>>> bpf_msg_pop_data and sk_msg_reset_curr, and fix them. I guess the reason
>>>> why they have not been exposed is that because of the above problems, they
>>>> will not be triggered.
>>>
>>> Good. I'll review these quickly tonight/tomorrow and run some testing.
>>> We don't currently have any longer running tests with push/pop.
>>
>> Looks like the series needs a rebase to latest bpf tree.
>>
>> Thanks,
>> Daniel
> 
> This series depends on my previous fixes to test_sockmap("Two fixes for
> test_sockmap"), and they were merged to bpf/bpf-next.git (net branch) a
> week ago. Shall I wait for merging of them to the latest bpf, and then
> rebase?

Then this series also needs to be based against bpf-next, net branch (along
with PATCH bpf-next in $subj) so that the CI can pick it up.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf 0/8] Fixes to bpf_msg_push/pop_data and test_sockmap
  2024-10-24 18:12       ` Daniel Borkmann
@ 2024-10-24 20:48         ` Zijian Zhang
  0 siblings, 0 replies; 22+ messages in thread
From: Zijian Zhang @ 2024-10-24 20:48 UTC (permalink / raw)
  To: Daniel Borkmann, John Fastabend, bpf
  Cc: martin.lau, ast, andrii, eddyz87, song, yonghong.song, kpsingh,
	sdf, haoluo, jolsa, davem, edumazet, kuba, pabeni, mykolal, shuah,
	jakub, liujian56, cong.wang

On 10/24/24 11:12 AM, Daniel Borkmann wrote:
>> This series depends on my previous fixes to test_sockmap("Two fixes for
>> test_sockmap"), and they were merged to bpf/bpf-next.git (net branch) a
>> week ago. Shall I wait for merging of them to the latest bpf, and then
>> rebase?
> 
> Then this series also needs to be based against bpf-next, net branch (along
> with PATCH bpf-next in $subj) so that the CI can pick it up.
> 
> Thanks,
> Daniel

I tried bpf-next, and found I still could not pass the apply test.
Then, I sent another series with bpf-next/next in $subj, and it also
failed, but the CI is running somehow. Not sure what is the right way to
target bpf-next, net branch?

Apologize for the flood of emails. Could you help me change the state of
the wrong patch series, so that it won't bother others?

Thanks,
Zijian

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH bpf 5/8] selftests/bpf: Add more tests for test_txmsg_push_pop in test_sockmap
  2024-10-20 11:03 ` [PATCH bpf 5/8] selftests/bpf: Add more tests for test_txmsg_push_pop " zijianzhang
@ 2024-10-25  5:00   ` John Fastabend
  0 siblings, 0 replies; 22+ messages in thread
From: John Fastabend @ 2024-10-25  5:00 UTC (permalink / raw)
  To: zijianzhang, bpf
  Cc: martin.lau, daniel, john.fastabend, ast, andrii, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet, kuba,
	pabeni, mykolal, shuah, jakub, liujian56, zijianzhang, cong.wang

zijianzhang@ wrote:
> From: Zijian Zhang <zijianzhang@bytedance.com>
> 
> Add more tests for test_txmsg_push_pop in test_sockmap for stricter tests.
> 
> Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
> ---

Reviewed-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH bpf 7/8] bpf, sockmap: Several fixes to bpf_msg_pop_data
  2024-10-20 11:03 ` [PATCH bpf 7/8] bpf, sockmap: Several fixes to bpf_msg_pop_data zijianzhang
@ 2024-10-25  5:20   ` John Fastabend
  0 siblings, 0 replies; 22+ messages in thread
From: John Fastabend @ 2024-10-25  5:20 UTC (permalink / raw)
  To: zijianzhang, bpf
  Cc: martin.lau, daniel, john.fastabend, ast, andrii, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet, kuba,
	pabeni, mykolal, shuah, jakub, liujian56, zijianzhang, cong.wang

zijianzhang@ wrote:
> From: Zijian Zhang <zijianzhang@bytedance.com>
> 
> Several fixes to bpf_msg_pop_data,
> 1. In sk_msg_shift_left, we should put_page
> 2. if (len == 0), return early is better
> 3. pop the entire sk_msg (last == msg->sg.size) should be supported
> 4. Fix for the value of variable a and sge->length
> 5. In sk_msg_shift_left, after shifting, i has already pointed to the next
> element. Addtional sk_msg_iter_var_next may result in BUG.
> 
> Fixes: 7246d8ed4dcc ("bpf: helper to pop data from messages")
> Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
> ---

Thanks!

Reviewed-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH bpf 8/8] bpf, sockmap: Fix sk_msg_reset_curr
  2024-10-20 11:03 ` [PATCH bpf 8/8] bpf, sockmap: Fix sk_msg_reset_curr zijianzhang
@ 2024-10-26  5:05   ` John Fastabend
  2024-10-26  6:17     ` Zijian Zhang
  0 siblings, 1 reply; 22+ messages in thread
From: John Fastabend @ 2024-10-26  5:05 UTC (permalink / raw)
  To: zijianzhang, bpf
  Cc: martin.lau, daniel, john.fastabend, ast, andrii, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet, kuba,
	pabeni, mykolal, shuah, jakub, liujian56, zijianzhang, cong.wang

zijianzhang@ wrote:
> From: Zijian Zhang <zijianzhang@bytedance.com>
> 
> Found in the test_txmsg_pull in test_sockmap,
> ```
> txmsg_cork = 512;
> opt->iov_length = 3;
> opt->iov_count = 1;
> opt->rate = 512;
> ```
> The first sendmsg will send an sk_msg with size 3, and bpf_msg_pull_data
> will be invoked the first time. sk_msg_reset_curr will reset the copybreak
> from 3 to 0, then the second sendmsg will write into copybreak starting at
> 0 which overwrites the first sendmsg. The same problem happens in push and
> pop test. Thus, fix sk_msg_reset_curr to restore the correct copybreak.
> 
> Fixes: bb9aefde5bba ("bpf: sockmap, updating the sg structure should also update curr")
> Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>

Hi Zijian, question on below.

> ---
>  net/core/filter.c | 20 +++++++++-----------
>  1 file changed, 9 insertions(+), 11 deletions(-)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 8e1a8a8d8d55..b725d3a2fdb8 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2619,18 +2619,16 @@ BPF_CALL_2(bpf_msg_cork_bytes, struct sk_msg *, msg, u32, bytes)
>  

I find push_data a bit easier to think through so allow me to walk
through a push example.

If we setup so that curr=0 and copybreak=3 then call

 push_data(skmsg, 2, 2);

When we get to the sk_msg_reset_curr we should have a layout,

  msg->sg.data[0] = length(2) equal to original [0,2]
  msg->sg.data[1] = length(2)
  msg->sg.data[2] = legnth(1) equal to original [3] 

The current before the reset curr will be,

 curr = 1
 copybreak = 3

>  static void sk_msg_reset_curr(struct sk_msg *msg)
>  {
> -	u32 i = msg->sg.start;
> -	u32 len = 0;
> -

with above context i = 0

> -	do {
> -		len += sk_msg_elem(msg, i)->length;
> -		sk_msg_iter_var_next(i);
> -		if (len >= msg->sg.size)
> -			break;
> -	} while (i != msg->sg.end);

When we exit loop,

  i = 3
  len = 5
  
  msg->sg.curr = 3
  msg->sg.copybreak = 0

So we zero the copy break and set curr = 3. The next send
should happen over sg.curr=3? What did I miss?

> +	if (!msg->sg.size) {
> +		msg->sg.curr = msg->sg.start;
> +		msg->sg.copybreak = 0;
> +	} else {
> +		u32 i = msg->sg.end;
>  
> -	msg->sg.curr = i;
> -	msg->sg.copybreak = 0;
> +		sk_msg_iter_var_prev(i);

With this curr will always point to the end-1 but I'm not sure this can
handle the case where we have done sk_msg_alloc() so we have start/end
setup. And then on a copy fault for example we might have curr pointing
somewhere in the middle of that. I think I will need to construct the
example, but I believe this is originally why the 'i' is discovered
by sg walk vs simpler end.

> +		msg->sg.curr = i;
> +		msg->sg.copybreak = msg->sg.data[i].length;

This does seem more accurate then simply zero'ing out the copybreak
which is a good thing.

> +	}
>  }
>  
>  static const struct bpf_func_proto bpf_msg_cork_bytes_proto = {
> -- 
> 2.20.1
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH bpf 8/8] bpf, sockmap: Fix sk_msg_reset_curr
  2024-10-26  5:05   ` John Fastabend
@ 2024-10-26  6:17     ` Zijian Zhang
  0 siblings, 0 replies; 22+ messages in thread
From: Zijian Zhang @ 2024-10-26  6:17 UTC (permalink / raw)
  To: John Fastabend, bpf
  Cc: martin.lau, daniel, ast, andrii, eddyz87, song, yonghong.song,
	kpsingh, sdf, haoluo, jolsa, davem, edumazet, kuba, pabeni,
	mykolal, shuah, jakub, liujian56, cong.wang

On 10/25/24 10:05 PM, John Fastabend wrote:
> zijianzhang@ wrote:
>> From: Zijian Zhang <zijianzhang@bytedance.com>
>>
>> Found in the test_txmsg_pull in test_sockmap,
>> ```
>> txmsg_cork = 512;
>> opt->iov_length = 3;
>> opt->iov_count = 1;
>> opt->rate = 512;
>> ```
>> The first sendmsg will send an sk_msg with size 3, and bpf_msg_pull_data
>> will be invoked the first time. sk_msg_reset_curr will reset the copybreak
>> from 3 to 0, then the second sendmsg will write into copybreak starting at
>> 0 which overwrites the first sendmsg. The same problem happens in push and
>> pop test. Thus, fix sk_msg_reset_curr to restore the correct copybreak.
>>
>> Fixes: bb9aefde5bba ("bpf: sockmap, updating the sg structure should also update curr")
>> Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
> 
> Hi Zijian, question on below.
>  
> 
> I find push_data a bit easier to think through so allow me to walk
> through a push example.
> 
> If we setup so that curr=0 and copybreak=3 then call
> 
>   push_data(skmsg, 2, 2);
> 
> When we get to the sk_msg_reset_curr we should have a layout,
> 
>    msg->sg.data[0] = length(2) equal to original [0,2]
>    msg->sg.data[1] = length(2)
>    msg->sg.data[2] = legnth(1) equal to original [3]
> 
> The current before the reset curr will be,
> 
>   curr = 1
>   copybreak = 3
> 
>>   static void sk_msg_reset_curr(struct sk_msg *msg)
>>   {
>> -	u32 i = msg->sg.start;
>> -	u32 len = 0;
>> -
> 
> with above context i = 0
> 
>> -	do {
>> -		len += sk_msg_elem(msg, i)->length;
>> -		sk_msg_iter_var_next(i);
>> -		if (len >= msg->sg.size)
>> -			break;
>> -	} while (i != msg->sg.end);
> 
> When we exit loop,
> 
>    i = 3
>    len = 5
>    
>    msg->sg.curr = 3
>    msg->sg.copybreak = 0
> 
> So we zero the copy break and set curr = 3. The next send
> should happen over sg.curr=3? What did I miss?
> 

That's true, for common cases without corking, it should work well.
For this fix, we have to consider cork at the same time.

I still use pull here for simplicity, for example,
```
// user program
txmsg_cork = 8;
opt->iov_length = 3;
opt->iov_count = 1;
opt->rate = 3;

// eBPF program
pull_data(sk_msg, 0, 1);
```
In the first sendmsg,
pull_data will be invoked and reset msg->sg.copybreak from 3 to 0,
msg->sg.curr to 0, which is the same. However, because of the corking,
the data will not be sent out, and it is stored in the psock->cork.

In the second sendmsg,
since we are in the stage of corking, psock->cork will be reused in func
sk_msg_alloc. msg->sg.copybreak is 0 now, the second msg will overwrite
the first msg. As a result, we could not pass the data integrity test.

push + cork and pop + cork may have the same problem, with the same
reason that corking may have different sendmsgs reuse the same skmsg.

>> +	if (!msg->sg.size) {
>> +		msg->sg.curr = msg->sg.start;
>> +		msg->sg.copybreak = 0;
>> +	} else {
>> +		u32 i = msg->sg.end;
>>   
>> -	msg->sg.curr = i;
>> -	msg->sg.copybreak = 0;
>> +		sk_msg_iter_var_prev(i);
> 
> With this curr will always point to the end-1 but I'm not sure this can
> handle the case where we have done sk_msg_alloc() so we have start/end
> setup. And then on a copy fault for example we might have curr pointing
> somewhere in the middle of that. I think I will need to construct the
> example, but I believe this is originally why the 'i' is discovered
> by sg walk vs simpler end.
> 

Good point! I am not sure if corking + pull/push/pop are common cases.
I guess they are not? If we take these into account, then instead of
resetting, we may need to carefully maintain the curr and copybreak when
we shift the sgs?

>> +		msg->sg.curr = i;
>> +		msg->sg.copybreak = msg->sg.data[i].length;
> 
> This does seem more accurate then simply zero'ing out the copybreak
> which is a good thing.
> 
>> +	}
>>   }
>>   
>>   static const struct bpf_func_proto bpf_msg_cork_bytes_proto = {
>> -- 
>> 2.20.1
>>
> 
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2024-10-26  6:17 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-20 11:03 [PATCH bpf 0/8] Fixes to bpf_msg_push/pop_data and test_sockmap zijianzhang
2024-10-20 11:03 ` [PATCH bpf 1/8] selftests/bpf: Add txmsg_pass to pull/push/pop in test_sockmap zijianzhang
2024-10-24  4:06   ` John Fastabend
2024-10-20 11:03 ` [PATCH bpf 2/8] selftests/bpf: Fix SENDPAGE data logic " zijianzhang
2024-10-24  4:47   ` John Fastabend
2024-10-20 11:03 ` [PATCH bpf 3/8] selftests/bpf: Fix total_bytes in msg_loop_rx " zijianzhang
2024-10-24  4:48   ` John Fastabend
2024-10-20 11:03 ` [PATCH bpf 4/8] selftests/bpf: Add push/pop checking for msg_verify_data " zijianzhang
2024-10-24  5:20   ` John Fastabend
2024-10-20 11:03 ` [PATCH bpf 5/8] selftests/bpf: Add more tests for test_txmsg_push_pop " zijianzhang
2024-10-25  5:00   ` John Fastabend
2024-10-20 11:03 ` [PATCH bpf 6/8] bpf, sockmap: Several fixes to bpf_msg_push_data zijianzhang
2024-10-20 11:03 ` [PATCH bpf 7/8] bpf, sockmap: Several fixes to bpf_msg_pop_data zijianzhang
2024-10-25  5:20   ` John Fastabend
2024-10-20 11:03 ` [PATCH bpf 8/8] bpf, sockmap: Fix sk_msg_reset_curr zijianzhang
2024-10-26  5:05   ` John Fastabend
2024-10-26  6:17     ` Zijian Zhang
2024-10-24  4:06 ` [PATCH bpf 0/8] Fixes to bpf_msg_push/pop_data and test_sockmap John Fastabend
2024-10-24 14:43   ` Daniel Borkmann
2024-10-24 17:56     ` Zijian Zhang
2024-10-24 18:12       ` Daniel Borkmann
2024-10-24 20:48         ` Zijian Zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox