[PATCH bpf] bpf/selftests: fix test_tcpnotify

bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH bpf] bpf/selftests: fix test_tcpnotify_user
@ 2025-08-15 12:12 Matt Bobrowski
  2025-08-15 15:44 ` Stanislav Fomichev
  2025-08-15 20:10 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 5+ messages in thread
From: Matt Bobrowski @ 2025-08-15 12:12 UTC (permalink / raw)
  To: bpf; +Cc: ast, daniel, andrii, eddyz87, mykolal

Based on a bisect, it appears that commit 7ee988770326 ("timers:
Implement the hierarchical pull model") has somehow inadvertently
broken BPF selftest test_tcpnotify_user. The error that is being
generated by this test is as follows:

	FAILED: Wrong stats Expected 10 calls, got 8

It looks like the change allows timer functions to be run on CPUs
different from the one they are armed on. The test had pinned itself
to CPU 0, and in the past the retransmit attempts also occurred on CPU
0. The test had set the max_entries attribute for
BPF_MAP_TYPE_PERF_EVENT_ARRAY to 2 and was calling
bpf_perf_event_output() with BPF_F_CURRENT_CPU, so the entry was
likely to be in range. With the change to allow timers to run on other
CPUs, the current CPU tasked with performing the retransmit might be
bumped and in turn fall out of range, as the event will be filtered
out via __bpf_perf_event_output() using:

    if (unlikely(index >= array->map.max_entries))
            return -E2BIG;

A possible change would be to explicitly set the max_entries attribute
for perf_event_map in test_tcpnotify_kern.c to a value that's at least
as large as the number of CPUs. As it turns out however, if the field
is left unset, then the BPF selftest library will determine the number
of CPUs available on the underlying system and update the max_entries
attribute accordingly.

A further problem with the test is that it has a thread that continues
running up until the program exits. The main thread cleans up some
LIBBPF data structures, while the other thread continues to use them,
which inevitably will trigger a SIGSEGV. This can be dealt with by
telling the thread to run for as long as necessary and doing a
pthread_join on it before exiting the program.

Finally, I don't think binding the process to CPU 0 is meaningful for
this test any more, so get rid of that.

Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
---
 .../selftests/bpf/progs/test_tcpnotify_kern.c |  1 -
 .../selftests/bpf/test_tcpnotify_user.c       | 20 +++++++++----------
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/test_tcpnotify_kern.c b/tools/testing/selftests/bpf/progs/test_tcpnotify_kern.c
index 540181c115a8..ef00d38b0a8d 100644
--- a/tools/testing/selftests/bpf/progs/test_tcpnotify_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tcpnotify_kern.c
@@ -23,7 +23,6 @@ struct {

 struct {
 	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
-	__uint(max_entries, 2);
 	__type(key, int);
 	__type(value, __u32);
 } perf_event_map SEC(".maps");
diff --git a/tools/testing/selftests/bpf/test_tcpnotify_user.c b/tools/testing/selftests/bpf/test_tcpnotify_user.c
index 595194453ff8..35b4893ccdf8 100644
--- a/tools/testing/selftests/bpf/test_tcpnotify_user.c
+++ b/tools/testing/selftests/bpf/test_tcpnotify_user.c
@@ -15,20 +15,18 @@
 #include <bpf/libbpf.h>
 #include <sys/ioctl.h>
 #include <linux/rtnetlink.h>
-#include <signal.h>
 #include <linux/perf_event.h>
-#include <linux/err.h>

-#include "bpf_util.h"
 #include "cgroup_helpers.h"

 #include "test_tcpnotify.h"
-#include "trace_helpers.h"
 #include "testing_helpers.h"

 #define SOCKET_BUFFER_SIZE (getpagesize() < 8192L ? getpagesize() : 8192L)

 pthread_t tid;
+static bool exit_thread;
+
 int rx_callbacks;

 static void dummyfn(void *ctx, int cpu, void *data, __u32 size)
@@ -45,7 +43,7 @@ void tcp_notifier_poller(struct perf_buffer *pb)
 {
 	int err;

-	while (1) {
+	while (!exit_thread) {
 		err = perf_buffer__poll(pb, 100);
 		if (err < 0 && err != -EINTR) {
 			printf("failed perf_buffer__poll: %d\n", err);
@@ -78,15 +76,10 @@ int main(int argc, char **argv)
 	int error = EXIT_FAILURE;
 	struct bpf_object *obj;
 	char test_script[80];
-	cpu_set_t cpuset;
 	__u32 key = 0;

 	libbpf_set_strict_mode(LIBBPF_STRICT_ALL);

-	CPU_ZERO(&cpuset);
-	CPU_SET(0, &cpuset);
-	pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpuset);
-
 	cg_fd = cgroup_setup_and_join(cg_path);
 	if (cg_fd < 0)
 		goto err;
@@ -151,6 +144,13 @@ int main(int argc, char **argv)

 	sleep(10);

+	exit_thread = true;
+	int ret = pthread_join(tid, NULL);
+	if (ret) {
+		printf("FAILED: pthread_join\n");
+		goto err;
+	}
+
 	if (verify_result(&g)) {
 		printf("FAILED: Wrong stats Expected %d calls, got %d\n",
 			g.ncalls, rx_callbacks);
-- 
2.51.0.rc1.167.g924127e9c0-goog

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH bpf] bpf/selftests: fix test_tcpnotify_user
  2025-08-15 12:12 [PATCH bpf] bpf/selftests: fix test_tcpnotify_user Matt Bobrowski
@ 2025-08-15 15:44 ` Stanislav Fomichev
  2025-08-15 20:17   ` Martin KaFai Lau
  2025-08-15 20:10 ` patchwork-bot+netdevbpf
  1 sibling, 1 reply; 5+ messages in thread
From: Stanislav Fomichev @ 2025-08-15 15:44 UTC (permalink / raw)
  To: Matt Bobrowski; +Cc: bpf, ast, daniel, andrii, eddyz87, mykolal

On 08/15, Matt Bobrowski wrote:
> Based on a bisect, it appears that commit 7ee988770326 ("timers:
> Implement the hierarchical pull model") has somehow inadvertently
> broken BPF selftest test_tcpnotify_user. The error that is being
> generated by this test is as follows:
> 
> 	FAILED: Wrong stats Expected 10 calls, got 8
> 
> It looks like the change allows timer functions to be run on CPUs
> different from the one they are armed on. The test had pinned itself
> to CPU 0, and in the past the retransmit attempts also occurred on CPU
> 0. The test had set the max_entries attribute for
> BPF_MAP_TYPE_PERF_EVENT_ARRAY to 2 and was calling
> bpf_perf_event_output() with BPF_F_CURRENT_CPU, so the entry was
> likely to be in range. With the change to allow timers to run on other
> CPUs, the current CPU tasked with performing the retransmit might be
> bumped and in turn fall out of range, as the event will be filtered
> out via __bpf_perf_event_output() using:
> 
>     if (unlikely(index >= array->map.max_entries))
>             return -E2BIG;

[..]

> A possible change would be to explicitly set the max_entries attribute
> for perf_event_map in test_tcpnotify_kern.c to a value that's at least
> as large as the number of CPUs. As it turns out however, if the field
> is left unset, then the BPF selftest library will determine the number
> of CPUs available on the underlying system and update the max_entries
> attribute accordingly.

nit: the max_entries is set by libbpf in map_set_def_max_entries. 'BPF
selftest library' seems a bit vague. But not a reason for respin.
 
> A further problem with the test is that it has a thread that continues
> running up until the program exits. The main thread cleans up some
> LIBBPF data structures, while the other thread continues to use them,
> which inevitably will trigger a SIGSEGV. This can be dealt with by
> telling the thread to run for as long as necessary and doing a
> pthread_join on it before exiting the program.
> 
> Finally, I don't think binding the process to CPU 0 is meaningful for
> this test any more, so get rid of that.
> 
> Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>

Acked-by: Stanislav Fomichev <sdf@fomichev.me>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH bpf] bpf/selftests: fix test_tcpnotify_user
  2025-08-15 12:12 [PATCH bpf] bpf/selftests: fix test_tcpnotify_user Matt Bobrowski
  2025-08-15 15:44 ` Stanislav Fomichev
@ 2025-08-15 20:10 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 5+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-08-15 20:10 UTC (permalink / raw)
  To: Matt Bobrowski; +Cc: bpf, ast, daniel, andrii, eddyz87, mykolal

Hello:

This patch was applied to bpf/bpf-next.git (master)
by Martin KaFai Lau <martin.lau@kernel.org>:

On Fri, 15 Aug 2025 12:12:14 +0000 you wrote:
> Based on a bisect, it appears that commit 7ee988770326 ("timers:
> Implement the hierarchical pull model") has somehow inadvertently
> broken BPF selftest test_tcpnotify_user. The error that is being
> generated by this test is as follows:
> 
> 	FAILED: Wrong stats Expected 10 calls, got 8
> 
> [...]

Here is the summary with links:
  - [bpf] bpf/selftests: fix test_tcpnotify_user
    https://git.kernel.org/bpf/bpf-next/c/c80d79720647

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH bpf] bpf/selftests: fix test_tcpnotify_user
  2025-08-15 15:44 ` Stanislav Fomichev
@ 2025-08-15 20:17   ` Martin KaFai Lau
  2025-08-18  6:43     ` Matt Bobrowski
  0 siblings, 1 reply; 5+ messages in thread
From: Martin KaFai Lau @ 2025-08-15 20:17 UTC (permalink / raw)
  To: Stanislav Fomichev, Matt Bobrowski
  Cc: bpf, ast, daniel, andrii, eddyz87, mykolal

On 8/15/25 8:44 AM, Stanislav Fomichev wrote:
> On 08/15, Matt Bobrowski wrote:
>> Based on a bisect, it appears that commit 7ee988770326 ("timers:
>> Implement the hierarchical pull model") has somehow inadvertently
>> broken BPF selftest test_tcpnotify_user. The error that is being
>> generated by this test is as follows:
>>
>> 	FAILED: Wrong stats Expected 10 calls, got 8
>>
>> It looks like the change allows timer functions to be run on CPUs
>> different from the one they are armed on. The test had pinned itself
>> to CPU 0, and in the past the retransmit attempts also occurred on CPU
>> 0. The test had set the max_entries attribute for
>> BPF_MAP_TYPE_PERF_EVENT_ARRAY to 2 and was calling
>> bpf_perf_event_output() with BPF_F_CURRENT_CPU, so the entry was
>> likely to be in range. With the change to allow timers to run on other
>> CPUs, the current CPU tasked with performing the retransmit might be
>> bumped and in turn fall out of range, as the event will be filtered
>> out via __bpf_perf_event_output() using:
>>
>>      if (unlikely(index >= array->map.max_entries))
>>              return -E2BIG;
> 
> [..]
> 
>> A possible change would be to explicitly set the max_entries attribute
>> for perf_event_map in test_tcpnotify_kern.c to a value that's at least
>> as large as the number of CPUs. As it turns out however, if the field
>> is left unset, then the BPF selftest library will determine the number
>> of CPUs available on the underlying system and update the max_entries
>> attribute accordingly.
> 
> nit: the max_entries is set by libbpf in map_set_def_max_entries. 'BPF
> selftest library' seems a bit vague. But not a reason for respin.

Fixed the commit message. Thanks. Applied.

>   
>> A further problem with the test is that it has a thread that continues
>> running up until the program exits. The main thread cleans up some
>> LIBBPF data structures, while the other thread continues to use them,
>> which inevitably will trigger a SIGSEGV. This can be dealt with by
>> telling the thread to run for as long as necessary and doing a
>> pthread_join on it before exiting the program.

Some of the "goto err" seems to have similar problem but ok-ish as long as the 
iptables runs fine. I didn't look why the test needs to start a thread at all, 
so I leave it as is. The CI is not running this test. The test is getting rotten 
overall. It should be moved to test_progs. Probably as a subtest in some of the 
existing sockops test in test_progs.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH bpf] bpf/selftests: fix test_tcpnotify_user
  2025-08-15 20:17   ` Martin KaFai Lau
@ 2025-08-18  6:43     ` Matt Bobrowski
  0 siblings, 0 replies; 5+ messages in thread
From: Matt Bobrowski @ 2025-08-18  6:43 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Stanislav Fomichev, bpf, ast, daniel, andrii, eddyz87, mykolal

On Fri, Aug 15, 2025 at 01:17:39PM -0700, Martin KaFai Lau wrote:
> On 8/15/25 8:44 AM, Stanislav Fomichev wrote:
> > On 08/15, Matt Bobrowski wrote:
> > > Based on a bisect, it appears that commit 7ee988770326 ("timers:
> > > Implement the hierarchical pull model") has somehow inadvertently
> > > broken BPF selftest test_tcpnotify_user. The error that is being
> > > generated by this test is as follows:
> > > 
> > > 	FAILED: Wrong stats Expected 10 calls, got 8
> > > 
> > > It looks like the change allows timer functions to be run on CPUs
> > > different from the one they are armed on. The test had pinned itself
> > > to CPU 0, and in the past the retransmit attempts also occurred on CPU
> > > 0. The test had set the max_entries attribute for
> > > BPF_MAP_TYPE_PERF_EVENT_ARRAY to 2 and was calling
> > > bpf_perf_event_output() with BPF_F_CURRENT_CPU, so the entry was
> > > likely to be in range. With the change to allow timers to run on other
> > > CPUs, the current CPU tasked with performing the retransmit might be
> > > bumped and in turn fall out of range, as the event will be filtered
> > > out via __bpf_perf_event_output() using:
> > > 
> > >      if (unlikely(index >= array->map.max_entries))
> > >              return -E2BIG;
> > 
> > [..]
> > 
> > > A possible change would be to explicitly set the max_entries attribute
> > > for perf_event_map in test_tcpnotify_kern.c to a value that's at least
> > > as large as the number of CPUs. As it turns out however, if the field
> > > is left unset, then the BPF selftest library will determine the number
> > > of CPUs available on the underlying system and update the max_entries
> > > attribute accordingly.
> > 
> > nit: the max_entries is set by libbpf in map_set_def_max_entries. 'BPF
> > selftest library' seems a bit vague. But not a reason for respin.
> 
> Fixed the commit message. Thanks. Applied.

ACK. Apologies about the oversimplification there.

> > > A further problem with the test is that it has a thread that continues
> > > running up until the program exits. The main thread cleans up some
> > > LIBBPF data structures, while the other thread continues to use them,
> > > which inevitably will trigger a SIGSEGV. This can be dealt with by
> > > telling the thread to run for as long as necessary and doing a
> > > pthread_join on it before exiting the program.
> 
> Some of the "goto err" seems to have similar problem but ok-ish as long as
> the iptables runs fine. I didn't look why the test needs to start a thread
> at all, so I leave it as is. The CI is not running this test. The test is
> getting rotten overall. It should be moved to test_progs. Probably as a
> subtest in some of the existing sockops test in test_progs.

Agree, this test is archaic and needs should be converted or folded
such that it runs under test_progs. If I have time, I'll perform this
clean up.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-08-18  6:43 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-15 12:12 [PATCH bpf] bpf/selftests: fix test_tcpnotify_user Matt Bobrowski
2025-08-15 15:44 ` Stanislav Fomichev
2025-08-15 20:17   ` Martin KaFai Lau
2025-08-18  6:43     ` Matt Bobrowski
2025-08-15 20:10 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).