[PATCH bpf-next v1] selftests/bpf: Fix flakiness of task_local_storage/sys_enter_exit

public inbox for bpf@vger.kernel.org

* [PATCH bpf-next v1] selftests/bpf: Fix flakiness of task_local_storage/sys_enter_exit
From: Ihor Solodrai @ 2026-02-24  1:58 UTC
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Eduard Zingerman, Amery Hung
  Cc: bpf, kernel-team

The test_sys_enter_exit test was setting target_pid before attaching
the BPF programs, which caused syscalls made during the attach phase
to be counted. This is flaky because there is apparently no guarantee
that both on_enter and on_exit trigger for the syscalls made during
attachment, so the old expected count of 3 does not always hold.

Move the target_pid assignment to after task_local_storage__attach()
so that only explicit sys_gettid() calls are counted.

Reported-by: BPF CI Bot (Claude Opus 4.6) <bot+bpf-ci@kernel.org>
Closes: https://github.com/kernel-patches/vmtest/issues/448
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>

---

I've been experimenting with running AI on BPF CI to investigate test
failures. This patch is an example of what it can come up with.

I don't want to spam the list with these, so for starters I'll only
relay patches that I have evaluated and/or tested.

AI-generated reports will be posted with the "[bpf-ci-bot]" prefix
here: https://github.com/kernel-patches/vmtest/issues

The goal of this particular application of AI is to make BPF CI more
stable, less noisy/flaky, and potentially find and fix more kernel
bugs. We'll see how it goes.

---
 .../selftests/bpf/prog_tests/task_local_storage.c  | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
index 7bee33797c71..2820a604aaa6 100644
--- a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
+++ b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
@@ -25,24 +25,28 @@
 static void test_sys_enter_exit(void)
 {
 	struct task_local_storage *skel;
+	pid_t pid = sys_gettid();
 	int err;
 
 	skel = task_local_storage__open_and_load();
 	if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
 		return;
 
-	skel->bss->target_pid = sys_gettid();
-
 	err = task_local_storage__attach(skel);
 	if (!ASSERT_OK(err, "skel_attach"))
 		goto out;
 
+	/* Set target_pid after attach so that syscalls made during
+	 * attach are not counted.
+	 */
+	skel->bss->target_pid = pid;
+
 	sys_gettid();
 	sys_gettid();
 
-	/* 3x syscalls: 1x attach and 2x gettid */
-	ASSERT_EQ(skel->bss->enter_cnt, 3, "enter_cnt");
-	ASSERT_EQ(skel->bss->exit_cnt, 3, "exit_cnt");
+	/* 2x gettid syscalls */
+	ASSERT_EQ(skel->bss->enter_cnt, 2, "enter_cnt");
+	ASSERT_EQ(skel->bss->exit_cnt, 2, "exit_cnt");
 	ASSERT_EQ(skel->bss->mismatch_cnt, 0, "mismatch_cnt");
 out:
 	task_local_storage__destroy(skel);
-- 
2.53.0



* Re: [PATCH bpf-next v1] selftests/bpf: Fix flakiness of task_local_storage/sys_enter_exit
From: Mykyta Yatsenko @ 2026-02-24 15:23 UTC
  To: Ihor Solodrai, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Eduard Zingerman, Amery Hung
  Cc: bpf, kernel-team

Ihor Solodrai <ihor.solodrai@linux.dev> writes:

> [...]
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
> index 7bee33797c71..2820a604aaa6 100644
> --- a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
> +++ b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
> @@ -25,24 +25,28 @@
>  static void test_sys_enter_exit(void)
>  {
>  	struct task_local_storage *skel;
> +	pid_t pid = sys_gettid();
>  	int err;
>  
>  	skel = task_local_storage__open_and_load();
>  	if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
>  		return;
>  
> -	skel->bss->target_pid = sys_gettid();
> -
>  	err = task_local_storage__attach(skel);
>  	if (!ASSERT_OK(err, "skel_attach"))
>  		goto out;
>  
> +	/* Set target_pid after attach so that syscalls made during
> +	 * attach are not counted.
> +	 */
> +	skel->bss->target_pid = pid;
> +
>  	sys_gettid();
>  	sys_gettid();
Maybe a simpler and less fragile fix would be to add a syscall number
filter to the BPF program, so that we don't count those unexpected
syscalls.
>  
> -	/* 3x syscalls: 1x attach and 2x gettid */
> -	ASSERT_EQ(skel->bss->enter_cnt, 3, "enter_cnt");
> -	ASSERT_EQ(skel->bss->exit_cnt, 3, "exit_cnt");
> +	/* 2x gettid syscalls */
> +	ASSERT_EQ(skel->bss->enter_cnt, 2, "enter_cnt");
> +	ASSERT_EQ(skel->bss->exit_cnt, 2, "exit_cnt");
>  	ASSERT_EQ(skel->bss->mismatch_cnt, 0, "mismatch_cnt");
>  out:
>  	task_local_storage__destroy(skel);
> -- 
> 2.53.0


* Re: [PATCH bpf-next v1] selftests/bpf: Fix flakiness of task_local_storage/sys_enter_exit
From: Ihor Solodrai @ 2026-02-24 16:34 UTC
  To: Mykyta Yatsenko, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Eduard Zingerman, Amery Hung
  Cc: bpf, kernel-team

On 2/24/26 7:23 AM, Mykyta Yatsenko wrote:
> Ihor Solodrai <ihor.solodrai@linux.dev> writes:
> 
>> [...]
>>
>> diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
>> index 7bee33797c71..2820a604aaa6 100644
>> --- a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
>> +++ b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
>> @@ -25,24 +25,28 @@
>>  static void test_sys_enter_exit(void)
>>  {
>>  	struct task_local_storage *skel;
>> +	pid_t pid = sys_gettid();
>>  	int err;
>>  
>>  	skel = task_local_storage__open_and_load();
>>  	if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
>>  		return;
>>  
>> -	skel->bss->target_pid = sys_gettid();
>> -
>>  	err = task_local_storage__attach(skel);
>>  	if (!ASSERT_OK(err, "skel_attach"))
>>  		goto out;
>>  
>> +	/* Set target_pid after attach so that syscalls made during
>> +	 * attach are not counted.
>> +	 */
>> +	skel->bss->target_pid = pid;
>> +
>>  	sys_gettid();
>>  	sys_gettid();

> Maybe a simpler and less fragile fix would be to add a syscall number
> filter to the BPF program, so that we don't count those unexpected
> syscalls.

Why do you think this is fragile? With this change, the syscalls made
during attachment are not counted at all.

You may be right, just trying to understand.

>>  
>> -	/* 3x syscalls: 1x attach and 2x gettid */

As the test was written, the syscalls made during attachment were
expected to be counted, but the resulting flakiness wasn't obvious.

>> -	ASSERT_EQ(skel->bss->enter_cnt, 3, "enter_cnt");
>> -	ASSERT_EQ(skel->bss->exit_cnt, 3, "exit_cnt");
>> +	/* 2x gettid syscalls */
>> +	ASSERT_EQ(skel->bss->enter_cnt, 2, "enter_cnt");
>> +	ASSERT_EQ(skel->bss->exit_cnt, 2, "exit_cnt");
>>  	ASSERT_EQ(skel->bss->mismatch_cnt, 0, "mismatch_cnt");
>>  out:
>>  	task_local_storage__destroy(skel);
>> -- 
>> 2.53.0



* Re: [PATCH bpf-next v1] selftests/bpf: Fix flakiness of task_local_storage/sys_enter_exit
From: Mykyta Yatsenko @ 2026-02-24 17:02 UTC
  To: Ihor Solodrai, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Eduard Zingerman, Amery Hung
  Cc: bpf, kernel-team

Ihor Solodrai <ihor.solodrai@linux.dev> writes:

> On 2/24/26 7:23 AM, Mykyta Yatsenko wrote:
>> Ihor Solodrai <ihor.solodrai@linux.dev> writes:
>> 
>>> [...]
>>>
>>> diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
>>> index 7bee33797c71..2820a604aaa6 100644
>>> --- a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
>>> +++ b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
>>> @@ -25,24 +25,28 @@
>>>  static void test_sys_enter_exit(void)
>>>  {
>>>  	struct task_local_storage *skel;
>>> +	pid_t pid = sys_gettid();
>>>  	int err;
>>>  
>>>  	skel = task_local_storage__open_and_load();
>>>  	if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
>>>  		return;
>>>  
>>> -	skel->bss->target_pid = sys_gettid();
>>> -
>>>  	err = task_local_storage__attach(skel);
>>>  	if (!ASSERT_OK(err, "skel_attach"))
>>>  		goto out;
>>>  
>>> +	/* Set target_pid after attach so that syscalls made during
>>> +	 * attach are not counted.
>>> +	 */
>>> +	skel->bss->target_pid = pid;
>>> +
>>>  	sys_gettid();
>>>  	sys_gettid();
>
>> Maybe a simpler and less fragile fix would be to add a syscall number
>> filter to the BPF program, so that we don't count those unexpected
>> syscalls.
>
> Why do you think this is fragile? With this change, the syscalls made
> during attachment are not counted at all.
I think it is more fragile because any code added after sys_gettid()
that makes a syscall will be counted too, and ASSERT_EQ, for example,
may make one. Another example: someone debugging this test case adds a
printf. I suggest tightening the condition in the BPF program, so that
changes in user space are less likely to break things.
I'm fine with your change, as it fixes the flakiness, but in my opinion
it is not ideal if we apply this approach more widely across tests.
>
> You may be right, just trying to understand.
>
>>>  
>>> -	/* 3x syscalls: 1x attach and 2x gettid */
>
> As the test was written, the syscalls made during attachment were
> expected to be counted, but the resulting flakiness wasn't obvious.
>
>>> -	ASSERT_EQ(skel->bss->enter_cnt, 3, "enter_cnt");
>>> -	ASSERT_EQ(skel->bss->exit_cnt, 3, "exit_cnt");
>>> +	/* 2x gettid syscalls */
>>> +	ASSERT_EQ(skel->bss->enter_cnt, 2, "enter_cnt");
>>> +	ASSERT_EQ(skel->bss->exit_cnt, 2, "exit_cnt");
>>>  	ASSERT_EQ(skel->bss->mismatch_cnt, 0, "mismatch_cnt");
>>>  out:
>>>  	task_local_storage__destroy(skel);
>>> -- 
>>> 2.53.0


* Re: [PATCH bpf-next v1] selftests/bpf: Fix flakiness of task_local_storage/sys_enter_exit
From: Amery Hung @ 2026-02-24 18:09 UTC
  To: Ihor Solodrai
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Eduard Zingerman, bpf, kernel-team

On Mon, Feb 23, 2026 at 5:59 PM Ihor Solodrai <ihor.solodrai@linux.dev> wrote:
>
> [...]
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
> index 7bee33797c71..2820a604aaa6 100644
> --- a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
> +++ b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
> @@ -25,24 +25,28 @@
>  static void test_sys_enter_exit(void)
>  {
>         struct task_local_storage *skel;
> +       pid_t pid = sys_gettid();
>         int err;
>
>         skel = task_local_storage__open_and_load();
>         if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
>                 return;
>
> -       skel->bss->target_pid = sys_gettid();
> -
>         err = task_local_storage__attach(skel);
>         if (!ASSERT_OK(err, "skel_attach"))
>                 goto out;
>
> +       /* Set target_pid after attach so that syscalls made during
> +        * attach are not counted.
> +        */
> +       skel->bss->target_pid = pid;
> +
>         sys_gettid();
>         sys_gettid();
>

The test is done at this point, so perhaps we should also set
skel->bss->target_pid = 0 here to keep the BPF programs from changing
the global variables again. prog_tests/cgrp_local_storage.c already
does this. It doesn't solve interference from parallel subtests, but
at least it makes the test result easier to reason about.

> -       /* 3x syscalls: 1x attach and 2x gettid */
> -       ASSERT_EQ(skel->bss->enter_cnt, 3, "enter_cnt");
> -       ASSERT_EQ(skel->bss->exit_cnt, 3, "exit_cnt");
> +       /* 2x gettid syscalls */
> +       ASSERT_EQ(skel->bss->enter_cnt, 2, "enter_cnt");
> +       ASSERT_EQ(skel->bss->exit_cnt, 2, "exit_cnt");
>         ASSERT_EQ(skel->bss->mismatch_cnt, 0, "mismatch_cnt");
>  out:
>         task_local_storage__destroy(skel);
> --
> 2.53.0
>

