public inbox for bpf@vger.kernel.org
From: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>
To: Ihor Solodrai <ihor.solodrai@linux.dev>,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Eduard Zingerman <eddyz87@gmail.com>,
	Amery Hung <ameryhung@gmail.com>
Cc: bpf@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH bpf-next v1] selftests/bpf: Fix flakiness of task_local_storage/sys_enter_exit
Date: Tue, 24 Feb 2026 17:02:24 +0000
Message-ID: <87ecma6pnj.fsf@gmail.com>
In-Reply-To: <801bd6f8-f1c3-4655-8cad-7f211979f330@linux.dev>

Ihor Solodrai <ihor.solodrai@linux.dev> writes:

> On 2/24/26 7:23 AM, Mykyta Yatsenko wrote:
>> Ihor Solodrai <ihor.solodrai@linux.dev> writes:
>> 
>>> The test_sys_enter_exit test was setting target_pid before attaching
>>> the BPF programs, which causes syscalls made during the attach phase
>>> to be counted. This is flaky because, apparently, there is no
>>> guarantee that both on_enter and on_exit will trigger during the
>>> attachment.
>>>
>>> Move the target_pid assignment to after task_local_storage__attach()
>>> so that only explicit sys_gettid() calls are counted.
>>>
>>> Reported-by: BPF CI Bot (Claude Opus 4.6) <bot+bpf-ci@kernel.org>
>>> Closes: https://github.com/kernel-patches/vmtest/issues/448
>>> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
>>>
>>> ---
>>>
>>> I've been experimenting with running AI on BPF CI to investigate test
>>> failures. This is an example of a thing it may come up with.
>>>
>>> I don't want to spam the list with these, so for starters I'll be
>>> relaying only patches that I evaluated and/or tested.
>>>
>>> The AI generated reports will be posted with "[bpf-ci-bot]" prefix
>>> here: https://github.com/kernel-patches/vmtest/issues
>>>
>>> The goal of this particular application of AI is to make BPF CI more
>>> stable, less noisy/flaky, and potentially find and fix more kernel
>>> bugs. We'll see how it goes.
>>>
>>> ---
>>>  .../selftests/bpf/prog_tests/task_local_storage.c  | 14 +++++++++-----
>>>  1 file changed, 9 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
>>> index 7bee33797c71..2820a604aaa6 100644
>>> --- a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
>>> +++ b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
>>> @@ -25,24 +25,28 @@
>>>  static void test_sys_enter_exit(void)
>>>  {
>>>  	struct task_local_storage *skel;
>>> +	pid_t pid = sys_gettid();
>>>  	int err;
>>>  
>>>  	skel = task_local_storage__open_and_load();
>>>  	if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
>>>  		return;
>>>  
>>> -	skel->bss->target_pid = sys_gettid();
>>> -
>>>  	err = task_local_storage__attach(skel);
>>>  	if (!ASSERT_OK(err, "skel_attach"))
>>>  		goto out;
>>>  
>>> +	/* Set target_pid after attach so that syscalls made during
>>> +	 * attach are not counted.
>>> +	 */
>>> +	skel->bss->target_pid = pid;
>>> +
>>>  	sys_gettid();
>>>  	sys_gettid();
>
>> Maybe a simpler and less fragile fix would be to add syscall number
>> filter in the BPF program, so that we don't count those unexpected
>> syscalls.
>
> Why do you think this is fragile?  We are not counting the syscalls
> made during attachment at all this way.
I think it's more fragile because the test breaks as soon as someone
adds code after sys_gettid() that makes a syscall, or if, for example,
ASSERT_EQ makes one. Another example: someone debugging this test case
adds a printf. I suggest tightening the condition in the BPF program,
so that future changes in user space have less chance of breaking
things.
I'm fine with your change, as it fixes the flakiness, but in my opinion
it is not ideal if this approach gets used widely in more tests.
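
To make the suggestion concrete, here is a rough user-space model of
the filter (a sketch only, not the actual BPF program). The struct,
the target_syscall field, the model_* helpers and the syscall number
constants are all hypothetical names made up for illustration. In the
real program the exit handler would also have to recover the syscall
number from the registers, which this model skips by passing it in
directly.

```c
#include <stdbool.h>

/* Illustrative syscall numbers; real values are architecture-specific. */
enum { NR_WRITE = 1, NR_GETTID = 186 };

/* Hypothetical stand-in for the BPF program's global variables. */
struct counters {
	int target_pid;       /* thread the test cares about */
	long target_syscall;  /* hypothetical extra filter: syscall number */
	int enter_cnt;
	int exit_cnt;
};

/* Count only events from the target thread AND for the target syscall. */
static bool should_count(const struct counters *c, int pid, long id)
{
	return pid == c->target_pid && id == c->target_syscall;
}

/* Models the sys_enter handler. */
static void model_sys_enter(struct counters *c, int pid, long id)
{
	if (should_count(c, pid, id))
		c->enter_cnt++;
}

/* Models the sys_exit handler. */
static void model_sys_exit(struct counters *c, int pid, long id)
{
	if (should_count(c, pid, id))
		c->exit_cnt++;
}

/* One complete syscall: enter followed by exit. */
static void model_syscall(struct counters *c, int pid, long id)
{
	model_sys_enter(c, pid, id);
	model_sys_exit(c, pid, id);
}
```

With this condition, a half-observed syscall during attach, a write()
from a debug printf, or a gettid from another thread leaves both
counters untouched, so only the two explicit gettid calls are counted
no matter what else user space does around them.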
>
> You may be right, just trying to understand.
>
>>>  
>>> -	/* 3x syscalls: 1x attach and 2x gettid */
>
> The way the test was written, those syscalls during attachment were
> expected. But the flakiness wasn't obvious.
>
>>> -	ASSERT_EQ(skel->bss->enter_cnt, 3, "enter_cnt");
>>> -	ASSERT_EQ(skel->bss->exit_cnt, 3, "exit_cnt");
>>> +	/* 2x gettid syscalls */
>>> +	ASSERT_EQ(skel->bss->enter_cnt, 2, "enter_cnt");
>>> +	ASSERT_EQ(skel->bss->exit_cnt, 2, "exit_cnt");
>>>  	ASSERT_EQ(skel->bss->mismatch_cnt, 0, "mismatch_cnt");
>>>  out:
>>>  	task_local_storage__destroy(skel);
>>> -- 
>>> 2.53.0


Thread overview: 5+ messages
2026-02-24  1:58 [PATCH bpf-next v1] selftests/bpf: Fix flakiness of task_local_storage/sys_enter_exit Ihor Solodrai
2026-02-24 15:23 ` Mykyta Yatsenko
2026-02-24 16:34   ` Ihor Solodrai
2026-02-24 17:02     ` Mykyta Yatsenko [this message]
2026-02-24 18:09 ` Amery Hung
