* [PATCH bpf-next v3] selftests/bpf: fix task_local_storage/exit_creds rcu usage
From: Delyan Kratunov @ 2022-10-21 19:36 UTC
To: daniel@iogearbox.net, Song Liu, ast@kernel.org, andrii@kernel.org,
bpf@vger.kernel.org

BPF CI has revealed flakiness in the task_local_storage/exit_creds test.
The failure point in CI [1] is that null_ptr_count is equal to 0,
which indicates that the program hasn't run yet. This points to the
kern_sync_rcu (sys_membarrier -> synchronize_rcu underneath) not
waiting sufficiently.

Indeed, synchronize_rcu only waits for read-side sections that started
before the call. If the program execution starts *during* the
synchronize_rcu invocation (due to, say, preemption), the test won't
wait long enough.

As a speculative fix, call synchronize_rcu in a loop until an explicit
run counter has gone up.

[1]: https://github.com/kernel-patches/bpf/actions/runs/3268263235/jobs/5374940791

Signed-off-by: Delyan Kratunov <delyank@meta.com>
---
v2 -> v3:
Fix Signed-off-by line. Love when my email gets silently rewritten.
v1 -> v2:
Explicit loop counter and MAX_SYNC_RCU_CALLS guard.
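
For readers who want to try the bounded-retry idea outside the selftest
harness, here is a minimal userspace sketch. It is not the selftest code:
fake_sync(), waits_before_run, reset() and wait_for_run() are hypothetical
names made up for illustration, and fake_sync() only models "the program
runs during a later wait" with a countdown instead of calling
kern_sync_rcu(). Only the loop shape mirrors the patch.

```c
#include <assert.h>

/* Hypothetical stand-in for the run counter the BPF program bumps. */
static int run_count;

/* Hypothetical knob: the simulated program runs during the Nth wait.
 * 0 means it never runs at all. */
static int waits_before_run;

/* Hypothetical stand-in for kern_sync_rcu(). A real grace-period wait
 * gives no guarantee about sections that start after it began; the
 * countdown models the program only showing up on a later wait. */
static void fake_sync(void)
{
	if (waits_before_run > 0 && --waits_before_run == 0)
		__atomic_fetch_add(&run_count, 1, __ATOMIC_SEQ_CST);
}

/* Reset the simulation so the program runs during wait number 'delay'. */
static void reset(int delay)
{
	run_count = 0;
	waits_before_run = delay;
}

/* Wait, then check the counter, and retry while it is still zero.
 * Returns the number of retries after the first wait; a return value
 * equal to max_calls means the budget ran out. *observed receives the
 * last counter value that was read. */
static int wait_for_run(int max_calls, int *observed)
{
	int retries = 0, seen;

	assert(max_calls > 0);
	do {
		fake_sync();
		seen = __atomic_load_n(&run_count, __ATOMIC_SEQ_CST);
	} while (seen == 0 && ++retries < max_calls);

	*observed = seen;
	return retries;
}
```

Calling reset(3) and then wait_for_run(1000, &seen) retries twice before
the counter is observed as non-zero. Calling reset(0) and then
wait_for_run(5, &seen) returns 5 with seen == 0, which is the case the
"sync_rcu count too high" assertion in the patch is meant to catch.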
.../bpf/prog_tests/task_local_storage.c | 18 +++++++++++++++---
.../bpf/progs/task_local_storage_exit_creds.c | 3 +++
2 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
index 035c263aab1b..99a42a2b6e14 100644
--- a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
+++ b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
@@ -39,7 +39,8 @@ static void test_sys_enter_exit(void)
 static void test_exit_creds(void)
 {
 	struct task_local_storage_exit_creds *skel;
-	int err;
+	int err, run_count, sync_rcu_calls = 0;
+	const int MAX_SYNC_RCU_CALLS = 1000;
 
 	skel = task_local_storage_exit_creds__open_and_load();
 	if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
@@ -53,8 +54,19 @@ static void test_exit_creds(void)
 	if (CHECK_FAIL(system("ls > /dev/null")))
 		goto out;
 
-	/* sync rcu to make sure exit_creds() is called for "ls" */
-	kern_sync_rcu();
+	/* kern_sync_rcu is not enough on its own as the read section we want
+	 * to wait for may start after we enter synchronize_rcu, so our call
+	 * won't wait for the section to finish. Loop on the run counter
+	 * as well to ensure the program has run.
+	 */
+	do {
+		kern_sync_rcu();
+		run_count = __atomic_load_n(&skel->bss->run_count, __ATOMIC_SEQ_CST);
+	} while (run_count == 0 && ++sync_rcu_calls < MAX_SYNC_RCU_CALLS);
+
+	ASSERT_NEQ(sync_rcu_calls, MAX_SYNC_RCU_CALLS,
+		   "sync_rcu count too high");
+	ASSERT_NEQ(run_count, 0, "run_count");
 	ASSERT_EQ(skel->bss->valid_ptr_count, 0, "valid_ptr_count");
 	ASSERT_NEQ(skel->bss->null_ptr_count, 0, "null_ptr_count");
 out:
diff --git a/tools/testing/selftests/bpf/progs/task_local_storage_exit_creds.c b/tools/testing/selftests/bpf/progs/task_local_storage_exit_creds.c
index 81758c0aef99..41d88ed222ff 100644
--- a/tools/testing/selftests/bpf/progs/task_local_storage_exit_creds.c
+++ b/tools/testing/selftests/bpf/progs/task_local_storage_exit_creds.c
@@ -14,6 +14,7 @@ struct {
 	__type(value, __u64);
 } task_storage SEC(".maps");
 
+int run_count = 0;
 int valid_ptr_count = 0;
 int null_ptr_count = 0;
 
@@ -28,5 +29,7 @@ int BPF_PROG(trace_exit_creds, struct task_struct *task)
 		__sync_fetch_and_add(&valid_ptr_count, 1);
 	else
 		__sync_fetch_and_add(&null_ptr_count, 1);
+
+	__sync_fetch_and_add(&run_count, 1);
 	return 0;
 }
--
2.37.3

* Re: [PATCH bpf-next v3] selftests/bpf: fix task_local_storage/exit_creds rcu usage
From: patchwork-bot+netdevbpf @ 2022-10-21 21:10 UTC
To: Delyan Kratunov; +Cc: daniel, songliubraving, ast, andrii, bpf
Hello:
This patch was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:
On Fri, 21 Oct 2022 19:36:38 +0000 you wrote:
> BPF CI has revealed flakiness in the task_local_storage/exit_creds test.
> The failure point in CI [1] is that null_ptr_count is equal to 0,
> which indicates that the program hasn't run yet. This points to the
> kern_sync_rcu (sys_membarrier -> synchronize_rcu underneath) not
> waiting sufficiently.
>
> Indeed, synchronize_rcu only waits for read-side sections that started
> before the call. If the program execution starts *during* the
> synchronize_rcu invocation (due to, say, preemption), the test won't
> wait long enough.
>
> [...]
Here is the summary with links:
- [bpf-next,v3] selftests/bpf: fix task_local_storage/exit_creds rcu usage
https://git.kernel.org/bpf/bpf-next/c/eb814cf1adea
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html