From: sdf@google.com
To: Delyan Kratunov <delyank@meta.com>
Cc: "daniel@iogearbox.net" <daniel@iogearbox.net>,
Song Liu <songliubraving@meta.com>,
"ast@kernel.org" <ast@kernel.org>,
"andrii@kernel.org" <andrii@kernel.org>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>
Subject: Re: [PATCH bpf-next v2] selftests/bpf: fix task_local_storage/exit_creds rcu usage
Date: Wed, 19 Oct 2022 12:57:48 -0700 [thread overview]
Message-ID: <Y1BWvNdHHwHbPXDk@google.com> (raw)
In-Reply-To: <f04cf27b05047cfb2c90db160383e2e9c2c40b93.camel@fb.com>
On 10/19, Delyan Kratunov wrote:
> BPF CI has revealed flakiness in the task_local_storage/exit_creds test.
> The failure point in CI [1] is that null_ptr_count is equal to 0,
> which indicates that the program hasn't run yet. This points to the
> kern_sync_rcu (sys_membarrier -> synchronize_rcu underneath) not
> waiting sufficiently.
> Indeed, synchronize_rcu only waits for read-side sections that started
> before the call. If the program execution starts *during* the
> synchronize_rcu invocation (due to, say, preemption), the test won't
> wait long enough.
> As a speculative fix, make the synchornize_rcu calls in a loop until
> an explicit run counter has gone up.
> [1]:
> https://github.com/kernel-patches/bpf/actions/runs/3268263235/jobs/5374940791
> Signed-off-by: Delyan Kratunov <delyank@fb.com>
> ---
> v1 -> v2:
> Explicit loop counter and MAX_SYNC_RCU_CALLS guard.
> .../bpf/prog_tests/task_local_storage.c | 18 +++++++++++++++---
> .../bpf/progs/task_local_storage_exit_creds.c | 3 +++
> 2 files changed, 18 insertions(+), 3 deletions(-)
> diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
> b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
> index 035c263aab1b..99a42a2b6e14 100644
> --- a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
> +++ b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
> @@ -39,7 +39,8 @@ static void test_sys_enter_exit(void)
> static void test_exit_creds(void)
> {
> struct task_local_storage_exit_creds *skel;
> - int err;
> + int err, run_count, sync_rcu_calls = 0;
> + const int MAX_SYNC_RCU_CALLS = 1000;
> skel = task_local_storage_exit_creds__open_and_load();
> if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
> @@ -53,8 +54,19 @@ static void test_exit_creds(void)
> if (CHECK_FAIL(system("ls > /dev/null")))
> goto out;
> - /* sync rcu to make sure exit_creds() is called for "ls" */
> - kern_sync_rcu();
> + /* kern_sync_rcu is not enough on its own as the read section we want
> + * to wait for may start after we enter synchronize_rcu, so our call
> + * won't wait for the section to finish. Loop on the run counter
> + * as well to ensure the program has run.
> + */
> + do {
> + kern_sync_rcu();
> + run_count = __atomic_load_n(&skel->bss->run_count, __ATOMIC_SEQ_CST);
> + } while (run_count == 0 && ++sync_rcu_calls < MAX_SYNC_RCU_CALLS);
Acked-by: Stanislav Fomichev <sdf@google.com>
Might have been easier to do the following instead?
int sync_rcu_calls = 1000;
do {
} while (run_count == 0 && --sync_rcu_calls);
> +
> + ASSERT_NEQ(sync_rcu_calls, MAX_SYNC_RCU_CALLS,
> + "sync_rcu count too high");
> + ASSERT_NEQ(run_count, 0, "run_count");
> ASSERT_EQ(skel->bss->valid_ptr_count, 0, "valid_ptr_count");
> ASSERT_NEQ(skel->bss->null_ptr_count, 0, "null_ptr_count");
> out:
> diff --git
> a/tools/testing/selftests/bpf/progs/task_local_storage_exit_creds.c
> b/tools/testing/selftests/bpf/progs/task_local_storage_exit_creds.c
> index 81758c0aef99..41d88ed222ff 100644
> --- a/tools/testing/selftests/bpf/progs/task_local_storage_exit_creds.c
> +++ b/tools/testing/selftests/bpf/progs/task_local_storage_exit_creds.c
> @@ -14,6 +14,7 @@ struct {
> __type(value, __u64);
> } task_storage SEC(".maps");
> +int run_count = 0;
> int valid_ptr_count = 0;
> int null_ptr_count = 0;
> @@ -28,5 +29,7 @@ int BPF_PROG(trace_exit_creds, struct task_struct *task)
> __sync_fetch_and_add(&valid_ptr_count, 1);
> else
> __sync_fetch_and_add(&null_ptr_count, 1);
> +
> + __sync_fetch_and_add(&run_count, 1);
> return 0;
> }
> --
> 2.37.3
next prev parent reply other threads:[~2022-10-19 19:57 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-19 17:56 [PATCH bpf-next v2] selftests/bpf: fix task_local_storage/exit_creds rcu usage Delyan Kratunov
2022-10-19 19:57 ` sdf [this message]
2022-10-19 23:52 ` Alexei Starovoitov
2022-10-20 0:05 ` Stanislav Fomichev
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y1BWvNdHHwHbPXDk@google.com \
--to=sdf@google.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=delyank@meta.com \
--cc=songliubraving@meta.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.