BPF List
From: sashiko-bot@kernel.org
To: "Justin Suess" <utilityemal77@gmail.com>
Cc: bpf@vger.kernel.org
Subject: Re: [bpf-next v2 2/2] selftests/bpf: Add kptr destructor NMI exerciser
Date: Tue, 05 May 2026 20:15:32 +0000	[thread overview]
Message-ID: <20260505201532.A1E6AC2BCB4@smtp.kernel.org> (raw)
In-Reply-To: <20260505150851.3090688-3-utilityemal77@gmail.com>

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] Read-modify-write data race on `kptr_dtor_nmi_live_mask` between syscall and NMI contexts.
- [High] Non-atomic increment of `kptr_dtor_nmi_release_calls` in a global fentry hook leads to lost updates.
- [Medium] API Contract Violation: Incorrect BPF program signature for the `tp_btf/nmi_handler` tracepoint.
- [Medium] Test flakiness due to reliance on natural RCU grace periods and a misplaced `kern_sync_rcu()` call.
--

commit cb04c2d31ab28ab2b4ff2bbc2cd6c029cabdc851
Author: Justin Suess <utilityemal77@gmail.com>

selftests/bpf: Add kptr destructor NMI exerciser

Programs attached to tp_btf/nmi_handler can drop refcounted kptrs from
NMI context by deleting map entries or clearing map values. Add a
dedicated BPF-side selftest program that populates hash and array maps
with cpumask kptrs and clears them again from the NMI handler.

> diff --git a/tools/testing/selftests/bpf/prog_tests/kptr_dtor_nmi.c b/tools/testing/selftests/bpf/prog_tests/kptr_dtor_nmi.c
> index 0000000000000..a1aa8b6646b36
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/kptr_dtor_nmi.c

[ ... ]

> @@ -193,12 +193,12 @@ static void run_kptr_dtor_nmi_case(const struct kptr_dtor_nmi_case *test)
>  			       "kptr_dtor_nmi_setup_created"))
>  			goto cleanup;
>  
>  		if (!wait_for_nmi_drain(skel, expected_total, expected_total))
>  			goto cleanup;
>  
>  		kern_sync_rcu();

Is it possible for the test to flake here due to relying on the natural
background RCU grace period?

For hash maps, deleted elements are freed and their kptrs are destructed via
RCU callbacks, which increment the release calls counter. The
wait_for_nmi_drain() loop waits for up to 5 seconds for this counter to hit
the target, but we only force an RCU grace period with kern_sync_rcu() after
the wait loop completes.

Under heavy system load, could the natural grace period take longer than
5 seconds and cause the wait loop to time out?

>  	}
>  
>  	if (!run_syscall_prog(skel->progs.cleanup_kptrs, "cleanup_kptrs"))
>  		goto cleanup;

[ ... ]

> diff --git a/tools/testing/selftests/bpf/progs/kptr_dtor_nmi.c b/tools/testing/selftests/bpf/progs/kptr_dtor_nmi.c
> index 0000000000000..1ab5951a7a223
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/kptr_dtor_nmi.c

[ ... ]

> @@ -66,15 +66,15 @@ static void set_err(__u32 *err_dst, int *errno_dst, __u32 err, int err_no)
>  static bool slot_is_live(__u32 slot)
>  {
>  	return kptr_dtor_nmi_live_mask & (1U << slot);
>  }
>  
>  static void mark_slot_live(__u32 slot)
>  {
>  	kptr_dtor_nmi_live_mask |= 1U << slot;
>  }
>  
>  static void clear_slot_live(__u32 slot)
>  {
>  	kptr_dtor_nmi_live_mask &= ~(1U << slot);
>  }

Can this sequence cause updates to the live mask to be lost?

Since kptr_dtor_nmi_live_mask is updated using non-atomic bitwise operations,
if an NMI interrupts the syscall program exactly between the load and store
instructions of mark_slot_live():

syscall context (mark_slot_live)
    loads kptr_dtor_nmi_live_mask
    <NMI interrupts>

NMI context (clear_slot_live)
    loads, clears bit, stores kptr_dtor_nmi_live_mask

syscall context resumes
    stores modified mask, overwriting the NMI's cleared bit

If this happens, the NMI handler on subsequent runs will encounter an
already-deleted slot, fail with -ENOENT, and permanently abort the loop.
Should these updates use atomic builtins like __sync_fetch_and_or()?

>  
>  static struct kptr_dtor_nmi_value *lookup_hash_value(__u32 slot)

[ ... ]

> @@ -375,12 +375,12 @@ int cleanup_kptrs(void *ctx)
>  		break;
>  	}
>  
>  	return 0;
>  }
>  
>  SEC("tp_btf/nmi_handler")
>  int BPF_PROG(clear_kptrs_from_nmi, void *handler, void *regs, s64 delta_ns,
>  	     int handled)
>  {
>  	(void)handler;

Does this signature match the kernel tracepoint for nmi_handler?

The nmi_handler tracepoint in the kernel has exactly 3 arguments: handler,
delta_ns, and handled. With the extra regs parameter, the BPF_PROG macro
maps regs to ctx[1], delta_ns to ctx[2], and handled to ctx[3].

Since the actual context size is 3 elements, accessing handled would read
out-of-bounds. Should the regs parameter be removed to match the tracepoint?

>  	(void)regs;
>  	(void)delta_ns;
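If the tracepoint really does take three arguments, the corrected program
would simply drop regs (a sketch only, not run through the verifier):

```c
SEC("tp_btf/nmi_handler")
int BPF_PROG(clear_kptrs_from_nmi, void *handler, s64 delta_ns, int handled)
{
	(void)handler;
	(void)delta_ns;
	(void)handled;
	/* ... NMI-side map clearing as before ... */
	return 0;
}
```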

[ ... ]

> @@ -402,10 +402,10 @@ int BPF_PROG(clear_kptrs_from_nmi, void *handler, void *regs, s64 delta_ns,
>  		break;
>  	}
>  
>  	return 0;
>  }
>  
>  SEC("fentry/bpf_cpumask_release")
>  int BPF_PROG(count_cpumask_release, struct bpf_cpumask *mask)
>  {
>  	(void)mask;
>  	kptr_dtor_nmi_release_calls++;
>  	return 0;
>  }

Could concurrent calls to bpf_cpumask_release() on different CPUs cause
lost updates to this counter?

This fentry hook traces the function globally across all CPUs. The counter
is incremented non-atomically, which translates to a load-add-store
sequence. If bpf_cpumask_release is executed concurrently on multiple CPUs
(e.g., when running selftests in parallel), these increments can race.

If an increment generated by the test itself is lost, the test will fall
short of expected_release_calls, causing wait_for_nmi_drain() to hang and
time out. Should this use __sync_fetch_and_add() instead?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260505150851.3090688-1-utilityemal77@gmail.com?part=2

Thread overview: 12+ messages
2026-05-05 15:08 [bpf-next v2 0/2] bpf: Fix deadlock in kptr dtor in nmi Justin Suess
2026-05-05 15:08 ` [bpf-next v2 1/2] bpf: Offload kptr destructors that run from NMI Justin Suess
2026-05-05 16:06   ` bot+bpf-ci
2026-05-05 19:48     ` Justin Suess
2026-05-05 19:49   ` sashiko-bot
2026-05-06 16:43   ` Mykyta Yatsenko
2026-05-06 19:52     ` Justin Suess
2026-05-07 14:59       ` Mykyta Yatsenko
2026-05-07 16:41         ` Justin Suess
2026-05-07 17:19           ` Mykyta Yatsenko
2026-05-05 15:08 ` [bpf-next v2 2/2] selftests/bpf: Add kptr destructor NMI exerciser Justin Suess
2026-05-05 20:15   ` sashiko-bot [this message]
