BPF List
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Justin Suess" <utilityemal77@gmail.com>
Cc: bpf@vger.kernel.org
Subject: Re: [bpf-next v3 2/2] selftests/bpf: Add kptr destructor NMI exerciser
Date: Fri, 08 May 2026 00:03:33 +0000	[thread overview]
Message-ID: <20260508000334.A1BE8C2BCB2@smtp.kernel.org> (raw)
In-Reply-To: <20260507175453.1140400-3-utilityemal77@gmail.com>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Medium] Failure to acquire the PMU file descriptor (e.g., due to EACCES permissions) causes the test to silently pass rather than fail or skip.
- [Medium] The test executes the `cleanup_kptrs` BPF program but fails to verify if it encountered any internal errors.
- [Medium] Concurrent NMI handlers can cause double-increments of `kptr_dtor_nmi_deleted` and loop aborts due to a TOCTOU race on `slot_is_live`.
--

commit f53640a66945f8f1050d74f78ad482b51aeaf343
Author: Justin Suess <utilityemal77@gmail.com>

selftests/bpf: Add kptr destructor NMI exerciser

This patch adds a BPF selftest program that populates hash and array maps
with cpumask kptrs and clears them from the NMI handler. The test verifies
that the destructor is called on all objects queued for destruction in
hardirq from NMI.

> diff --git a/tools/testing/selftests/bpf/prog_tests/kptr_dtor_nmi.c b/tools/testing/selftests/bpf/prog_tests/kptr_dtor_nmi.c
> new file mode 100644
> index 0000000000000..21452b3cf9eb8
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/kptr_dtor_nmi.c
> @@ -0,0 +1,258 @@
[ ... ]
> +__maybe_unused
> +static int open_nmi_pmu_event_on_cpu(int cpu)
> +{
> +	struct perf_event_attr attr = {
> +		.size = sizeof(attr),
> +		.type = PERF_TYPE_HARDWARE,
> +		.config = PERF_COUNT_HW_CPU_CYCLES,
> +		.freq = 1,
> +		.sample_freq = 1000,
> +	};
> +	int pmu_fd;
> +
> +	pmu_fd = syscall(__NR_perf_event_open, &attr, -1, cpu, -1,
> +			 PERF_FLAG_FD_CLOEXEC);
> +	if (pmu_fd == -1) {
> +		if (errno == ENOENT || errno == EOPNOTSUPP) {
> +			printf("SKIP:no PERF_COUNT_HW_CPU_CYCLES\n");
> +			test__skip();
> +		}
> +		return -1;
> +	}

If the syscall fails with an error other than ENOENT or EOPNOTSUPP (like
EACCES), this returns -1 without calling test__skip().

In run_kptr_dtor_nmi_case():

	pmu_fd = open_nmi_pmu_event_on_cpu(cpu);
	if (pmu_fd < 0)
		goto cleanup;

Since no assertions are triggered and the test isn't explicitly skipped,
will the selftest framework register this as a PASS, masking the resource
acquisition failure?

[ ... ]
> +static void run_kptr_dtor_nmi_case(const struct kptr_dtor_nmi_case *test)
> +{
[ ... ]
> +	if (!run_syscall_prog(skel->progs.cleanup_kptrs, "cleanup_kptrs"))
> +		goto cleanup;
> +	/*
> +	 * The grace period for rcu cannot complete until the CPU that ran the
> +	 * hard irq_work has passed through a quiescent state after running
> +	 * our dtor work. This effectively flushes our pending work and allows
> +	 * the test to verify the dtor was called the expected number of times.
> +	 */
> +	kern_sync_rcu();
> +	ASSERT_EQ(skel->bss->kptr_dtor_nmi_cleanup_deleted, 0,
> +		  "kptr_dtor_nmi_cleanup_deleted");

Does this test need to assert that skel->bss->kptr_dtor_nmi_cleanup_err
is 0?

The cleanup_kptrs BPF program sets kptr_dtor_nmi_cleanup_err if an error
occurs, but it looks like only kptr_dtor_nmi_cleanup_deleted is verified
here. If an internal error happens but no masks were exchanged out,
cleanup_deleted will be 0 and the test might silently pass.

> +cleanup:
> +	if (pmu_fd >= 0)
> +		close(pmu_fd);
[ ... ]
> diff --git a/tools/testing/selftests/bpf/progs/kptr_dtor_nmi.c b/tools/testing/selftests/bpf/progs/kptr_dtor_nmi.c
> new file mode 100644
> index 0000000000000..693b07215006b
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/kptr_dtor_nmi.c
[ ... ]
> +static bool clear_array_slot_from_nmi(__u32 slot)
> +{
> +	struct kptr_dtor_nmi_value init = {};
> +	int err;
> +
> +	if (!slot_is_live(slot))
> +		return true;
> +
> +	err = bpf_map_update_elem(&kptr_dtor_nmi_array, &slot, &init, BPF_EXIST);
> +	if (err) {
> +		set_err(&kptr_dtor_nmi_nmi_err,
> +				&kptr_dtor_nmi_nmi_errno,
> +				KPTR_DTOR_NMI_DELETE_ERR, err);
> +		return false;
> +	}
> +
> +	clear_slot_live(slot);
> +	__sync_fetch_and_add(&kptr_dtor_nmi_deleted, 1);

Could concurrent NMI handlers cause a double-increment of kptr_dtor_nmi_deleted
due to a TOCTOU race on slot_is_live()?

Because clear_kptrs_from_nmi is attached to the global tp_btf/nmi_handler
tracepoint, it executes on NMIs across all CPUs. If two NMIs fire concurrently
on different CPUs, both could read slot_is_live(slot) as true and successfully
execute bpf_map_update_elem(..., BPF_EXIST) on the array map. 

Both would then call clear_slot_live() and increment kptr_dtor_nmi_deleted.
This double-increment could cause wait_for_nmi_drain() to return prematurely
before all slots are truly drained. The next round's populate_kptrs syscall
would then encounter uncleared kptrs, causing stash_mask() to fail with -EEXIST.

Would it be better to use an atomic test-and-clear operation here?

> +	return true;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260507175453.1140400-1-utilityemal77@gmail.com?part=2

      reply	other threads:[~2026-05-08  0:03 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-07 17:54 [bpf-next v3 0/2] bpf: Fix deadlock in kptr dtor in nmi Justin Suess
2026-05-07 17:54 ` [bpf-next v3 1/2] bpf: Offload kptr destructors that run from NMI Justin Suess
2026-05-07 18:43   ` bot+bpf-ci
2026-05-07 18:52     ` Justin Suess
2026-05-07 23:45   ` sashiko-bot
2026-05-10 15:13     ` Justin Suess
2026-05-10 22:38       ` Alexei Starovoitov
2026-05-11  1:49         ` Justin Suess
2026-05-11 15:51           ` Alexei Starovoitov
2026-05-11 16:38             ` Justin Suess
2026-05-11 17:18               ` Alexei Starovoitov
2026-05-11 20:10                 ` Kumar Kartikeya Dwivedi
2026-05-12  1:43                   ` Justin Suess
2026-05-12  1:46                     ` Kumar Kartikeya Dwivedi
2026-05-12  1:55                       ` Alexei Starovoitov
2026-05-12  2:03                         ` Kumar Kartikeya Dwivedi
2026-05-12  2:10                           ` Alexei Starovoitov
2026-05-12  2:13                             ` Kumar Kartikeya Dwivedi
2026-05-12  2:07                         ` Justin Suess
2026-05-12  2:08                           ` Kumar Kartikeya Dwivedi
2026-05-11 19:22             ` Justin Suess
2026-05-07 17:54 ` [bpf-next v3 2/2] selftests/bpf: Add kptr destructor NMI exerciser Justin Suess
2026-05-08  0:03   ` sashiko-bot [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260508000334.A1BE8C2BCB2@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=sashiko@lists.linux.dev \
    --cc=utilityemal77@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox