From: Justin Suess <utilityemal77@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: sashiko@lists.linux.dev, bpf <bpf@vger.kernel.org>
Subject: Re: [bpf-next v3 1/2] bpf: Offload kptr destructors that run from NMI
Date: Mon, 11 May 2026 15:22:36 -0400
Message-ID: <agInlPe_e8Mpp3Nq@zenbox>
In-Reply-To: <DIFYUK5DKGAT.189D9YYSQFCUZ@gmail.com>
On Mon, May 11, 2026 at 08:51:53AM -0700, Alexei Starovoitov wrote:
> On Sun May 10, 2026 at 6:49 PM PDT, Justin Suess wrote:
> > On Sun, May 10, 2026 at 03:38:08PM -0700, Alexei Starovoitov wrote:
> >> On Sun, May 10, 2026 at 8:14 AM Justin Suess <utilityemal77@gmail.com> wrote:
Here's a reproducer for the cgroup case:
https://gist.githubusercontent.com/RazeLighter777/5f77cdfe035a4e22ee2642ae7db6387d/raw/10898d27040a07098cccc5d0785d9ad6620344e7/cgroup_kptr_nmi_deadlock_repro
It was hacked together with an AI prompt, but it works.
It exercises a different path, but splats more consistently, even without
CONFIG_RCU_NOCB_CPU / CONFIG_RCU_EXPERT, since this dtor queues work on a
workqueue.
I had to use an fexit hook to get the timing right for releasing the last
cgroup reference.
Either way, it shows that the deadlock is indeed in the dtor running in NMI
context.
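For anyone skimming the thread, the failure mode can be modelled in userspace. This is only a sketch under stated assumptions: `pool_lock`, `pool_trylock`, `pool_unlock` and `nmi_can_take_lock` are made-up names standing in for the workqueue pool lock, not kernel APIs. If the code an NMI interrupted already holds a non-reentrant lock, that lock cannot be released while the handler runs on the same CPU, so a blocking acquire from the handler would spin forever. The trylock below makes that visible without hanging.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a plain, non-reentrant spinlock. */
struct pool_lock {
	atomic_flag held;
};

/* Returns true if the lock was free and is now held by the caller. */
static bool pool_trylock(struct pool_lock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->held,
						  memory_order_acquire);
}

static void pool_unlock(struct pool_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Models a handler that preempts the current lock holder on the same CPU.
 * Returns false when a blocking lock() here could never succeed, because
 * the only code able to unlock is the code this handler interrupted.
 */
static bool nmi_can_take_lock(struct pool_lock *l)
{
	if (!pool_trylock(l))
		return false;
	pool_unlock(l);
	return true;
}
```

Calling `nmi_can_take_lock()` while the "task" side holds the lock returns false, which is the scenario a real spinning acquire would turn into a hang.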
The splat below is from the same bpf-next/master commit
7e033543a2ab4c72319201298ed458e3bbddd82f:
[ 15.160694] ================================
[ 15.160695] WARNING: inconsistent lock state
[ 15.160695] 7.1.0-rc2-g7e033543a2ab-dirty #130 Not tainted
[ 15.160697] --------------------------------
[ 15.160697] inconsistent {INITIAL USE} -> {IN-NMI} usage.
[ 15.160698] test_progs/434 [HC1[1]:SC0[0]:HE0:SE1] takes:
[ 15.160700] ffff9096fd66ced8 (&pool->lock){-.-.}-{2:2}, at: __queue_work+0xde/0x720
[ 15.160707] {INITIAL USE} state was registered at:
[ 15.160708] lock_acquire+0xbf/0x2e0
[ 15.160711] _raw_spin_lock+0x30/0x40
[ 15.160715] __queue_work+0xde/0x720
[ 15.160716] queue_work_on+0x54/0xa0
[ 15.160716] start_poll_synchronize_rcu_expedited+0xaf/0x110
[ 15.160719] rcu_init+0x958/0x990
[ 15.160722] start_kernel+0x746/0x980
[ 15.160725] x86_64_start_reservations+0x24/0x30
[ 15.160727] __pfx_reserve_bios_regions+0x0/0x10
[ 15.160729] common_startup_64+0x12c/0x138
[ 15.160731] irq event stamp: 18704
[ 15.160732] hardirqs last enabled at (18703): [<ffffffffa200148a>] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 15.160734] hardirqs last disabled at (18704): [<ffffffffa30e3f8f>] exc_nmi+0x7f/0x110
[ 15.160737] softirqs last enabled at (18698): [<ffffffffa22e5800>] __irq_exit_rcu+0xc0/0x100
[ 15.160739] softirqs last disabled at (18687): [<ffffffffa22e5800>] __irq_exit_rcu+0xc0/0x100
[ 15.160741]
[ 15.160741] other info that might help us debug this:
[ 15.160741] Possible unsafe locking scenario:
[ 15.160741]
[ 15.160742] CPU0
[ 15.160742] ----
[ 15.160742] lock(&pool->lock);
[ 15.160743] <Interrupt>
[ 15.160743] lock(&pool->lock);
[ 15.160744]
[ 15.160744] *** DEADLOCK ***
[ 15.160744]
[ 15.160744] no locks held by test_progs/434.
[ 15.160745]
[ 15.160745] stack backtrace:
[ 15.160747] CPU: 1 UID: 0 PID: 434 Comm: test_progs Not tainted 7.1.0-rc2-g7e033543a2ab-dirty #130 PREEMPT(full)
[ 15.160749] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[ 15.160750] Call Trace:
[ 15.160751] <TASK>
[ 15.160753] dump_stack_lvl+0x5d/0x80
[ 15.160757] print_usage_bug.part.0+0x22b/0x2c0
[ 15.160760] lock_acquire+0x295/0x2e0
[ 15.160762] ? srso_alias_return_thunk+0x5/0xfbef5
[ 15.160763] ? __queue_work+0xde/0x720
[ 15.160767] _raw_spin_lock+0x30/0x40
[ 15.160768] ? __queue_work+0xde/0x720
[ 15.160769] __queue_work+0xde/0x720
[ 15.160772] queue_work_on+0x54/0xa0
[ 15.160774] bpf_cgroup_release_dtor+0x12e/0x140
[ 15.160778] bpf_obj_free_fields+0x118/0x250
[ 15.160782] free_htab_elem+0x85/0xd0
[ 15.160785] htab_map_delete_elem+0x168/0x230
[ 15.160790] bpf_prog_23fcbbeb395ac6b4_clear_cgroup_kptrs_from_nmi+0x54/0x74
[ 15.160792] bpf_trace_run3+0x126/0x430
[ 15.160795] ? __pfx_perf_event_nmi_handler+0x10/0x10
[ 15.160799] nmi_handle.part.0+0x15b/0x250
[ 15.160802] ? __pfx_perf_event_nmi_handler+0x10/0x10
[ 15.160804] default_do_nmi+0x120/0x180
[ 15.160807] exc_nmi+0xe3/0x110
[ 15.160809] asm_exc_nmi+0xb7/0x100
[ 15.160810] RIP: 0033:0x5607a669541b
[ 15.160813] Code: c7 45 f0 00 00 00 00 eb 1a 8b 55 f0 8b 45 f4 01 d0 48 63 d0 48 8b 45 a8 48 01 d0 48 89 45 a8 83 45 f0 01 81 7d f0 3f 42 0f 00 <7e> dd e8 7e f5 ff ff 48 89 45 f8 48 8b 45 f8 48 3b 45 e8 73 16 83
[ 15.160814] RSP: 002b:00007ffdb09c1dc0 EFLAGS: 00000293
[ 15.160816] RAX: 0000003aced4e2f4 RBX: 00007f1d8d574000 RCX: 000000000000000f
[ 15.160816] RDX: 00000000000ad857 RSI: 00007f1d8d577000 RDI: 0000000000000001
[ 15.160817] RBP: 00007ffdb09c1e30 R08: 00007ffdb09c1da0 R09: 00007f1d8d577010
[ 15.160818] R10: 0000000000001614 R11: 0009718b9187183f R12: 0000000000000003
[ 15.160818] R13: 00007f1d8d5b6000 R14: 00007ffdb09c3358 R15: 00005607a9daf890
[ 15.160824] </TASK>
[ 15.214040] perf: interrupt took too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[ 15.246002] perf: interrupt took too long (3135 > 3126), lowering kernel.perf_event_max_sample_rate to 63000
[ 15.308032] perf: interrupt took too long (3928 > 3918), lowering kernel.perf_event_max_sample_rate to 50000
[ 15.500072] perf: interrupt took too long (4912 > 4910), lowering kernel.perf_event_max_sample_rate to 40000
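The trace above shows the dtor reaching __queue_work() and its pool->lock from NMI. One generic way to avoid that is to not run the destructor in NMI at all: push the object onto a lock-free list and run the destructor later from a context where locking is fine. The sketch below is a userspace analogy of that idea, not the code from the patch. All names (`deferred_obj`, `defer_push`, `defer_drain`, `release_obj`, `count_dtor`) are hypothetical, and the push is only similar in spirit to the kernel's llist.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object whose destructor may need locks. */
struct deferred_obj {
	struct deferred_obj *next;
	void (*dtor)(struct deferred_obj *obj);
};

/* Lock-free LIFO of objects waiting for their destructor. */
struct deferred_list {
	_Atomic(struct deferred_obj *) first;
};

/* Safe from the "NMI" side: one compare-and-swap loop, no locks. */
static void defer_push(struct deferred_list *dl, struct deferred_obj *obj)
{
	struct deferred_obj *old;

	assert(obj && obj->dtor);
	old = atomic_load_explicit(&dl->first, memory_order_relaxed);
	do {
		obj->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&dl->first, &old, obj,
							memory_order_release,
							memory_order_relaxed));
}

/* Runs later in a lock-friendly context; returns how many dtors ran. */
static size_t defer_drain(struct deferred_list *dl)
{
	struct deferred_obj *obj;
	size_t n = 0;

	/* Detach the whole list at once, then walk it privately. */
	obj = atomic_exchange_explicit(&dl->first, NULL, memory_order_acquire);
	while (obj) {
		struct deferred_obj *next = obj->next;

		obj->dtor(obj);
		obj = next;
		n++;
	}
	return n;
}

/* Returns true if the dtor ran inline, false if it was deferred. */
static bool release_obj(struct deferred_list *dl, struct deferred_obj *obj,
			bool in_nmi)
{
	if (in_nmi) {
		defer_push(dl, obj);
		return false;
	}
	obj->dtor(obj);
	return true;
}

/* Hypothetical dtor used only to observe when destructors run. */
static int dtor_calls;

static void count_dtor(struct deferred_obj *obj)
{
	(void)obj;
	dtor_calls++;
}
```

With `in_nmi` set, `release_obj()` only touches the atomic list head, so nothing lock-taking runs until `defer_drain()` is called from the safe context.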
Justin