Date: Sun, 10 May 2026 11:13:36 -0400
From: Justin Suess
To: sashiko@lists.linux.dev
Cc: bpf@vger.kernel.org
Subject: Re: [bpf-next v3 1/2] bpf: Offload kptr destructors that run from NMI
References: <20260507175453.1140400-2-utilityemal77@gmail.com>
 <20260507234520.646C4C2BCB2@smtp.kernel.org>
In-Reply-To: <20260507234520.646C4C2BCB2@smtp.kernel.org>

On Thu, May 07, 2026 at 11:45:19PM +0000, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential
> issue(s) to consider:
> - [High] TOCTOU race in `bpf_dtor_kptr_release_one()` can lead to NMI
>   deadlocks.
> - [Low] Memory leak on initialization failure in `bpf_dtor_kptr_init()`.
> --
>
> commit 488cdc390b90427a591d47211600c09addc95fcd
> Author: Justin Suess
> Date:   Thu May 7 17:54:52 2026 +0000
>
>     bpf: Offload kptr destructors that run from NMI
>
>     This commit queues destructor-backed kptr teardowns to irq_work when
>     running from NMI context. It maintains an idle-slot surplus counter to
>     ensure offload slots are preallocated to prevent deadlocks.
>
>     Closes: https://lore.kernel.org/bpf/20260421201035.1729473-1-utilityemal77@gmail.com/
>
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 3b1f0ba02f613..d34fdb99eb8a9 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> [ ... ]
> > @@ -65,6 +69,111 @@ static DEFINE_SPINLOCK(map_idr_lock);
> > +static void bpf_dtor_kptr_release_one(void)
> > +{
> > +	struct bpf_dtor_kptr_work *job;
> > +	long surplus;
> > +
> > +	for (;;) {
> > +		surplus = atomic_long_read(&bpf_dtor_kptr_surplus);
> > +		if (surplus <= 0)
> > +			return;
> > +
> > +		job = bpf_dtor_kptr_pop_idle();
>
> Could this sequence lead to a race condition that drains the idle pool
> too early?
>
> If multiple threads read a positive surplus value here, they might all
> call bpf_dtor_kptr_pop_idle() before the cmpxchg executes. This could
> temporarily drain the idle pool below the required number of reserved
> slots.
>
> If an NMI fires during this window and attempts to tear down a kptr,
> would bpf_dtor_kptr_offload() find the idle pool empty? If so, it looks
> like it would trigger the fallback warning and run the destructor inline
> from the NMI context, which might reintroduce the deadlocks this patch
> aims to prevent.

Yes, this race is real. But there is a bigger problem: the push side of
pcpu_freelist is not NMI-safe. If an NMI arrives while we are mid-push,
the freelist head we were pushing to is still locked, so the NMI cannot
pop from it and we deadlock again.

I think we technically have exclusivity on our per-CPU head (we cannot be
in NMI and process context simultaneously on the same CPU), so maybe it
is safe to forcibly acquire that specific lock? I'm not 100% sure.

llist won't work, since we can't have multiple llist_del_first consumers,
and llist_del_all would leave the list empty and unusable if an NMI jumps
in while we are borrowing it. Per-CPU llists won't work either, since the
CPU we allocate the job on might not be the one the dtor runs on, so we
need per-CPU stealing, which then runs into the same multiple-consumer
problem when threads steal from the same CPU's head. So this code is not
ready either way.
I think a 128-bit cmpxchg could technically solve this, but I don't think
we can rely on it being available on all architectures BPF supports.

The construct I need is a data structure with atomic push, pop, and move
between sets. No particular ordering is required, but updates from
producers must be immediately visible to all consumers. I've looked into
Treiber stacks. I've been stuck on this for several weeks, so any help or
guidance would be much appreciated!

> > +		if (!job)
> > +			return;
> > +
> > +		if (!atomic_long_try_cmpxchg(&bpf_dtor_kptr_surplus, &surplus,
> > +					     surplus - 1)) {
> > +			pcpu_freelist_push(&bpf_dtor_kptr_idle, &job->fnode);
> > +			continue;
> > +		}
> > +
> > +		bpf_mem_free(&bpf_global_ma, job);
> > +		return;
> > +	}
> > +}
> [ ... ]
> > +static int __init bpf_dtor_kptr_init(void)
> > +{
> > +	int err;
> > +
> > +	err = pcpu_freelist_init(&bpf_dtor_kptr_idle);
> > +	if (err)
> > +		return err;
> > +
> > +	err = pcpu_freelist_init(&bpf_dtor_kptr_jobs);
> > +	if (err)
> > +		return err;
>
> Does this error path leak the per-cpu memory allocated for the idle
> list?
>
> If the second pcpu_freelist_init() fails, the function returns
> immediately. Should there be a cleanup path here to free the already
> initialized bpf_dtor_kptr_idle list?
>
> > +
> > +	return 0;
> > +}
> > +late_initcall(bpf_dtor_kptr_init);
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260507175453.1140400-1-utilityemal77@gmail.com?part=1