From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sun, 10 May 2026 11:13:36 -0400
From: Justin Suess
To: sashiko@lists.linux.dev
Cc: bpf@vger.kernel.org
Subject: Re: [bpf-next v3 1/2] bpf: Offload kptr destructors that run from NMI
Message-ID:
References: <20260507175453.1140400-2-utilityemal77@gmail.com>
 <20260507234520.646C4C2BCB2@smtp.kernel.org>
In-Reply-To: <20260507234520.646C4C2BCB2@smtp.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

On Thu, May 07, 2026 at 11:45:19PM +0000, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> - [High] TOCTOU race in `bpf_dtor_kptr_release_one()` can lead to NMI deadlocks.
> - [Low] Memory leak on initialization failure in `bpf_dtor_kptr_init()`.
> --
>
> commit 488cdc390b90427a591d47211600c09addc95fcd
> Author: Justin Suess
> Date:   Thu May 7 17:54:52 2026 +0000
>
>     bpf: Offload kptr destructors that run from NMI
>
>     This commit queues destructor-backed kptr teardowns to irq_work when
>     running from NMI context. It maintains an idle-slot surplus counter to
>     ensure offload slots are preallocated to prevent deadlocks.
>
>     Closes: https://lore.kernel.org/bpf/20260421201035.1729473-1-utilityemal77@gmail.com/
>
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 3b1f0ba02f613..d34fdb99eb8a9 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> [ ...
> ]
>
> > @@ -65,6 +69,111 @@ static DEFINE_SPINLOCK(map_idr_lock);
> > +static void bpf_dtor_kptr_release_one(void)
> > +{
> > +	struct bpf_dtor_kptr_work *job;
> > +	long surplus;
> > +
> > +	for (;;) {
> > +		surplus = atomic_long_read(&bpf_dtor_kptr_surplus);
> > +		if (surplus <= 0)
> > +			return;
> > +
> > +		job = bpf_dtor_kptr_pop_idle();
>
> Could this sequence lead to a race condition that drains the idle pool
> too early?
>
> If multiple threads read a positive surplus value here, they might all call
> bpf_dtor_kptr_pop_idle() before the cmpxchg executes. This could temporarily
> drain the idle pool below the required number of reserved slots.
>
> If an NMI fires during this window and attempts to tear down a kptr, would
> bpf_dtor_kptr_offload() find the idle pool empty? If so, it looks like it
> would trigger the fallback warning and run the destructor inline from the
> NMI context, which might reintroduce the deadlocks this patch aims to
> prevent.

Yes, this race is real. But there's a bigger problem: the push side of
pcpu_freelist is not interruptible. If an NMI arrives while we're pushing,
the pcpu_freelist is still locked by the interrupted push, which prevents
the NMI handler from popping. So we just deadlock again.

I think we technically have exclusivity on our percpu head (we can't be in
NMI and process context simultaneously on the same cpu), so maybe it is
safe to forcibly acquire that specific lock? I'm not 100% sure.

llist won't work since we can't have multiple llist_del_first consumers,
and llist_del_all would render the list null and unusable if an NMI hits
while we're borrowing it. Making per-cpu llists won't work either: the cpu
we allocate the job on might not be the one the dtor runs on, so we'd need
per-cpu stealing, which runs into the same multiple-consumer problem when
threads steal from the same cpu's head. So this code is not ready either way.
I think a 128-bit cmpxchg could technically solve this problem, but I don't
think we can rely on it across all the architectures BPF supports. If anyone
has any guidance on this it would be much appreciated; I've been stuck on it
for several weeks.

The construct I need is a data structure with atomic push, pop, and move
between sets. No specific ordering is needed, but updates from producers must
be immediately visible to all consumers. I've looked into Treiber stacks.
Any help or guidance would be appreciated!

> > +		if (!job)
> > +			return;
> > +
> > +		if (!atomic_long_try_cmpxchg(&bpf_dtor_kptr_surplus, &surplus,
> > +					     surplus - 1)) {
> > +			pcpu_freelist_push(&bpf_dtor_kptr_idle, &job->fnode);
> > +			continue;
> > +		}
> > +
> > +		bpf_mem_free(&bpf_global_ma, job);
> > +		return;
> > +	}
> > +}
> [ ... ]
> > +static int __init bpf_dtor_kptr_init(void)
> > +{
> > +	int err;
> > +
> > +	err = pcpu_freelist_init(&bpf_dtor_kptr_idle);
> > +	if (err)
> > +		return err;
> > +
> > +	err = pcpu_freelist_init(&bpf_dtor_kptr_jobs);
> > +	if (err)
> > +		return err;
>
> Does this error path leak the per-cpu memory allocated for the idle list?
>
> If the second pcpu_freelist_init() fails, the function returns immediately.
> Should there be a cleanup path here to free the already initialized
> bpf_dtor_kptr_idle list?
>
> > +
> > +	return 0;
> > +}
> > +late_initcall(bpf_dtor_kptr_init);
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260507175453.1140400-1-utilityemal77@gmail.com?part=1
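On the init-path leak flagged above: agreed, the second failure should unwind
the first init. A sketch of what I have in mind for v4 (assuming
pcpu_freelist_destroy() is the right teardown for a list that was only
initialized and never populated):

```
	err = pcpu_freelist_init(&bpf_dtor_kptr_jobs);
	if (err) {
		pcpu_freelist_destroy(&bpf_dtor_kptr_idle);
		return err;
	}
```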