BPF List
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Justin Suess" <utilityemal77@gmail.com>
Cc: bpf@vger.kernel.org
Subject: Re: [PATCH bpf-next 1/1] bpf: fix deadlock in special field destruction in NMI
Date: Tue, 19 May 2026 01:51:22 +0000	[thread overview]
Message-ID: <20260519015123.36096C2BCB7@smtp.kernel.org> (raw)
In-Reply-To: <20260519011450.1144935-2-utilityemal77@gmail.com>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Critical] LRU maps containing NMI-unsafe fields (like BPF_KPTR_REF or BPF_LIST_HEAD) will permanently leak these fields and corrupt map state when updated or deleted in an IRQ-disabled context.
- [High] Modifying the generic `bpf_obj_free_fields()` to silently skip destruction in IRQ-disabled contexts causes permanent memory leaks for callers lacking deferred destructors (like `bpf_obj_drop` and Array maps).
- [Medium] Forcibly mutating the user-requested `map_flags` to include `BPF_F_NO_PREALLOC` silently breaks performance expectations and UAPI transparency.
--

commit 3fbec1aa4107fa1f73249893a4ed21fd7d2213fa
Author: Justin Suess <utilityemal77@gmail.com>

bpf: fix deadlock in special field destruction in NMI

This commit relaxes bpf_obj_free_fields() to skip async work cancellation
and unsafe free operations in IRQ-disabled contexts. It converts hash maps
with NMI-unsafe fields to non-preallocated maps using bpf_mem_alloc to
defer the cleanup. LRU maps and preallocated hash maps are adjusted to
support this mechanism.

Closes: https://lore.kernel.org/bpf/20260421201035.1729473-1-utilityemal77@gmail.com/
Link: https://lore.kernel.org/bpf/DIG0ONMVOP0L.3QFYUPWFSKWI4@gmail.com/

> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 3dd9b4924ae4f..0db1dc8ae0bec 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
[ ... ]
> @@ -522,13 +532,53 @@ static int htab_set_dtor(struct bpf_htab *htab, void (*dtor)(void *, void *))
>  	return 0;
>  }
>  
> +static int htab_convert_to_non_prealloc(struct bpf_htab *htab)
> +{
> +	bool percpu = htab_is_percpu(htab);
> +	int err;
> +
> +	htab_free_prealloced_fields(htab);
> +	free_percpu(htab->extra_elems);
> +	htab->extra_elems = NULL;
> +	prealloc_destroy(htab);
> +	htab->map.map_flags |= BPF_F_NO_PREALLOC;

Does mutating the map flags post-creation to include BPF_F_NO_PREALLOC alter
the memory model and user-visible flags without warning? If userspace queries
the map info via bpf_map_get_info_by_fd(), will it see flags it never
requested?

Would it be better to explicitly reject this configuration and return -EINVAL
rather than secretly altering the underlying memory model?

[ ... ]
>  static int htab_map_check_btf(struct bpf_map *map, const struct btf *btf,
>  			      const struct btf_type *key_type, const struct btf_type *value_type)
>  {
>  	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
> +	int err;
> +
> +	if (!htab_has_nmi_special_fields(htab)) {
> +		if (htab_is_prealloc(htab))
> +			return 0;
> +	} else {
> +		if (htab_is_lru(htab))
> +			return 0;

Since LRU maps are exempted from conversion to non-preallocated mode here,
they do not use bpf_mem_alloc and completely lack the deferred destructor
mechanism relied upon by this patch.

When an LRU element is replaced or deleted, htab_lru_push_free() is called
and synchronously invokes check_and_free_fields() which eventually calls
bpf_obj_free_fields().

If this executes in an IRQ-disabled context, will the new if
(irqs_disabled()) checks in bpf_obj_free_fields() silently skip freeing
the special fields?

Does this cause a permanent memory leak since the element is then
immediately pushed back to the LRU free list? When the element is
subsequently popped and reused, could it inherit stale pointers and
corrupt map state?

[ ... ]
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 6600e126fbfb4..1f52453d5a2f1 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -839,6 +839,8 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
>  			break;
>  		case BPF_KPTR_REF:
>  		case BPF_KPTR_PERCPU:
> +			if (irqs_disabled())
> +				break;
>  			xchgd_field = (void *)xchg((unsigned long *)field_ptr, 0);
>  			if (!xchgd_field)
>  				break;

Because bpf_obj_free_fields() is a generic helper used throughout the BPF
subsystem, does silently skipping destruction in IRQ-disabled contexts
cause leaks for callers that lack deferred destructors?

For example, if a BPF program running in an NMI or kprobe drops an object
containing nested kptrs via __bpf_obj_drop_impl(), will the nested fields
be skipped and permanently leaked when the outer object is queued for
freeing via bpf_global_ma?

Does this similarly affect in-place updates to Array maps which rely on
bpf_obj_free_fields() to clear old special fields?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260519011450.1144935-2-utilityemal77@gmail.com?part=1

  reply	other threads:[~2026-05-19  1:51 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-19  1:14 [PATCH bpf-next 0/1] bpf: Fix deadlock in freeing of special fields in NMI Justin Suess
2026-05-19  1:14 ` [PATCH bpf-next 1/1] bpf: fix deadlock in special field destruction " Justin Suess
2026-05-19  1:51   ` sashiko-bot [this message]
2026-05-19 14:42     ` Justin Suess
2026-05-19 15:59       ` Justin Suess
2026-05-19 19:27         ` Kumar Kartikeya Dwivedi
2026-05-19 23:19           ` Justin Suess
2026-05-19 10:25   ` kernel test robot
2026-05-19 12:31   ` Mykyta Yatsenko
2026-05-19 13:22     ` Justin Suess
2026-05-19 13:55       ` Mykyta Yatsenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260519015123.36096C2BCB7@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=utilityemal77@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox