public inbox for bpf@vger.kernel.org
From: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>
To: Puranjay Mohan <puranjay12@gmail.com>
Cc: Aaron Esau <aaron1esau@gmail.com>,
	bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
	andrii@kernel.org
Subject: Re: [BUG] bpf: use-after-free in hashtab BPF_F_LOCK in-place update path
Date: Thu, 26 Mar 2026 15:47:35 +0000	[thread overview]
Message-ID: <87ldfeiodk.fsf@gmail.com> (raw)
In-Reply-To: <87o6kaioju.fsf@gmail.com>

Mykyta Yatsenko <mykyta.yatsenko5@gmail.com> writes:

> Puranjay Mohan <puranjay12@gmail.com> writes:
>
>> On Thu, Mar 26, 2026 at 3:26 PM Mykyta Yatsenko
>> <mykyta.yatsenko5@gmail.com> wrote:
>>>
>>> Puranjay Mohan <puranjay@kernel.org> writes:
>>>
>>> > Aaron Esau <aaron1esau@gmail.com> writes:
>>> >
>>> >> Reported-by: Aaron Esau <aaron1esau@gmail.com>
>>> >>
>>> >> htab_map_update_elem() has a use-after-free when BPF_F_LOCK is used
>>> >> for in-place updates.
>>> >>
>>> >> The BPF_F_LOCK path calls lookup_nulls_elem_raw() without holding the
>>> >> bucket lock, then dereferences the element via copy_map_value_locked().
>>> >> A concurrent htab_map_delete_elem() can delete and free the element
>>> >> between these steps.
>>> >>
>>> >> free_htab_elem() uses bpf_mem_cache_free(), which immediately returns
>>> >> the object to the per-CPU free list (not RCU-deferred). The memory may
>>> >> be reallocated before copy_map_value_locked() executes, leading to
>>> >> writes into a different element.
>>> >>
>>> >> When lookup succeeds (l_old != NULL), the in-place update path returns
>>> >> early, so the “full lookup under lock” path is not taken.
>>> >>
>>> >> Race:
>>> >>
>>> >>   CPU 0: htab_map_update_elem (BPF_F_LOCK)
>>> >>          lookup_nulls_elem_raw() → E (no bucket lock)
>>> >>          ...
>>> >>   CPU 1: htab_map_delete_elem()
>>> >>          htab_lock_bucket → hlist_nulls_del_rcu → htab_unlock_bucket
>>> >>          free_htab_elem → bpf_mem_cache_free (immediate free)
>>> >>   CPU 1: htab_map_update_elem (new key)
>>> >>          alloc_htab_elem → reuses E
>>> >>   CPU 0: copy_map_value_locked(E, ...) → writes into reused object
>>> >>
>>> >> Reproduction:
>>> >>
>>> >>   1. Create BPF_MAP_TYPE_HASH with a value containing bpf_spin_lock
>>> >>      (max_entries=64, 7 u64 fields + lock).
>>> >>   2. Threads A: BPF_MAP_UPDATE_ELEM with BPF_F_LOCK (pattern 0xAAAA...)
>>> >>   3. Threads B: DELETE + UPDATE (pattern 0xBBBB...) on same keys
>>> >>   4. Threads C: same as A (pattern 0xCCCC...)
>>> >>   5. Verifier threads: LOOKUP loop, detect mixed-pattern values
>>> >>   6. Run 60s on >=4 CPUs
>>> >>
>>> >> Attached a POC. On 6.19.9 (4 vCPU QEMU, CONFIG_PREEMPT=y),
>>> >> I observed ~645 torn values in 2.5M checks (~0.026%).
>>> >>
>>> >> Fixes: 96049f3afd50 ("bpf: introduce BPF_F_LOCK flag")
>>> >
>>> > Although this is a real issue, your reproducer is not accurate: it will
>>> > see torn writes even without the UAF, because the verifier thread
>>> > does not take the lock:
>>> >
>>> > So the torn write pattern CCCAAAA can mean:
>>> >   1. Thread A finished writing AAAAAAA (while holding the lock)
>>> >   2. Thread C acquired the lock and started writing: field[0]=C, field[1]=C, field[2]=C...
>>> >   3. The verifier thread reads (no lock): sees field[0]=C, field[1]=C, field[2]=C, field[3]=A, field[4]=A, field[5]=A, field[6]=A
>>> >   4. Thread C finishes: field[3]=C, field[4]=C, field[5]=C, field[6]=C, releases lock
>>> >
>>> > This race happens regardless of whether the element is freed/reused.  It
>>> > would happen even without thread B (the delete+readd thread). The
>>> > corruption source is the non-atomic read, not the UAF.
>>> Have you confirmed torn reads even with the BPF_F_LOCK flag on
>>> BPF_MAP_LOOKUP_ELEM_CMD? My understanding is that there must not be
>>> any torn reads with the spinlock taken on the lookup path.
>>>
>>> The reproducer looks like a good selftest to have, but it needs to be
>>> ported to libbpf; as it stands it is too complex.
>>
>> Yes, I have confirmed torn reads even with BPF_F_LOCK on
>> BPF_MAP_LOOKUP_ELEM_CMD, the results given below are with BPF_F_LOCK.
>> But this is expected behaviour, BPF_F_LOCK performs a lockless lookup
>> and takes only the element's embedded spin_lock for in-place value
>> updates. It does not synchronize against concurrent deletes.
>>
> It does not synchronize against concurrent deletes, yes, but with
> BPF_F_LOCK on BPF_MAP_LOOKUP_ELEM_CMD we still take the spinlock before
> copying the value from the map (syscall.c:354). Deletion does not mutate
> the element, so even when the element is reused, the lookup should still
> take the same lock, shouldn't it?

Ok, I think this is just a wonky reproducer:
fill_value(&val, PATTERN_B, seq++);
map_update(g_map_fd, &g_key, &val, BPF_ANY);

should have been BPF_ANY | BPF_F_LOCK
>>
>>> >
>>> > If you change the reproducer like this:
>>> >
>>> > -- >8 --
>>> >
>>> > --- repro.c     2026-03-26 05:22:49.012503218 -0700
>>> > +++ repro2.c    2026-03-26 06:24:40.951044279 -0700
>>> > @@ -227,6 +227,7 @@
>>> >         attr.map_fd = fd;
>>> >         attr.key = (uint64_t)(unsigned long)key;
>>> >         attr.value = (uint64_t)(unsigned long)val;
>>> > +       attr.flags = BPF_F_LOCK;
>>> >         return bpf_sys(BPF_MAP_LOOKUP_ELEM_CMD, &attr, sizeof(attr));
>>> >  }
>>> >
>>> > -- 8< --
>>> >
>>> > Now it will detect the correct UAF problem.
>>> >
>>> > I verified that this updated reproducer shows the problem, the following
>>> > kernel diff fixes it:
>>> >
>>> > -- >8 --
>>> >
>>> > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>>> > index bc6bc8bb871d..af33f62069f0 100644
>>> > --- a/kernel/bpf/hashtab.c
>>> > +++ b/kernel/bpf/hashtab.c
>>> > @@ -953,7 +953,7 @@ static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
>>> >
>>> >         if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
>>> >                 bpf_mem_cache_free(&htab->pcpu_ma, l->ptr_to_pptr);
>>> > -       bpf_mem_cache_free(&htab->ma, l);
>>> > +       bpf_mem_cache_free_rcu(&htab->ma, l);
>>> >  }
>>> >
>>> >  static void htab_put_fd_value(struct bpf_htab *htab, struct htab_elem *l)
>>> >
>>> > -- 8< --
>>> >
>>> > Before:
>>> >
>>> > [root@alarm host0]# ./repro2
>>> > Running 10 threads for 60 seconds...
>>> >
>>> > Total checks:    49228421
>>> > Torn writes:     5470
>>> > Max torn fields: 3 / 7
>>> > Corruption rate: 0.011111%
>>> >
>>> > Cross-pattern breakdown:
>>> >   A in B: 8595
>>> >   C in B: 7826
>>> >   Unknown: 1
>>> >
>>> > First 20 events:
>>> >   [0] check #42061 seq=39070 CCCBBBB
>>> >   [1] check #65714 seq=60575 CCCBBBB
>>> >   [2] check #65287 seq=60575 CCCBBBB
>>> >   [3] check #70474 seq=65793 AAABBBB
>>> >   [4] check #70907 seq=65793 AAABBBB
>>> >   [5] check #103389 seq=95745 AAABBBB
>>> >   [6] check #107208 seq=98672 CCCBBBB
>>> >   [7] check #108218 seq=100387 CCCBBBB
>>> >   [8] check #111490 seq=103388 CCCBBBB
>>> >   [9] check #140942 seq=128894 CCCBBBB
>>> >   [10] check #164845 seq=151828 CCCBBBB
>>> >   [11] check #163993 seq=151828 CCCBBBB
>>> >   [12] check #169184 seq=155453 CCCBBBB
>>> >   [13] check #171383 seq=158572 AAABBBB
>>> >   [14] check #179943 seq=165425 CCCBBBB
>>> >   [15] check #189218 seq=173926 CCCBBBB
>>> >   [16] check #192119 seq=177892 CCCBBBB
>>> >   [17] check #194253 seq=180562 AAABBBB
>>> >   [18] check #202169 seq=187253 CCCBBBB
>>> >   [19] check #205452 seq=189021 CCCBBBB
>>> >
>>> > CORRUPTION DETECTED
>>> >
>>> > After:
>>> >
>>> > [root@alarm host0]# ./repro2
>>> > Running 10 threads for 60 seconds...
>>> >
>>> > Total checks:    108666576
>>> > Torn writes:     0
>>> > Max torn fields: 0 / 7
>>> >
>>> > No corruption detected (try more CPUs or longer run)
>>> > [root@alarm host0]# nproc
>>> > 16
>>> >
>>> > I will send a patch to fix this soon after validating the above kernel
>>> > diff and figuring out how we got to this state in htab_elem_free() by
>>> > analyzing the git history.
>>> >
>>> > Thanks for the report.
>>> > Puranjay


Thread overview: 12+ messages
2026-03-26  8:49 [BUG] bpf: use-after-free in hashtab BPF_F_LOCK in-place update path Aaron Esau
2026-03-26 13:39 ` Puranjay Mohan
2026-03-26 14:58   ` Kumar Kartikeya Dwivedi
2026-03-26 15:02   ` Puranjay Mohan
2026-03-26 15:26   ` Mykyta Yatsenko
2026-03-26 15:33     ` Puranjay Mohan
2026-03-26 15:43       ` Mykyta Yatsenko
2026-03-26 15:47         ` Mykyta Yatsenko [this message]
2026-03-26 15:57           ` Puranjay Mohan
2026-03-27  2:44             ` Aaron Esau
2026-03-27  3:21               ` Kumar Kartikeya Dwivedi
2026-03-27 16:09                 ` Mykyta Yatsenko
