From: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>
To: Puranjay Mohan <puranjay@kernel.org>,
Aaron Esau <aaron1esau@gmail.com>,
bpf@vger.kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org
Subject: Re: [BUG] bpf: use-after-free in hashtab BPF_F_LOCK in-place update path
Date: Thu, 26 Mar 2026 15:26:48 +0000
Message-ID: <87qzp6ipc7.fsf@gmail.com>
In-Reply-To: <m2se9mg16x.fsf@kernel.org>
Puranjay Mohan <puranjay@kernel.org> writes:
> Aaron Esau <aaron1esau@gmail.com> writes:
>
>> Reported-by: Aaron Esau <aaron1esau@gmail.com>
>>
>> htab_map_update_elem() has a use-after-free when BPF_F_LOCK is used
>> for in-place updates.
>>
>> The BPF_F_LOCK path calls lookup_nulls_elem_raw() without holding the
>> bucket lock, then dereferences the element via copy_map_value_locked().
>> A concurrent htab_map_delete_elem() can delete and free the element
>> between these steps.
>>
>> free_htab_elem() uses bpf_mem_cache_free(), which immediately returns
>> the object to the per-CPU free list (not RCU-deferred). The memory may
>> be reallocated before copy_map_value_locked() executes, leading to
>> writes into a different element.
>>
>> When lookup succeeds (l_old != NULL), the in-place update path returns
>> early, so the “full lookup under lock” path is not taken.
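>>
>> For reference, the affected path looks roughly like this (simplified
>> from htab_map_update_elem(); flag checks and error handling elided):
>>
>>     if (unlikely(map_flags & BPF_F_LOCK)) {
>>         /* find the element WITHOUT taking the bucket lock */
>>         l_old = lookup_nulls_elem_raw(head, hash, key, key_size,
>>                                       htab->n_buckets);
>>         if (l_old) {
>>             /* window: l_old may be freed and reused before this runs */
>>             copy_map_value_locked(map,
>>                                   l_old->key + round_up(key_size, 8),
>>                                   value, false);
>>             return 0;
>>         }
>>         /* fall through to the locked path only when lookup fails */
>>     }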
>>
>> Race:
>>
>> CPU 0: htab_map_update_elem (BPF_F_LOCK)
>> lookup_nulls_elem_raw() → E (no bucket lock)
>> ...
>> CPU 1: htab_map_delete_elem()
>> htab_lock_bucket → hlist_nulls_del_rcu → htab_unlock_bucket
>> free_htab_elem → bpf_mem_cache_free (immediate free)
>> CPU 1: htab_map_update_elem (new key)
>> alloc_htab_elem → reuses E
>> CPU 0: copy_map_value_locked(E, ...) → writes into reused object
>>
>> Reproduction:
>>
>> 1. Create BPF_MAP_TYPE_HASH with a value containing bpf_spin_lock
>>    (max_entries=64, 7 u64 fields + lock; layout sketched after this
>>    list).
>> 2. Threads A: BPF_MAP_UPDATE_ELEM with BPF_F_LOCK (pattern 0xAAAA...)
>> 3. Threads B: DELETE + UPDATE (pattern 0xBBBB...) on same keys
>> 4. Threads C: same as A (pattern 0xCCCC...)
>> 5. Verifier threads: LOOKUP loop, detect mixed-pattern values
>> 6. Run 60s on >=4 CPUs
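>>
>> A minimal sketch of the value layout the steps above assume (field
>> names are illustrative, not taken from the PoC):
>>
>>     struct val {
>>         struct bpf_spin_lock lock;
>>         __u64 f[7];    /* filled with 0xAAAA.../0xBBBB.../0xCCCC... */
>>     };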
>>
>> A PoC is attached. On 6.19.9 (4 vCPU QEMU, CONFIG_PREEMPT=y),
>> I observed ~645 torn values in 2.5M checks (~0.026%).
>>
>> Fixes: 96049f3afd50 ("bpf: introduce BPF_F_LOCK flag")
>
> Although this is a real issue, your reproducer is not accurate: it will
> see torn writes even without the UAF, because the verifier thread does
> not take the lock.
>
> So the torn write pattern CCCAAAA can arise as follows:
> 1. Thread A finished writing AAAAAAA (while holding the lock)
> 2. Thread C acquired the lock and started writing: field[0]=C, field[1]=C, field[2]=C...
> 3. The verifier thread reads (no lock): sees field[0]=C, field[1]=C, field[2]=C, field[3]=A, field[4]=A, field[5]=A, field[6]=A
> 4. Thread C finishes: field[3]=C, field[4]=C, field[5]=C, field[6]=C, releases lock
>
> This race happens regardless of whether the element is freed/reused. It
> would happen even without thread B (the delete+readd thread). The
> corruption source is the non-atomic read, not the UAF.
Have you confirmed torn reads even with the BPF_F_LOCK flag set on
BPF_MAP_LOOKUP_ELEM_CMD? My understanding is that there must not be any
torn reads when the spinlock is taken on the lookup path.

The reproducer looks like a good selftest to have, but it needs to be
ported to libbpf; currently it looks too complex.
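
Something like this is all the locked access the selftest needs via
libbpf (a sketch; map_fd, key and val are placeholders):

    #include <bpf/bpf.h>

    /* update the value in place under the element's bpf_spin_lock */
    static int locked_update(int map_fd, const void *key, const void *val)
    {
            return bpf_map_update_elem(map_fd, key, val, BPF_F_LOCK);
    }

    /* copy out a consistent snapshot under the same lock */
    static int locked_lookup(int map_fd, const void *key, void *val)
    {
            return bpf_map_lookup_elem_flags(map_fd, key, val, BPF_F_LOCK);
    }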
>
> If you change the reproducer like this:
>
> -- >8 --
>
> --- repro.c 2026-03-26 05:22:49.012503218 -0700
> +++ repro2.c 2026-03-26 06:24:40.951044279 -0700
> @@ -227,6 +227,7 @@
> attr.map_fd = fd;
> attr.key = (uint64_t)(unsigned long)key;
> attr.value = (uint64_t)(unsigned long)val;
> + attr.flags = BPF_F_LOCK;
> return bpf_sys(BPF_MAP_LOOKUP_ELEM_CMD, &attr, sizeof(attr));
> }
>
> -- 8< --
>
> Now it will detect the actual UAF problem.
>
> I verified that the updated reproducer shows the problem, and the
> following kernel diff fixes it:
>
> -- >8 --
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index bc6bc8bb871d..af33f62069f0 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -953,7 +953,7 @@ static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
>
> if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
> bpf_mem_cache_free(&htab->pcpu_ma, l->ptr_to_pptr);
> - bpf_mem_cache_free(&htab->ma, l);
> + bpf_mem_cache_free_rcu(&htab->ma, l);
> }
>
> static void htab_put_fd_value(struct bpf_htab *htab, struct htab_elem *l)
>
> -- 8< --
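>
> This works because the lockless lookup in the update path already runs
> under RCU read protection; with bpf_mem_cache_free_rcu() the element's
> memory cannot be reused until a grace period has elapsed, so the racy
> sequence becomes safe (sketch, not the exact kernel code):
>
>     rcu_read_lock();
>     l_old = lookup_nulls_elem_raw(head, hash, key, key_size, n_buckets);
>     if (l_old)
>             /* memory now guaranteed valid until rcu_read_unlock() */
>             copy_map_value_locked(map,
>                                   l_old->key + round_up(key_size, 8),
>                                   value, false);
>     rcu_read_unlock();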
>
> Before:
>
> [root@alarm host0]# ./repro2
> Running 10 threads for 60 seconds...
>
> Total checks: 49228421
> Torn writes: 5470
> Max torn fields: 3 / 7
> Corruption rate: 0.011111%
>
> Cross-pattern breakdown:
> A in B: 8595
> C in B: 7826
> Unknown: 1
>
> First 20 events:
> [0] check #42061 seq=39070 CCCBBBB
> [1] check #65714 seq=60575 CCCBBBB
> [2] check #65287 seq=60575 CCCBBBB
> [3] check #70474 seq=65793 AAABBBB
> [4] check #70907 seq=65793 AAABBBB
> [5] check #103389 seq=95745 AAABBBB
> [6] check #107208 seq=98672 CCCBBBB
> [7] check #108218 seq=100387 CCCBBBB
> [8] check #111490 seq=103388 CCCBBBB
> [9] check #140942 seq=128894 CCCBBBB
> [10] check #164845 seq=151828 CCCBBBB
> [11] check #163993 seq=151828 CCCBBBB
> [12] check #169184 seq=155453 CCCBBBB
> [13] check #171383 seq=158572 AAABBBB
> [14] check #179943 seq=165425 CCCBBBB
> [15] check #189218 seq=173926 CCCBBBB
> [16] check #192119 seq=177892 CCCBBBB
> [17] check #194253 seq=180562 AAABBBB
> [18] check #202169 seq=187253 CCCBBBB
> [19] check #205452 seq=189021 CCCBBBB
>
> CORRUPTION DETECTED
>
> After:
>
> [root@alarm host0]# ./repro2
> Running 10 threads for 60 seconds...
>
> Total checks: 108666576
> Torn writes: 0
> Max torn fields: 0 / 7
>
> No corruption detected (try more CPUs or longer run)
> [root@alarm host0]# nproc
> 16
>
> I will send a patch to fix this soon after validating the above kernel
> diff and figuring out how we got to this state in htab_elem_free() by
> analyzing the git history.
>
> Thanks for the report.
> Puranjay