* [PATCH v2] libbpf: fix UAF in strset__add_str()
[not found] <89d34016-cf82-4beb-989c-e4fc2e3cd29e@gmail.com>
@ 2026-05-15 4:47 ` Carlos Llamas
2026-05-15 22:08 ` Andrii Nakryiko
0 siblings, 1 reply; 10+ messages in thread
From: Carlos Llamas @ 2026-05-15 4:47 UTC (permalink / raw)
To: mykyta.yatsenko5, Andrii Nakryiko, Eduard Zingerman,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa
Cc: kernel-team, linux-kernel, Carlos Llamas, Mykyta Yatsenko,
open list:BPF [GENERAL] (Safe Dynamic Programs and Tools)
strset_add_str_mem() might reallocate the strset data buffer in order to
accommodate the provided string 's'. However, if 's' points to a string
already present in the buffer, it becomes dangling after the realloc.
This leads to a use-after-free when attempting to memcpy() the string
into the new buffer.
One scenario that triggers this problematic path is when resolve_btfids
attempts to patch kfunc prototypes using existing BTF parameter names:
| resolve_btfids: function bpf_list_push_back_impl already exists in BTF
| Segmentation fault (core dumped)
Compiling resolve_btfids with fsanitize=address generates a detailed
report of the UAF:
| =================================================================
| ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4
| ==1507892==ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4 at pc 0x55d25155a2a8 bp 0x7ffcef879060 sp 0x7ffcef878818
| READ of size 5 at 0x7f4c4a500bd4 thread T0
| #0 0x55d25155a2a7 in memcpy (tools/bpf/resolve_btfids/resolve_btfids+0xcf2a7)
| #1 0x55d2515d708e in strset__add_str tools/lib/bpf/strset.c:162:2
| #2 0x55d2515c730b in btf__add_str tools/lib/bpf/btf.c:2109:8
| #3 0x55d2515c9020 in btf__add_func_param tools/lib/bpf/btf.c:3108:14
| #4 0x55d25159f0b5 in process_kfunc_with_implicit_args tools/bpf/resolve_btfids/main.c:1196:9
| #5 0x55d25159e004 in btf2btf tools/bpf/resolve_btfids/main.c:1229:9
| #6 0x55d25159cee7 in main tools/bpf/resolve_btfids/main.c:1535:6
| #7 0x7f4c78e29f76 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
| #8 0x7f4c78e2a026 in __libc_start_main csu/../csu/libc-start.c:360:3
| #9 0x55d2514bb860 in _start (tools/bpf/resolve_btfids/resolve_btfids+0x30860)
|
| 0x7f4c4a500bd4 is located 13268 bytes inside of 2829000-byte region [0x7f4c4a4fd800,0x7f4c4a7b02c8)
| freed by thread T0 here:
| #0 0x55d25155b700 in realloc (tools/bpf/resolve_btfids/resolve_btfids+0xd0700)
| #1 0x55d2515c426c in libbpf_reallocarray tools/lib/bpf/./libbpf_internal.h:220:9
| #2 0x55d2515c426c in libbpf_add_mem tools/lib/bpf/btf.c:224:13
|
| previously allocated by thread T0 here:
| #0 0x55d25155b2e3 in malloc (tools/bpf/resolve_btfids/resolve_btfids+0xd02e3)
| #1 0x55d2515d6e7d in strset__new tools/lib/bpf/strset.c:58:20
While resolve_btfids could be refactored to avoid this call path, let's
instead fix this issue at the source in strset__add_str() and avoid
similar scenarios. Let's simply check whether 's' is already within the
strset data buffer boundaries, and skip appending it at the end of the
buffer again.
While already here, also fix strset__find_str() which suffers from the
same problem by factoring out the common operations into a new helper
function strset__offset() as suggested by Mykyta.
Fixes: 90d76d3ececc ("libbpf: Extract internal set-of-strings datastructure APIs")
Suggested-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
v2:
Implemented the fix in strset__offset() helper as suggested by Mykyta.
Added support to handle "substrings" of existing ones.
Used 90d76d3ececc as Fixes tag as suggested by Sashiko.
v1:
https://lore.kernel.org/all/20260513232055.1681859-1-cmllamas@google.com/
tools/lib/bpf/strset.c | 70 ++++++++++++++++++++++++++----------------
1 file changed, 43 insertions(+), 27 deletions(-)
diff --git a/tools/lib/bpf/strset.c b/tools/lib/bpf/strset.c
index 2464bcbd04e0..e50da238914d 100644
--- a/tools/lib/bpf/strset.c
+++ b/tools/lib/bpf/strset.c
@@ -107,25 +107,53 @@ static void *strset_add_str_mem(struct strset *set, size_t add_sz)
set->strs_data_len, set->strs_data_max_len, add_sz);
}
-/* Find string offset that corresponds to a given string *s*.
- * Returns:
- * - >0 offset into string data, if string is found;
- * - -ENOENT, if string is not in the string data;
- * - <0, on any other error.
+/* Returns the offset of the string in set, and populates *grow* with the
+ * length which set needs to grow by to include *s*.
*/
-int strset__find_str(struct strset *set, const char *s)
+static long strset__offset(struct strset *set, const char *s, long *grow)
{
- long old_off, new_off, len;
+ const char *strs = strset__data(set);
+ long len;
void *p;
- /* see strset__add_str() for why we do this */
+ /* Check whether 's' is already within the buffer */
+ if (strs && s >= strs && s < strs + set->strs_data_len) {
+ *grow = 0;
+ return s - strs;
+ }
+
+ /* Hashmap keys are always offsets within set->strs_data, so to even
+ * look up some string from the "outside", we need to first append it
+ * at the end, so that it can be addressed with an offset. Luckily,
+ * until set->strs_data_len is incremented, that string is just a piece
+ * of garbage for the rest of the code, so no harm, no foul. On the
+ * other hand, if the string is unique, it's already appended and
+ * ready to be used, only a simple set->strs_data_len increment away.
+ */
len = strlen(s) + 1;
p = strset_add_str_mem(set, len);
if (!p)
return -ENOMEM;
- new_off = set->strs_data_len;
memcpy(p, s, len);
+ *grow = len;
+
+ return set->strs_data_len;
+}
+
+/* Find string offset that corresponds to a given string *s*.
+ * Returns:
+ * - >0 offset into string data, if string is found;
+ * - -ENOENT, if string is not in the string data;
+ * - <0, on any other error.
+ */
+int strset__find_str(struct strset *set, const char *s)
+{
+ long old_off, new_off, grow;
+
+ new_off = strset__offset(set, s, &grow);
+ if (new_off < 0)
+ return new_off;
if (hashmap__find(set->strs_hash, new_off, &old_off))
return old_off;
@@ -141,25 +169,12 @@ int strset__find_str(struct strset *set, const char *s)
*/
int strset__add_str(struct strset *set, const char *s)
{
- long old_off, new_off, len;
- void *p;
+ long old_off, new_off, grow;
int err;
- /* Hashmap keys are always offsets within set->strs_data, so to even
- * look up some string from the "outside", we need to first append it
- * at the end, so that it can be addressed with an offset. Luckily,
- * until set->strs_data_len is incremented, that string is just a piece
- * of garbage for the rest of the code, so no harm, no foul. On the
- * other hand, if the string is unique, it's already appended and
- * ready to be used, only a simple set->strs_data_len increment away.
- */
- len = strlen(s) + 1;
- p = strset_add_str_mem(set, len);
- if (!p)
- return -ENOMEM;
-
- new_off = set->strs_data_len;
- memcpy(p, s, len);
+ new_off = strset__offset(set, s, &grow);
+ if (new_off < 0)
+ return new_off;
/* Now attempt to add the string, but only if the string with the same
* contents doesn't exist already (HASHMAP_ADD strategy). If such
@@ -172,6 +187,7 @@ int strset__add_str(struct strset *set, const char *s)
if (err)
return err;
- set->strs_data_len += len; /* new unique string, adjust data length */
+ set->strs_data_len += grow; /* new unique string, adjust data length */
+
return new_off;
}
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2] libbpf: fix UAF in strset__add_str()
2026-05-15 4:47 ` [PATCH v2] libbpf: fix UAF in strset__add_str() Carlos Llamas
@ 2026-05-15 22:08 ` Andrii Nakryiko
2026-05-18 4:59 ` Carlos Llamas
2026-05-18 5:05 ` [PATCH v3] " Carlos Llamas
0 siblings, 2 replies; 10+ messages in thread
From: Andrii Nakryiko @ 2026-05-15 22:08 UTC (permalink / raw)
To: Carlos Llamas
Cc: mykyta.yatsenko5, Andrii Nakryiko, Eduard Zingerman,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
kernel-team, linux-kernel, Mykyta Yatsenko,
open list:BPF [GENERAL] (Safe Dynamic Programs and Tools)
On Thu, May 14, 2026 at 9:48 PM Carlos Llamas <cmllamas@google.com> wrote:
>
> strset_add_str_mem() might reallocate the strset data buffer in order to
> accommodate the provided string 's'. However, if 's' points to a string
> already present in the buffer, it becomes dangling after the realloc.
> This leads to a use-after-free when attempting to memcpy() the string
> into the new buffer.
>
> One scenario that triggers this problematic path is when resolve_btfids
> attempts to patch kfunc prototypes using existing BTF parameter names:
>
> | resolve_btfids: function bpf_list_push_back_impl already exists in BTF
> | Segmentation fault (core dumped)
>
> Compiling resolve_btfids with fsanitize=address generates a detailed
> report of the UAF:
>
> | =================================================================
> | ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4
> | ==1507892==ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4 at pc 0x55d25155a2a8 bp 0x7ffcef879060 sp 0x7ffcef878818
> | READ of size 5 at 0x7f4c4a500bd4 thread T0
> | #0 0x55d25155a2a7 in memcpy (tools/bpf/resolve_btfids/resolve_btfids+0xcf2a7)
> | #1 0x55d2515d708e in strset__add_str tools/lib/bpf/strset.c:162:2
> | #2 0x55d2515c730b in btf__add_str tools/lib/bpf/btf.c:2109:8
> | #3 0x55d2515c9020 in btf__add_func_param tools/lib/bpf/btf.c:3108:14
> | #4 0x55d25159f0b5 in process_kfunc_with_implicit_args tools/bpf/resolve_btfids/main.c:1196:9
> | #5 0x55d25159e004 in btf2btf tools/bpf/resolve_btfids/main.c:1229:9
> | #6 0x55d25159cee7 in main tools/bpf/resolve_btfids/main.c:1535:6
> | #7 0x7f4c78e29f76 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
> | #8 0x7f4c78e2a026 in __libc_start_main csu/../csu/libc-start.c:360:3
> | #9 0x55d2514bb860 in _start (tools/bpf/resolve_btfids/resolve_btfids+0x30860)
> |
> | 0x7f4c4a500bd4 is located 13268 bytes inside of 2829000-byte region [0x7f4c4a4fd800,0x7f4c4a7b02c8)
> | freed by thread T0 here:
> | #0 0x55d25155b700 in realloc (tools/bpf/resolve_btfids/resolve_btfids+0xd0700)
> | #1 0x55d2515c426c in libbpf_reallocarray tools/lib/bpf/./libbpf_internal.h:220:9
> | #2 0x55d2515c426c in libbpf_add_mem tools/lib/bpf/btf.c:224:13
> |
> | previously allocated by thread T0 here:
> | #0 0x55d25155b2e3 in malloc (tools/bpf/resolve_btfids/resolve_btfids+0xd02e3)
> | #1 0x55d2515d6e7d in strset__new tools/lib/bpf/strset.c:58:20
>
> While resolve_btfids could be refactored to avoid this call path, let's
> instead fix this issue at the source in strset__add_str() and avoid
> similar scenarios. Let's simply check whether 's' is already within the
> strset data buffer boundaries, and skip appending it at the end of the
> buffer again.
>
> While already here, also fix strset__find_str() which suffers from the
> same problem by factoring out the common operations into a new helper
> function strset__offset() as suggested by Mykyta.
>
> Fixes: 90d76d3ececc ("libbpf: Extract internal set-of-strings datastructure APIs")
> Suggested-by: Mykyta Yatsenko <yatsenko@meta.com>
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
> ---
> v2:
> Implemented the fix in strset__offset() helper as suggested by Mykyta.
> Added support to handle "substrings" of existing ones.
> Used 90d76d3ececc as Fixes tag as suggested by Sashiko.
>
> v1:
> https://lore.kernel.org/all/20260513232055.1681859-1-cmllamas@google.com/
>
> tools/lib/bpf/strset.c | 70 ++++++++++++++++++++++++++----------------
> 1 file changed, 43 insertions(+), 27 deletions(-)
>
> diff --git a/tools/lib/bpf/strset.c b/tools/lib/bpf/strset.c
> index 2464bcbd04e0..e50da238914d 100644
> --- a/tools/lib/bpf/strset.c
> +++ b/tools/lib/bpf/strset.c
> @@ -107,25 +107,53 @@ static void *strset_add_str_mem(struct strset *set, size_t add_sz)
> set->strs_data_len, set->strs_data_max_len, add_sz);
> }
>
> -/* Find string offset that corresponds to a given string *s*.
> - * Returns:
> - * - >0 offset into string data, if string is found;
> - * - -ENOENT, if string is not in the string data;
> - * - <0, on any other error.
> +/* Returns the offset of the string in set, and populates *grow* with the
> + * length which set needs to grow by to include *s*.
> */
> -int strset__find_str(struct strset *set, const char *s)
> +static long strset__offset(struct strset *set, const char *s, long *grow)
tbh, it seems like
> {
> - long old_off, new_off, len;
> + const char *strs = strset__data(set);
> + long len;
> void *p;
>
> - /* see strset__add_str() for why we do this */
> + /* Check whether 's' is already within the buffer */
> + if (strs && s >= strs && s < strs + set->strs_data_len) {
> + *grow = 0;
> + return s - strs;
> + }
> +
> + /* Hashmap keys are always offsets within set->strs_data, so to even
> + * look up some string from the "outside", we need to first append it
> + * at the end, so that it can be addressed with an offset. Luckily,
> + * until set->strs_data_len is incremented, that string is just a piece
> + * of garbage for the rest of the code, so no harm, no foul. On the
> + * other hand, if the string is unique, it's already appended and
> + * ready to be used, only a simple set->strs_data_len increment away.
> + */
> len = strlen(s) + 1;
> p = strset_add_str_mem(set, len);
> if (!p)
> return -ENOMEM;
>
> - new_off = set->strs_data_len;
> memcpy(p, s, len);
> + *grow = len;
> +
> + return set->strs_data_len;
> +}
> +
> +/* Find string offset that corresponds to a given string *s*.
> + * Returns:
> + * - >0 offset into string data, if string is found;
> + * - -ENOENT, if string is not in the string data;
> + * - <0, on any other error.
> + */
> +int strset__find_str(struct strset *set, const char *s)
> +{
> + long old_off, new_off, grow;
> +
> + new_off = strset__offset(set, s, &grow);
> + if (new_off < 0)
> + return new_off;
>
I'm not a fan of bypassing strs_hash, tbh... see below
> if (hashmap__find(set->strs_hash, new_off, &old_off))
> return old_off;
> @@ -141,25 +169,12 @@ int strset__find_str(struct strset *set, const char *s)
> */
> int strset__add_str(struct strset *set, const char *s)
> {
> - long old_off, new_off, len;
> - void *p;
> + long old_off, new_off, grow;
> int err;
>
> - /* Hashmap keys are always offsets within set->strs_data, so to even
> - * look up some string from the "outside", we need to first append it
> - * at the end, so that it can be addressed with an offset. Luckily,
> - * until set->strs_data_len is incremented, that string is just a piece
> - * of garbage for the rest of the code, so no harm, no foul. On the
> - * other hand, if the string is unique, it's already appended and
> - * ready to be used, only a simple set->strs_data_len increment away.
> - */
> - len = strlen(s) + 1;
> - p = strset_add_str_mem(set, len);
> - if (!p)
> - return -ENOMEM;
> -
> - new_off = set->strs_data_len;
> - memcpy(p, s, len);
> + new_off = strset__offset(set, s, &grow);
> + if (new_off < 0)
> + return new_off;
I agree that this is a usability problem, but I'd handle it by
recalculating s pointer to correct one if it happens to be coming from
invalidated strs_data. Something along the lines of:
const char *old_data = set->strs_data;
size_t old_data_len = set->strs_data_len;
p = strset_add_str_mem(...);
if (!p)
return -ENOMEM;
if (p != old_data && s >= old_data && s < old_data + old_data_len)
s = p + (s - old_data);
At this point s will be correct even if it was invalidated.
pw-bot: cr
>
> /* Now attempt to add the string, but only if the string with the same
> * contents doesn't exist already (HASHMAP_ADD strategy). If such
> @@ -172,6 +187,7 @@ int strset__add_str(struct strset *set, const char *s)
> if (err)
> return err;
>
> - set->strs_data_len += len; /* new unique string, adjust data length */
> + set->strs_data_len += grow; /* new unique string, adjust data length */
> +
> return new_off;
> }
> --
> 2.54.0.563.g4f69b47b94-goog
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] libbpf: fix UAF in strset__add_str()
2026-05-15 22:08 ` Andrii Nakryiko
@ 2026-05-18 4:59 ` Carlos Llamas
2026-05-18 5:05 ` [PATCH v3] " Carlos Llamas
1 sibling, 0 replies; 10+ messages in thread
From: Carlos Llamas @ 2026-05-18 4:59 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: mykyta.yatsenko5, Andrii Nakryiko, Eduard Zingerman,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
kernel-team, linux-kernel, Mykyta Yatsenko,
open list:BPF [GENERAL] (Safe Dynamic Programs and Tools)
On Fri, May 15, 2026 at 03:08:47PM -0700, Andrii Nakryiko wrote:
> I agree that this is a usability problem, but I'd handle it by
> recalculating s pointer to correct one if it happens to be coming from
> invalidated strs_data. Something along the lines of:
>
> const char *old_data = set->strs_data;
> size_t old_data_len = set->strs_data_len;
>
> p = strset_add_str_mem(...);
> if (!p)
> return -ENOMEM;
>
> if (p != old_data && s >= old_data && s < old_data + old_data_len)
> s = p + (s - old_data);
>
>
> At this point s will be correct even if it was invalidated.
>
> pw-bot: cr
Oh right, that totally works. I've tested the changes and will send a
new patch. Obviously 'p' was meant to be 'set->strs_data' in the check.
Thanks,
--
Carlos Llamas
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3] libbpf: fix UAF in strset__add_str()
2026-05-15 22:08 ` Andrii Nakryiko
2026-05-18 4:59 ` Carlos Llamas
@ 2026-05-18 5:05 ` Carlos Llamas
2026-05-18 17:36 ` Andrii Nakryiko
1 sibling, 1 reply; 10+ messages in thread
From: Carlos Llamas @ 2026-05-18 5:05 UTC (permalink / raw)
To: mykyta.yatsenko5, andrii.nakryiko, Andrii Nakryiko,
Eduard Zingerman, Alexei Starovoitov, Daniel Borkmann,
Martin KaFai Lau, Kumar Kartikeya Dwivedi, Song Liu,
Yonghong Song, Jiri Olsa
Cc: kernel-team, linux-kernel, Carlos Llamas, Mykyta Yatsenko,
open list:BPF [GENERAL] (Safe Dynamic Programs and Tools)
strset_add_str_mem() might reallocate the strset data buffer in order to
accommodate the provided string 's'. However, if 's' points to a string
already present in the buffer, it becomes dangling after the realloc.
This leads to a use-after-free when attempting to memcpy() the string
into the new buffer.
One scenario that triggers this problematic path is when resolve_btfids
attempts to patch kfunc prototypes using existing BTF parameter names:
| resolve_btfids: function bpf_list_push_back_impl already exists in BTF
| Segmentation fault (core dumped)
Compiling resolve_btfids with fsanitize=address generates a detailed
report of the UAF:
| =================================================================
| ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4
| ==1507892==ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4 at pc 0x55d25155a2a8 bp 0x7ffcef879060 sp 0x7ffcef878818
| READ of size 5 at 0x7f4c4a500bd4 thread T0
| #0 0x55d25155a2a7 in memcpy (tools/bpf/resolve_btfids/resolve_btfids+0xcf2a7)
| #1 0x55d2515d708e in strset__add_str tools/lib/bpf/strset.c:162:2
| #2 0x55d2515c730b in btf__add_str tools/lib/bpf/btf.c:2109:8
| #3 0x55d2515c9020 in btf__add_func_param tools/lib/bpf/btf.c:3108:14
| #4 0x55d25159f0b5 in process_kfunc_with_implicit_args tools/bpf/resolve_btfids/main.c:1196:9
| #5 0x55d25159e004 in btf2btf tools/bpf/resolve_btfids/main.c:1229:9
| #6 0x55d25159cee7 in main tools/bpf/resolve_btfids/main.c:1535:6
| #7 0x7f4c78e29f76 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
| #8 0x7f4c78e2a026 in __libc_start_main csu/../csu/libc-start.c:360:3
| #9 0x55d2514bb860 in _start (tools/bpf/resolve_btfids/resolve_btfids+0x30860)
|
| 0x7f4c4a500bd4 is located 13268 bytes inside of 2829000-byte region [0x7f4c4a4fd800,0x7f4c4a7b02c8)
| freed by thread T0 here:
| #0 0x55d25155b700 in realloc (tools/bpf/resolve_btfids/resolve_btfids+0xd0700)
| #1 0x55d2515c426c in libbpf_reallocarray tools/lib/bpf/./libbpf_internal.h:220:9
| #2 0x55d2515c426c in libbpf_add_mem tools/lib/bpf/btf.c:224:13
|
| previously allocated by thread T0 here:
| #0 0x55d25155b2e3 in malloc (tools/bpf/resolve_btfids/resolve_btfids+0xd02e3)
| #1 0x55d2515d6e7d in strset__new tools/lib/bpf/strset.c:58:20
While resolve_btfids could be refactored to avoid this call path, let's
instead fix this issue at the source in strset__add_str() and avoid
similar scenarios.
Let's check if set->strs_data was reallocated and whether 's' points to
an internal string within the old strset buffer. In such case, 's' is
reconstructed to point to the new buffer.
While already here, also fix strset__find_str() which suffers from the
same problem by factoring out the common operations into a new helper
function strset_str_append().
Fixes: 90d76d3ececc ("libbpf: Extract internal set-of-strings datastructure APIs")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Suggested-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
v3:
Switch to 's' reconstruction approach suggested by Andrii.
Adjusted names and commit log accordingly.
v2:
Implemented the fix in strset__offset() helper as suggested by Mykyta.
Added support to handle "substrings" of existing ones.
Used 90d76d3ececc as Fixes tag as suggested by Sashiko.
https://lore.kernel.org/all/20260515044759.2863546-1-cmllamas@google.com/
v1:
https://lore.kernel.org/all/20260513232055.1681859-1-cmllamas@google.com/
tools/lib/bpf/strset.c | 58 +++++++++++++++++++++++++++---------------
1 file changed, 37 insertions(+), 21 deletions(-)
diff --git a/tools/lib/bpf/strset.c b/tools/lib/bpf/strset.c
index 2464bcbd04e0..d229961ff2fc 100644
--- a/tools/lib/bpf/strset.c
+++ b/tools/lib/bpf/strset.c
@@ -107,6 +107,37 @@ static void *strset_add_str_mem(struct strset *set, size_t add_sz)
set->strs_data_len, set->strs_data_max_len, add_sz);
}
+static long strset_str_append(struct strset *set, const char *s)
+{
+ const char *old_data = strset__data(set);
+ long len = strlen(s) + 1;
+ void *p;
+
+ /* Hashmap keys are always offsets within set->strs_data, so to even
+ * look up some string from the "outside", we need to first append it
+ * at the end, so that it can be addressed with an offset. Luckily,
+ * until set->strs_data_len is incremented, that string is just a piece
+ * of garbage for the rest of the code, so no harm, no foul. On the
+ * other hand, if the string is unique, it's already appended and
+ * ready to be used, only a simple set->strs_data_len increment away.
+ */
+ p = strset_add_str_mem(set, len);
+ if (!p)
+ return -ENOMEM;
+
+ /* The set->strs_data might have reallocated and if 's' pointed
+ * to an internal string within the old buffer, then it became
+ * dangling and needs to be reconstructed before the copy.
+ */
+ if (old_data && old_data != strset__data(set) &&
+ s >= old_data && s < old_data + set->strs_data_len)
+ s = strset__data(set) + (s - old_data);
+
+ memcpy(p, s, len);
+
+ return len;
+}
+
/* Find string offset that corresponds to a given string *s*.
* Returns:
* - >0 offset into string data, if string is found;
@@ -116,16 +147,12 @@ static void *strset_add_str_mem(struct strset *set, size_t add_sz)
int strset__find_str(struct strset *set, const char *s)
{
long old_off, new_off, len;
- void *p;
- /* see strset__add_str() for why we do this */
- len = strlen(s) + 1;
- p = strset_add_str_mem(set, len);
- if (!p)
- return -ENOMEM;
+ len = strset_str_append(set, s);
+ if (len < 0)
+ return len;
new_off = set->strs_data_len;
- memcpy(p, s, len);
if (hashmap__find(set->strs_hash, new_off, &old_off))
return old_off;
@@ -142,24 +169,13 @@ int strset__find_str(struct strset *set, const char *s)
int strset__add_str(struct strset *set, const char *s)
{
long old_off, new_off, len;
- void *p;
int err;
- /* Hashmap keys are always offsets within set->strs_data, so to even
- * look up some string from the "outside", we need to first append it
- * at the end, so that it can be addressed with an offset. Luckily,
- * until set->strs_data_len is incremented, that string is just a piece
- * of garbage for the rest of the code, so no harm, no foul. On the
- * other hand, if the string is unique, it's already appended and
- * ready to be used, only a simple set->strs_data_len increment away.
- */
- len = strlen(s) + 1;
- p = strset_add_str_mem(set, len);
- if (!p)
- return -ENOMEM;
+ len = strset_str_append(set, s);
+ if (len < 0)
+ return len;
new_off = set->strs_data_len;
- memcpy(p, s, len);
/* Now attempt to add the string, but only if the string with the same
* contents doesn't exist already (HASHMAP_ADD strategy). If such
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3] libbpf: fix UAF in strset__add_str()
2026-05-18 5:05 ` [PATCH v3] " Carlos Llamas
@ 2026-05-18 17:36 ` Andrii Nakryiko
2026-05-22 18:35 ` Andrii Nakryiko
2026-05-23 16:27 ` [PATCH v4] " Carlos Llamas
0 siblings, 2 replies; 10+ messages in thread
From: Andrii Nakryiko @ 2026-05-18 17:36 UTC (permalink / raw)
To: Carlos Llamas
Cc: mykyta.yatsenko5, Andrii Nakryiko, Eduard Zingerman,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
kernel-team, linux-kernel, Mykyta Yatsenko,
open list:BPF [GENERAL] (Safe Dynamic Programs and Tools)
On Sun, May 17, 2026 at 10:05 PM Carlos Llamas <cmllamas@google.com> wrote:
>
> strset_add_str_mem() might reallocate the strset data buffer in order to
> accommodate the provided string 's'. However, if 's' points to a string
> already present in the buffer, it becomes dangling after the realloc.
> This leads to a use-after-free when attempting to memcpy() the string
> into the new buffer.
>
> One scenario that triggers this problematic path is when resolve_btfids
> attempts to patch kfunc prototypes using existing BTF parameter names:
>
> | resolve_btfids: function bpf_list_push_back_impl already exists in BTF
> | Segmentation fault (core dumped)
>
> Compiling resolve_btfids with fsanitize=address generates a detailed
> report of the UAF:
>
> | =================================================================
> | ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4
> | ==1507892==ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4 at pc 0x55d25155a2a8 bp 0x7ffcef879060 sp 0x7ffcef878818
> | READ of size 5 at 0x7f4c4a500bd4 thread T0
> | #0 0x55d25155a2a7 in memcpy (tools/bpf/resolve_btfids/resolve_btfids+0xcf2a7)
> | #1 0x55d2515d708e in strset__add_str tools/lib/bpf/strset.c:162:2
> | #2 0x55d2515c730b in btf__add_str tools/lib/bpf/btf.c:2109:8
> | #3 0x55d2515c9020 in btf__add_func_param tools/lib/bpf/btf.c:3108:14
> | #4 0x55d25159f0b5 in process_kfunc_with_implicit_args tools/bpf/resolve_btfids/main.c:1196:9
> | #5 0x55d25159e004 in btf2btf tools/bpf/resolve_btfids/main.c:1229:9
> | #6 0x55d25159cee7 in main tools/bpf/resolve_btfids/main.c:1535:6
> | #7 0x7f4c78e29f76 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
> | #8 0x7f4c78e2a026 in __libc_start_main csu/../csu/libc-start.c:360:3
> | #9 0x55d2514bb860 in _start (tools/bpf/resolve_btfids/resolve_btfids+0x30860)
> |
> | 0x7f4c4a500bd4 is located 13268 bytes inside of 2829000-byte region [0x7f4c4a4fd800,0x7f4c4a7b02c8)
> | freed by thread T0 here:
> | #0 0x55d25155b700 in realloc (tools/bpf/resolve_btfids/resolve_btfids+0xd0700)
> | #1 0x55d2515c426c in libbpf_reallocarray tools/lib/bpf/./libbpf_internal.h:220:9
> | #2 0x55d2515c426c in libbpf_add_mem tools/lib/bpf/btf.c:224:13
> |
> | previously allocated by thread T0 here:
> | #0 0x55d25155b2e3 in malloc (tools/bpf/resolve_btfids/resolve_btfids+0xd02e3)
> | #1 0x55d2515d6e7d in strset__new tools/lib/bpf/strset.c:58:20
>
> While resolve_btfids could be refactored to avoid this call path, let's
> instead fix this issue at the source in strset__add_str() and avoid
> similar scenarios.
>
> Let's check if set->strs_data was reallocated and whether 's' points to
> an internal string within the old strset buffer. In such case, 's' is
> reconstructed to point to the new buffer.
>
> While already here, also fix strset__find_str() which suffers from the
> same problem by factoring out the common operations into a new helper
> function strset_str_append().
>
> Fixes: 90d76d3ececc ("libbpf: Extract internal set-of-strings datastructure APIs")
> Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> Suggested-by: Mykyta Yatsenko <yatsenko@meta.com>
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
> ---
> v3:
> Switch to 's' reconstruction approach suggested by Andrii.
> Adjusted names and commit log accordingly.
>
> v2:
> Implemented the fix in strset__offset() helper as suggested by Mykyta.
> Added support to handle "substrings" of existing ones.
> Used 90d76d3ececc as Fixes tag as suggested by Sashiko.
> https://lore.kernel.org/all/20260515044759.2863546-1-cmllamas@google.com/
>
> v1:
> https://lore.kernel.org/all/20260513232055.1681859-1-cmllamas@google.com/
> tools/lib/bpf/strset.c | 58 +++++++++++++++++++++++++++---------------
> 1 file changed, 37 insertions(+), 21 deletions(-)
>
> diff --git a/tools/lib/bpf/strset.c b/tools/lib/bpf/strset.c
> index 2464bcbd04e0..d229961ff2fc 100644
> --- a/tools/lib/bpf/strset.c
> +++ b/tools/lib/bpf/strset.c
> @@ -107,6 +107,37 @@ static void *strset_add_str_mem(struct strset *set, size_t add_sz)
> set->strs_data_len, set->strs_data_max_len, add_sz);
> }
>
> +static long strset_str_append(struct strset *set, const char *s)
> +{
> + const char *old_data = strset__data(set);
sashiko suggests to use uintptr_t for this detection to avoid
undefined behavior, let's do that then)?
but other than that and a nit below, looks good, thanks!
pw-bot: cr
> + long len = strlen(s) + 1;
> + void *p;
> +
> + /* Hashmap keys are always offsets within set->strs_data, so to even
> + * look up some string from the "outside", we need to first append it
> + * at the end, so that it can be addressed with an offset. Luckily,
> + * until set->strs_data_len is incremented, that string is just a piece
> + * of garbage for the rest of the code, so no harm, no foul. On the
> + * other hand, if the string is unique, it's already appended and
> + * ready to be used, only a simple set->strs_data_len increment away.
> + */
> + p = strset_add_str_mem(set, len);
> + if (!p)
> + return -ENOMEM;
> +
> + /* The set->strs_data might have reallocated and if 's' pointed
> + * to an internal string within the old buffer, then it became
> + * dangling and needs to be reconstructed before the copy.
> + */
> + if (old_data && old_data != strset__data(set) &&
> + s >= old_data && s < old_data + set->strs_data_len)
> + s = strset__data(set) + (s - old_data);
nit: strset__data() is more of an "external api", note how we access
strs_data_len directly through field. So let's use s->strs_data access
consistently.
> +
> + memcpy(p, s, len);
> +
> + return len;
> +}
> +
> /* Find string offset that corresponds to a given string *s*.
> * Returns:
> * - >0 offset into string data, if string is found;
> @@ -116,16 +147,12 @@ static void *strset_add_str_mem(struct strset *set, size_t add_sz)
> int strset__find_str(struct strset *set, const char *s)
> {
> long old_off, new_off, len;
> - void *p;
>
> - /* see strset__add_str() for why we do this */
> - len = strlen(s) + 1;
> - p = strset_add_str_mem(set, len);
> - if (!p)
> - return -ENOMEM;
> + len = strset_str_append(set, s);
> + if (len < 0)
> + return len;
>
> new_off = set->strs_data_len;
> - memcpy(p, s, len);
>
> if (hashmap__find(set->strs_hash, new_off, &old_off))
> return old_off;
> @@ -142,24 +169,13 @@ int strset__find_str(struct strset *set, const char *s)
> int strset__add_str(struct strset *set, const char *s)
> {
> long old_off, new_off, len;
> - void *p;
> int err;
>
> - /* Hashmap keys are always offsets within set->strs_data, so to even
> - * look up some string from the "outside", we need to first append it
> - * at the end, so that it can be addressed with an offset. Luckily,
> - * until set->strs_data_len is incremented, that string is just a piece
> - * of garbage for the rest of the code, so no harm, no foul. On the
> - * other hand, if the string is unique, it's already appended and
> - * ready to be used, only a simple set->strs_data_len increment away.
> - */
> - len = strlen(s) + 1;
> - p = strset_add_str_mem(set, len);
> - if (!p)
> - return -ENOMEM;
> + len = strset_str_append(set, s);
> + if (len < 0)
> + return len;
>
> new_off = set->strs_data_len;
> - memcpy(p, s, len);
>
> /* Now attempt to add the string, but only if the string with the same
> * contents doesn't exist already (HASHMAP_ADD strategy). If such
> --
> 2.54.0.563.g4f69b47b94-goog
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3] libbpf: fix UAF in strset__add_str()
2026-05-18 17:36 ` Andrii Nakryiko
@ 2026-05-22 18:35 ` Andrii Nakryiko
2026-05-22 18:50 ` Carlos Llamas
2026-05-23 16:27 ` [PATCH v4] " Carlos Llamas
1 sibling, 1 reply; 10+ messages in thread
From: Andrii Nakryiko @ 2026-05-22 18:35 UTC (permalink / raw)
To: Carlos Llamas
Cc: mykyta.yatsenko5, Andrii Nakryiko, Eduard Zingerman,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
kernel-team, linux-kernel, Mykyta Yatsenko,
open list:BPF [GENERAL] (Safe Dynamic Programs and Tools)
On Mon, May 18, 2026 at 10:36 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Sun, May 17, 2026 at 10:05 PM Carlos Llamas <cmllamas@google.com> wrote:
> >
> > strset_add_str_mem() might reallocate the strset data buffer in order to
> > accommodate the provided string 's'. However, if 's' points to a string
> > already present in the buffer, it becomes dangling after the realloc.
> > This leads to a use-after-free when attempting to memcpy() the string
> > into the new buffer.
> >
> > One scenario that triggers this problematic path is when resolve_btfids
> > attempts to patch kfunc prototypes using existing BTF parameter names:
> >
> > | resolve_btfids: function bpf_list_push_back_impl already exists in BTF
> > | Segmentation fault (core dumped)
> >
> > Compiling resolve_btfids with fsanitize=address generates a detailed
> > report of the UAF:
> >
> > | =================================================================
> > | ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4
> > | ==1507892==ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4 at pc 0x55d25155a2a8 bp 0x7ffcef879060 sp 0x7ffcef878818
> > | READ of size 5 at 0x7f4c4a500bd4 thread T0
> > | #0 0x55d25155a2a7 in memcpy (tools/bpf/resolve_btfids/resolve_btfids+0xcf2a7)
> > | #1 0x55d2515d708e in strset__add_str tools/lib/bpf/strset.c:162:2
> > | #2 0x55d2515c730b in btf__add_str tools/lib/bpf/btf.c:2109:8
> > | #3 0x55d2515c9020 in btf__add_func_param tools/lib/bpf/btf.c:3108:14
> > | #4 0x55d25159f0b5 in process_kfunc_with_implicit_args tools/bpf/resolve_btfids/main.c:1196:9
> > | #5 0x55d25159e004 in btf2btf tools/bpf/resolve_btfids/main.c:1229:9
> > | #6 0x55d25159cee7 in main tools/bpf/resolve_btfids/main.c:1535:6
> > | #7 0x7f4c78e29f76 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
> > | #8 0x7f4c78e2a026 in __libc_start_main csu/../csu/libc-start.c:360:3
> > | #9 0x55d2514bb860 in _start (tools/bpf/resolve_btfids/resolve_btfids+0x30860)
> > |
> > | 0x7f4c4a500bd4 is located 13268 bytes inside of 2829000-byte region [0x7f4c4a4fd800,0x7f4c4a7b02c8)
> > | freed by thread T0 here:
> > | #0 0x55d25155b700 in realloc (tools/bpf/resolve_btfids/resolve_btfids+0xd0700)
> > | #1 0x55d2515c426c in libbpf_reallocarray tools/lib/bpf/./libbpf_internal.h:220:9
> > | #2 0x55d2515c426c in libbpf_add_mem tools/lib/bpf/btf.c:224:13
> > |
> > | previously allocated by thread T0 here:
> > | #0 0x55d25155b2e3 in malloc (tools/bpf/resolve_btfids/resolve_btfids+0xd02e3)
> > | #1 0x55d2515d6e7d in strset__new tools/lib/bpf/strset.c:58:20
> >
> > While resolve_btfids could be refactored to avoid this call path, let's
> > instead fix this issue at the source in strset__add_str() and avoid
> > similar scenarios.
> >
> > Let's check if set->strs_data was reallocated and whether 's' points to
> > an internal string within the old strset buffer. In such case, 's' is
> > reconstructed to point to the new buffer.
> >
> > While already here, also fix strset__find_str() which suffers from the
> > same problem by factoring out the common operations into a new helper
> > function strset_str_append().
> >
> > Fixes: 90d76d3ececc ("libbpf: Extract internal set-of-strings datastructure APIs")
> > Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> > Suggested-by: Mykyta Yatsenko <yatsenko@meta.com>
> > Signed-off-by: Carlos Llamas <cmllamas@google.com>
> > ---
> > v3:
> > Switch to 's' reconstruction approach suggested by Andrii.
> > Adjusted names and commit log accordingly.
> >
> > v2:
> > Implemented the fix in strset__offset() helper as suggested by Mykyta.
> > Added support to handle "substrings" of existing ones.
> > Used 90d76d3ececc as Fixes tag as suggested by Sashiko.
> > https://lore.kernel.org/all/20260515044759.2863546-1-cmllamas@google.com/
> >
> > v1:
> > https://lore.kernel.org/all/20260513232055.1681859-1-cmllamas@google.com/
> > tools/lib/bpf/strset.c | 58 +++++++++++++++++++++++++++---------------
> > 1 file changed, 37 insertions(+), 21 deletions(-)
> >
> > diff --git a/tools/lib/bpf/strset.c b/tools/lib/bpf/strset.c
> > index 2464bcbd04e0..d229961ff2fc 100644
> > --- a/tools/lib/bpf/strset.c
> > +++ b/tools/lib/bpf/strset.c
> > @@ -107,6 +107,37 @@ static void *strset_add_str_mem(struct strset *set, size_t add_sz)
> > set->strs_data_len, set->strs_data_max_len, add_sz);
> > }
> >
> > +static long strset_str_append(struct strset *set, const char *s)
> > +{
> > + const char *old_data = strset__data(set);
>
> sashiko suggests to use uintptr_t for this detection to avoid
> undefined behavior, let's do that then)?
>
> but other than that and a nit below, looks good, thanks!
>
> pw-bot: cr
Ping. Carlos, are you planning to send a follow up?
>
> > + long len = strlen(s) + 1;
> > + void *p;
> > +
> > + /* Hashmap keys are always offsets within set->strs_data, so to even
> > + * look up some string from the "outside", we need to first append it
> > + * at the end, so that it can be addressed with an offset. Luckily,
> > + * until set->strs_data_len is incremented, that string is just a piece
> > + * of garbage for the rest of the code, so no harm, no foul. On the
> > + * other hand, if the string is unique, it's already appended and
> > + * ready to be used, only a simple set->strs_data_len increment away.
> > + */
> > + p = strset_add_str_mem(set, len);
> > + if (!p)
> > + return -ENOMEM;
> > +
> > + /* The set->strs_data might have reallocated and if 's' pointed
> > + * to an internal string within the old buffer, then it became
> > + * dangling and needs to be reconstructed before the copy.
> > + */
> > + if (old_data && old_data != strset__data(set) &&
> > + s >= old_data && s < old_data + set->strs_data_len)
> > + s = strset__data(set) + (s - old_data);
>
> nit: strset__data() is more of an "external api", note how we access
> strs_data_len directly through field. So let's use s->strs_data access
> consistently.
>
> > +
> > + memcpy(p, s, len);
> > +
> > + return len;
> > +}
> > +
> > /* Find string offset that corresponds to a given string *s*.
> > * Returns:
> > * - >0 offset into string data, if string is found;
> > @@ -116,16 +147,12 @@ static void *strset_add_str_mem(struct strset *set, size_t add_sz)
> > int strset__find_str(struct strset *set, const char *s)
> > {
> > long old_off, new_off, len;
> > - void *p;
> >
> > - /* see strset__add_str() for why we do this */
> > - len = strlen(s) + 1;
> > - p = strset_add_str_mem(set, len);
> > - if (!p)
> > - return -ENOMEM;
> > + len = strset_str_append(set, s);
> > + if (len < 0)
> > + return len;
> >
> > new_off = set->strs_data_len;
> > - memcpy(p, s, len);
> >
> > if (hashmap__find(set->strs_hash, new_off, &old_off))
> > return old_off;
> > @@ -142,24 +169,13 @@ int strset__find_str(struct strset *set, const char *s)
> > int strset__add_str(struct strset *set, const char *s)
> > {
> > long old_off, new_off, len;
> > - void *p;
> > int err;
> >
> > - /* Hashmap keys are always offsets within set->strs_data, so to even
> > - * look up some string from the "outside", we need to first append it
> > - * at the end, so that it can be addressed with an offset. Luckily,
> > - * until set->strs_data_len is incremented, that string is just a piece
> > - * of garbage for the rest of the code, so no harm, no foul. On the
> > - * other hand, if the string is unique, it's already appended and
> > - * ready to be used, only a simple set->strs_data_len increment away.
> > - */
> > - len = strlen(s) + 1;
> > - p = strset_add_str_mem(set, len);
> > - if (!p)
> > - return -ENOMEM;
> > + len = strset_str_append(set, s);
> > + if (len < 0)
> > + return len;
> >
> > new_off = set->strs_data_len;
> > - memcpy(p, s, len);
> >
> > /* Now attempt to add the string, but only if the string with the same
> > * contents doesn't exist already (HASHMAP_ADD strategy). If such
> > --
> > 2.54.0.563.g4f69b47b94-goog
> >
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3] libbpf: fix UAF in strset__add_str()
2026-05-22 18:35 ` Andrii Nakryiko
@ 2026-05-22 18:50 ` Carlos Llamas
0 siblings, 0 replies; 10+ messages in thread
From: Carlos Llamas @ 2026-05-22 18:50 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: mykyta.yatsenko5, Andrii Nakryiko, Eduard Zingerman,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
kernel-team, linux-kernel, Mykyta Yatsenko,
open list:BPF [GENERAL] (Safe Dynamic Programs and Tools)
On Fri, May 22, 2026 at 11:35:58AM -0700, Andrii Nakryiko wrote:
> >
> > sashiko suggests to use uintptr_t for this detection to avoid
> > undefined behavior, let's do that then)?
> >
> > but other than that and a nit below, looks good, thanks!
> >
> > pw-bot: cr
>
> Ping. Carlos, are you planning to send a follow up?
Hey Andrii sorry about the delay I got caught up with something else
here. I'll send a new version in a bit with the uintptr_t casting and
your suggestion. I would just say that it is a bit unfortunate that
we'll have to clutter the code with this kind of parsing to avoid the
compiler behavior that was mentioned. :(
Thanks,
Carlos Llamas
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v4] libbpf: fix UAF in strset__add_str()
2026-05-18 17:36 ` Andrii Nakryiko
2026-05-22 18:35 ` Andrii Nakryiko
@ 2026-05-23 16:27 ` Carlos Llamas
2026-05-23 17:05 ` bot+bpf-ci
2026-05-28 21:31 ` Andrii Nakryiko
1 sibling, 2 replies; 10+ messages in thread
From: Carlos Llamas @ 2026-05-23 16:27 UTC (permalink / raw)
To: mykyta.yatsenko5, andrii.nakryiko, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Song Liu,
Yonghong Song, Jiri Olsa
Cc: kernel-team, linux-kernel, Carlos Llamas, Mykyta Yatsenko,
open list:BPF [LIBRARY] (libbpf)
strset_add_str_mem() might reallocate the strset data buffer in order to
accommodate the provided string 's'. However, if 's' points to a string
already present in the buffer, it becomes dangling after the realloc.
This leads to a use-after-free when attempting to memcpy() the string
into the new buffer.
One scenario that triggers this problematic path is when resolve_btfids
attempts to patch kfunc prototypes using existing BTF parameter names:
| resolve_btfids: function bpf_list_push_back_impl already exists in BTF
| Segmentation fault (core dumped)
Compiling resolve_btfids with fsanitize=address generates a detailed
report of the UAF:
| =================================================================
| ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4
| ==1507892==ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4 at pc 0x55d25155a2a8 bp 0x7ffcef879060 sp 0x7ffcef878818
| READ of size 5 at 0x7f4c4a500bd4 thread T0
| #0 0x55d25155a2a7 in memcpy (tools/bpf/resolve_btfids/resolve_btfids+0xcf2a7)
| #1 0x55d2515d708e in strset__add_str tools/lib/bpf/strset.c:162:2
| #2 0x55d2515c730b in btf__add_str tools/lib/bpf/btf.c:2109:8
| #3 0x55d2515c9020 in btf__add_func_param tools/lib/bpf/btf.c:3108:14
| #4 0x55d25159f0b5 in process_kfunc_with_implicit_args tools/bpf/resolve_btfids/main.c:1196:9
| #5 0x55d25159e004 in btf2btf tools/bpf/resolve_btfids/main.c:1229:9
| #6 0x55d25159cee7 in main tools/bpf/resolve_btfids/main.c:1535:6
| #7 0x7f4c78e29f76 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
| #8 0x7f4c78e2a026 in __libc_start_main csu/../csu/libc-start.c:360:3
| #9 0x55d2514bb860 in _start (tools/bpf/resolve_btfids/resolve_btfids+0x30860)
|
| 0x7f4c4a500bd4 is located 13268 bytes inside of 2829000-byte region [0x7f4c4a4fd800,0x7f4c4a7b02c8)
| freed by thread T0 here:
| #0 0x55d25155b700 in realloc (tools/bpf/resolve_btfids/resolve_btfids+0xd0700)
| #1 0x55d2515c426c in libbpf_reallocarray tools/lib/bpf/./libbpf_internal.h:220:9
| #2 0x55d2515c426c in libbpf_add_mem tools/lib/bpf/btf.c:224:13
|
| previously allocated by thread T0 here:
| #0 0x55d25155b2e3 in malloc (tools/bpf/resolve_btfids/resolve_btfids+0xd02e3)
| #1 0x55d2515d6e7d in strset__new tools/lib/bpf/strset.c:58:20
While resolve_btfids could be refactored to avoid this call path, let's
instead fix this issue at the source in strset__add_str() and avoid
similar scenarios.
Let's check if set->strs_data was reallocated and whether 's' points to
an internal string within the old strset buffer. In such case, 's' is
reconstructed to point to the new buffer.
While already here, also fix strset__find_str() which suffers from the
same problem by factoring out the common operations into a new helper
function strset_str_append().
Fixes: 90d76d3ececc ("libbpf: Extract internal set-of-strings datastructure APIs")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Suggested-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
---
v4:
Store pointers as integers in advance before realloc to prevent UB.
Access set->strs_data directly, not through external API.
v3:
Switch to 's' reconstruction approach suggested by Andrii.
Adjusted names and commit log accordingly.
https://lore.kernel.org/all/20260518050550.2600101-1-cmllamas@google.com/
v2:
Implemented the fix in strset__offset() helper as suggested by Mykyta.
Added support to handle "substrings" of existing ones.
Used 90d76d3ececc as Fixes tag as suggested by Sashiko.
https://lore.kernel.org/all/20260515044759.2863546-1-cmllamas@google.com/
v1:
https://lore.kernel.org/all/20260513232055.1681859-1-cmllamas@google.com/
tools/lib/bpf/strset.c | 59 +++++++++++++++++++++++++++---------------
1 file changed, 38 insertions(+), 21 deletions(-)
diff --git a/tools/lib/bpf/strset.c b/tools/lib/bpf/strset.c
index 2464bcbd04e0..b9faca828f09 100644
--- a/tools/lib/bpf/strset.c
+++ b/tools/lib/bpf/strset.c
@@ -107,6 +107,38 @@ static void *strset_add_str_mem(struct strset *set, size_t add_sz)
set->strs_data_len, set->strs_data_max_len, add_sz);
}
+static long strset_str_append(struct strset *set, const char *s)
+{
+ uintptr_t old_data = (uintptr_t)set->strs_data;
+ uintptr_t old_s = (uintptr_t)s;
+ long len = strlen(s) + 1;
+ void *p;
+
+ /* Hashmap keys are always offsets within set->strs_data, so to even
+ * look up some string from the "outside", we need to first append it
+ * at the end, so that it can be addressed with an offset. Luckily,
+ * until set->strs_data_len is incremented, that string is just a piece
+ * of garbage for the rest of the code, so no harm, no foul. On the
+ * other hand, if the string is unique, it's already appended and
+ * ready to be used, only a simple set->strs_data_len increment away.
+ */
+ p = strset_add_str_mem(set, len);
+ if (!p)
+ return -ENOMEM;
+
+ /* The set->strs_data might have reallocated and if 's' pointed
+ * to an internal string within the old buffer, then it became
+ * dangling and needs to be reconstructed before the copy.
+ */
+ if (old_data && old_data != (uintptr_t)set->strs_data &&
+ old_s >= old_data && old_s < old_data + set->strs_data_len)
+ s = set->strs_data + (old_s - old_data);
+
+ memcpy(p, s, len);
+
+ return len;
+}
+
/* Find string offset that corresponds to a given string *s*.
* Returns:
* - >0 offset into string data, if string is found;
@@ -116,16 +148,12 @@ static void *strset_add_str_mem(struct strset *set, size_t add_sz)
int strset__find_str(struct strset *set, const char *s)
{
long old_off, new_off, len;
- void *p;
- /* see strset__add_str() for why we do this */
- len = strlen(s) + 1;
- p = strset_add_str_mem(set, len);
- if (!p)
- return -ENOMEM;
+ len = strset_str_append(set, s);
+ if (len < 0)
+ return len;
new_off = set->strs_data_len;
- memcpy(p, s, len);
if (hashmap__find(set->strs_hash, new_off, &old_off))
return old_off;
@@ -142,24 +170,13 @@ int strset__find_str(struct strset *set, const char *s)
int strset__add_str(struct strset *set, const char *s)
{
long old_off, new_off, len;
- void *p;
int err;
- /* Hashmap keys are always offsets within set->strs_data, so to even
- * look up some string from the "outside", we need to first append it
- * at the end, so that it can be addressed with an offset. Luckily,
- * until set->strs_data_len is incremented, that string is just a piece
- * of garbage for the rest of the code, so no harm, no foul. On the
- * other hand, if the string is unique, it's already appended and
- * ready to be used, only a simple set->strs_data_len increment away.
- */
- len = strlen(s) + 1;
- p = strset_add_str_mem(set, len);
- if (!p)
- return -ENOMEM;
+ len = strset_str_append(set, s);
+ if (len < 0)
+ return len;
new_off = set->strs_data_len;
- memcpy(p, s, len);
/* Now attempt to add the string, but only if the string with the same
* contents doesn't exist already (HASHMAP_ADD strategy). If such
--
2.54.0.746.g67dd491aae-goog
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v4] libbpf: fix UAF in strset__add_str()
2026-05-23 16:27 ` [PATCH v4] " Carlos Llamas
@ 2026-05-23 17:05 ` bot+bpf-ci
2026-05-28 21:31 ` Andrii Nakryiko
1 sibling, 0 replies; 10+ messages in thread
From: bot+bpf-ci @ 2026-05-23 17:05 UTC (permalink / raw)
To: cmllamas, mykyta.yatsenko5, andrii.nakryiko, ast, daniel, andrii,
martin.lau, eddyz87, memxor, song, yonghong.song, jolsa
Cc: kernel-team, linux-kernel, cmllamas, yatsenko, bpf, ast, andrii,
daniel, martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai
[-- Attachment #1: Type: text/plain, Size: 2779 bytes --]
> diff --git a/tools/lib/bpf/strset.c b/tools/lib/bpf/strset.c
> index 2464bcbd04e0..b9faca828f09 100644
> --- a/tools/lib/bpf/strset.c
> +++ b/tools/lib/bpf/strset.c
> @@ -107,6 +107,38 @@ static void *strset_add_str_mem(struct strset *set, size_t add_sz)
> set->strs_data_len, set->strs_data_max_len, add_sz);
> }
>
> +static long strset_str_append(struct strset *set, const char *s)
> +{
> + uintptr_t old_data = (uintptr_t)set->strs_data;
> + uintptr_t old_s = (uintptr_t)s;
> + long len = strlen(s) + 1;
> + void *p;
> +
> + /* Hashmap keys are always offsets within set->strs_data, so to even
> + * look up some string from the "outside", we need to first append it
> + * at the end, so that it can be addressed with an offset. Luckily,
> + * until set->strs_data_len is incremented, that string is just a piece
> + * of garbage for the rest of the code, so no harm, no foul. On the
> + * other hand, if the string is unique, it's already appended and
> + * ready to be used, only a simple set->strs_data_len increment away.
> + */
> + p = strset_add_str_mem(set, len);
> + if (!p)
> + return -ENOMEM;
> +
> + /* The set->strs_data might have reallocated and if 's' pointed
> + * to an internal string within the old buffer, then it became
> + * dangling and needs to be reconstructed before the copy.
> + */
> + if (old_data && old_data != (uintptr_t)set->strs_data &&
> + old_s >= old_data && old_s < old_data + set->strs_data_len)
> + s = set->strs_data + (old_s - old_data);
> +
> + memcpy(p, s, len);
> +
> + return len;
> +}
Does a similar use-after-free still exist in the btf__add_* APIs (e.g.
btf__add_str) when called with strings from btf__name_by_offset() on an
unmodifiable BTF?
If a user passes a string retrieved via btf__name_by_offset(), the
string pointer points directly into btf->raw_data. Inside the btf__add_*
APIs, btf_ensure_modifiable(btf) is called first, which calls
btf_invalidate_raw_data() and frees btf->raw_data.
Could this leave the s pointer dangling before it even reaches this
check? The check added here uses old_data which corresponds to
btf->strs_set, not the freed btf->raw_data, so it wouldn't rescue the
pointer.
This concern was raised by sashiko-bot@kernel.org on lore but was not
addressed in subsequent patch versions.
Reference: https://lore.kernel.org/all/20260518053124.2EF28C2BCB7@smtp.kernel.org/
> +
> /* Find string offset that corresponds to a given string *s*.
> * Returns:
> * - >0 offset into string data, if string is found;
[ ... ]
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26338132147
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4] libbpf: fix UAF in strset__add_str()
2026-05-23 16:27 ` [PATCH v4] " Carlos Llamas
2026-05-23 17:05 ` bot+bpf-ci
@ 2026-05-28 21:31 ` Andrii Nakryiko
1 sibling, 0 replies; 10+ messages in thread
From: Andrii Nakryiko @ 2026-05-28 21:31 UTC (permalink / raw)
To: Carlos Llamas
Cc: mykyta.yatsenko5, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
kernel-team, linux-kernel, Mykyta Yatsenko,
open list:BPF [LIBRARY] (libbpf)
On Sat, May 23, 2026 at 9:27 AM Carlos Llamas <cmllamas@google.com> wrote:
>
> strset_add_str_mem() might reallocate the strset data buffer in order to
> accommodate the provided string 's'. However, if 's' points to a string
> already present in the buffer, it becomes dangling after the realloc.
> This leads to a use-after-free when attempting to memcpy() the string
> into the new buffer.
>
> One scenario that triggers this problematic path is when resolve_btfids
> attempts to patch kfunc prototypes using existing BTF parameter names:
>
> | resolve_btfids: function bpf_list_push_back_impl already exists in BTF
> | Segmentation fault (core dumped)
>
> Compiling resolve_btfids with fsanitize=address generates a detailed
> report of the UAF:
>
> | =================================================================
> | ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4
> | ==1507892==ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4 at pc 0x55d25155a2a8 bp 0x7ffcef879060 sp 0x7ffcef878818
> | READ of size 5 at 0x7f4c4a500bd4 thread T0
> | #0 0x55d25155a2a7 in memcpy (tools/bpf/resolve_btfids/resolve_btfids+0xcf2a7)
> | #1 0x55d2515d708e in strset__add_str tools/lib/bpf/strset.c:162:2
> | #2 0x55d2515c730b in btf__add_str tools/lib/bpf/btf.c:2109:8
> | #3 0x55d2515c9020 in btf__add_func_param tools/lib/bpf/btf.c:3108:14
> | #4 0x55d25159f0b5 in process_kfunc_with_implicit_args tools/bpf/resolve_btfids/main.c:1196:9
> | #5 0x55d25159e004 in btf2btf tools/bpf/resolve_btfids/main.c:1229:9
> | #6 0x55d25159cee7 in main tools/bpf/resolve_btfids/main.c:1535:6
> | #7 0x7f4c78e29f76 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
> | #8 0x7f4c78e2a026 in __libc_start_main csu/../csu/libc-start.c:360:3
> | #9 0x55d2514bb860 in _start (tools/bpf/resolve_btfids/resolve_btfids+0x30860)
> |
> | 0x7f4c4a500bd4 is located 13268 bytes inside of 2829000-byte region [0x7f4c4a4fd800,0x7f4c4a7b02c8)
> | freed by thread T0 here:
> | #0 0x55d25155b700 in realloc (tools/bpf/resolve_btfids/resolve_btfids+0xd0700)
> | #1 0x55d2515c426c in libbpf_reallocarray tools/lib/bpf/./libbpf_internal.h:220:9
> | #2 0x55d2515c426c in libbpf_add_mem tools/lib/bpf/btf.c:224:13
> |
> | previously allocated by thread T0 here:
> | #0 0x55d25155b2e3 in malloc (tools/bpf/resolve_btfids/resolve_btfids+0xd02e3)
> | #1 0x55d2515d6e7d in strset__new tools/lib/bpf/strset.c:58:20
>
> While resolve_btfids could be refactored to avoid this call path, let's
> instead fix this issue at the source in strset__add_str() and avoid
> similar scenarios.
>
> Let's check if set->strs_data was reallocated and whether 's' points to
> an internal string within the old strset buffer. In such case, 's' is
> reconstructed to point to the new buffer.
>
> While already here, also fix strset__find_str() which suffers from the
> same problem by factoring out the common operations into a new helper
> function strset_str_append().
>
> Fixes: 90d76d3ececc ("libbpf: Extract internal set-of-strings datastructure APIs")
> Suggested-by: Andrii Nakryiko <andrii@kernel.org>
> Suggested-by: Mykyta Yatsenko <yatsenko@meta.com>
> Signed-off-by: Carlos Llamas <cmllamas@google.com>
> ---
> v4:
> Store pointers as integers in advance before realloc to prevent UB.
> Access set->strs_data directly, not through external API.
>
> v3:
> Switch to 's' reconstruction approach suggested by Andrii.
> Adjusted names and commit log accordingly.
> https://lore.kernel.org/all/20260518050550.2600101-1-cmllamas@google.com/
>
> v2:
> Implemented the fix in strset__offset() helper as suggested by Mykyta.
> Added support to handle "substrings" of existing ones.
> Used 90d76d3ececc as Fixes tag as suggested by Sashiko.
> https://lore.kernel.org/all/20260515044759.2863546-1-cmllamas@google.com/
>
> v1:
> https://lore.kernel.org/all/20260513232055.1681859-1-cmllamas@google.com/
>
> tools/lib/bpf/strset.c | 59 +++++++++++++++++++++++++++---------------
> 1 file changed, 38 insertions(+), 21 deletions(-)
>
> diff --git a/tools/lib/bpf/strset.c b/tools/lib/bpf/strset.c
> index 2464bcbd04e0..b9faca828f09 100644
> --- a/tools/lib/bpf/strset.c
> +++ b/tools/lib/bpf/strset.c
> @@ -107,6 +107,38 @@ static void *strset_add_str_mem(struct strset *set, size_t add_sz)
> set->strs_data_len, set->strs_data_max_len, add_sz);
> }
>
> +static long strset_str_append(struct strset *set, const char *s)
> +{
> + uintptr_t old_data = (uintptr_t)set->strs_data;
> + uintptr_t old_s = (uintptr_t)s;
> + long len = strlen(s) + 1;
> + void *p;
> +
> + /* Hashmap keys are always offsets within set->strs_data, so to even
> + * look up some string from the "outside", we need to first append it
> + * at the end, so that it can be addressed with an offset. Luckily,
> + * until set->strs_data_len is incremented, that string is just a piece
> + * of garbage for the rest of the code, so no harm, no foul. On the
> + * other hand, if the string is unique, it's already appended and
> + * ready to be used, only a simple set->strs_data_len increment away.
> + */
> + p = strset_add_str_mem(set, len);
> + if (!p)
> + return -ENOMEM;
> +
> + /* The set->strs_data might have reallocated and if 's' pointed
> + * to an internal string within the old buffer, then it became
> + * dangling and needs to be reconstructed before the copy.
> + */
> + if (old_data && old_data != (uintptr_t)set->strs_data &&
> + old_s >= old_data && old_s < old_data + set->strs_data_len)
we should use ols strs_data_len here, I fixed it up (and comment style
issues pointed out by AI) like so:
$ git diff
diff --git a/tools/lib/bpf/strset.c b/tools/lib/bpf/strset.c
index b9faca828f09..ace73c6b3d62 100644
--- a/tools/lib/bpf/strset.c
+++ b/tools/lib/bpf/strset.c
@@ -110,11 +110,13 @@ static void *strset_add_str_mem(struct strset
*set, size_t add_sz)
static long strset_str_append(struct strset *set, const char *s)
{
uintptr_t old_data = (uintptr_t)set->strs_data;
+ size_t old_data_len = set->strs_data_len;
uintptr_t old_s = (uintptr_t)s;
long len = strlen(s) + 1;
void *p;
- /* Hashmap keys are always offsets within set->strs_data, so to even
+ /*
+ * Hashmap keys are always offsets within set->strs_data, so to even
* look up some string from the "outside", we need to first append it
* at the end, so that it can be addressed with an offset. Luckily,
* until set->strs_data_len is incremented, that string is just a piece
@@ -126,12 +128,13 @@ static long strset_str_append(struct strset
*set, const char *s)
if (!p)
return -ENOMEM;
- /* The set->strs_data might have reallocated and if 's' pointed
+ /*
+ * The set->strs_data might have reallocated and if 's' pointed
* to an internal string within the old buffer, then it became
* dangling and needs to be reconstructed before the copy.
*/
if (old_data && old_data != (uintptr_t)set->strs_data &&
- old_s >= old_data && old_s < old_data + set->strs_data_len)
+ old_s >= old_data && old_s < old_data + old_data_len)
s = set->strs_data + (old_s - old_data);
memcpy(p, s, len);
applied to bpf-next, thanks!
> + s = set->strs_data + (old_s - old_data);
> +
> + memcpy(p, s, len);
> +
> + return len;
> +}
> +
[...]
^ permalink raw reply related [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-05-28 21:31 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <89d34016-cf82-4beb-989c-e4fc2e3cd29e@gmail.com>
2026-05-15 4:47 ` [PATCH v2] libbpf: fix UAF in strset__add_str() Carlos Llamas
2026-05-15 22:08 ` Andrii Nakryiko
2026-05-18 4:59 ` Carlos Llamas
2026-05-18 5:05 ` [PATCH v3] " Carlos Llamas
2026-05-18 17:36 ` Andrii Nakryiko
2026-05-22 18:35 ` Andrii Nakryiko
2026-05-22 18:50 ` Carlos Llamas
2026-05-23 16:27 ` [PATCH v4] " Carlos Llamas
2026-05-23 17:05 ` bot+bpf-ci
2026-05-28 21:31 ` Andrii Nakryiko
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox