* [PATCH bpf-next v1 0/3] Task local data bug fixes and improvement
@ 2026-03-26 5:24 Amery Hung
2026-03-26 5:24 ` [PATCH bpf-next v1 1/3] selftests/bpf: Fix task_local_data data allocation size Amery Hung
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Amery Hung @ 2026-03-26 5:24 UTC (permalink / raw)
To: bpf
Cc: alexei.starovoitov, andrii, daniel, eddyz87, memxor, ameryhung,
kernel-team
Hi,
This patchset fixed two task local data bugs and improved the
memory allocation code. Please find the detail in each patch's commit
msg.
Amery Hung (3):
selftests/bpf: Fix task_local_data data allocation size
selftests/bpf: Simplify task_local_data memory allocation
selftests/bpf: Make sure TLD_DEFINE_KEY runs first
.../bpf/prog_tests/task_local_data.h | 66 ++++++++-----------
1 file changed, 26 insertions(+), 40 deletions(-)
--
2.52.0
^ permalink raw reply [flat|nested] 12+ messages in thread* [PATCH bpf-next v1 1/3] selftests/bpf: Fix task_local_data data allocation size 2026-03-26 5:24 [PATCH bpf-next v1 0/3] Task local data bug fixes and improvement Amery Hung @ 2026-03-26 5:24 ` Amery Hung 2026-03-26 10:27 ` sun jian 2026-03-26 16:17 ` Mykyta Yatsenko 2026-03-26 5:24 ` [PATCH bpf-next v1 2/3] selftests/bpf: Simplify task_local_data memory allocation Amery Hung 2026-03-26 5:24 ` [PATCH bpf-next v1 3/3] selftests/bpf: Make sure TLD_DEFINE_KEY runs first Amery Hung 2 siblings, 2 replies; 12+ messages in thread From: Amery Hung @ 2026-03-26 5:24 UTC (permalink / raw) To: bpf Cc: alexei.starovoitov, andrii, daniel, eddyz87, memxor, ameryhung, kernel-team Currently, when allocating memory for data, size of tld_data_u->start is not taken into account. This may cause OOB access. Fixed it by adding the non-flexible array part of tld_datg_u. Besides, explicitly align tld_data_u->data to 8 bytes in case some fields are added before data in the future. It could break the assumption that every data field is 8 byte aligned and sizeof(tld_data_u) will no longer be equal to offsetof(struct tld_data_u, data), which we use interchangeably. Signed-off-by: Amery Hung <ameryhung@gmail.com> --- .../selftests/bpf/prog_tests/task_local_data.h | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_data.h b/tools/testing/selftests/bpf/prog_tests/task_local_data.h index 7819f318b2fb..a52d8b549425 100644 --- a/tools/testing/selftests/bpf/prog_tests/task_local_data.h +++ b/tools/testing/selftests/bpf/prog_tests/task_local_data.h @@ -90,7 +90,7 @@ typedef struct { struct tld_metadata { char name[TLD_NAME_LEN]; - _Atomic __u16 size; + _Atomic __u16 size; /* size of tld_data_u->data */ }; struct tld_meta_u { @@ -101,7 +101,7 @@ struct tld_meta_u { struct tld_data_u { __u64 start; /* offset of tld_data_u->data in a page */ - char data[]; + char data[] __attribute__((aligned(8))); }; struct tld_map_value { @@ -158,6 +158,7 @@ static int __tld_init_data_p(int map_fd) struct tld_data_u *data; void *data_alloc = NULL; int err, tid_fd = -1; + size_t size; tid_fd = syscall(SYS_pidfd_open, sys_gettid(), O_EXCL); if (tid_fd < 0) { @@ -173,9 +174,10 @@ static int __tld_init_data_p(int map_fd) * tld_meta_p->size = TLD_DYN_DATA_SIZE + * total size of TLDs defined via TLD_DEFINE_KEY() */ - data_alloc = (use_aligned_alloc || tld_meta_p->size * 2 >= TLD_PAGE_SIZE) ? - aligned_alloc(TLD_PAGE_SIZE, tld_meta_p->size) : - malloc(tld_meta_p->size * 2); + size = tld_meta_p->size + sizeof(struct tld_data_u); + data_alloc = (use_aligned_alloc || size * 2 >= TLD_PAGE_SIZE) ? + aligned_alloc(TLD_PAGE_SIZE, size) : + malloc(size * 2); if (!data_alloc) { err = -ENOMEM; goto out; -- 2.52.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH bpf-next v1 1/3] selftests/bpf: Fix task_local_data data allocation size 2026-03-26 5:24 ` [PATCH bpf-next v1 1/3] selftests/bpf: Fix task_local_data data allocation size Amery Hung @ 2026-03-26 10:27 ` sun jian 2026-03-26 15:56 ` Amery Hung 2026-03-26 16:17 ` Mykyta Yatsenko 1 sibling, 1 reply; 12+ messages in thread From: sun jian @ 2026-03-26 10:27 UTC (permalink / raw) To: Amery Hung Cc: bpf, alexei.starovoitov, andrii, daniel, eddyz87, memxor, kernel-team On Thu, Mar 26, 2026 at 1:25 PM Amery Hung <ameryhung@gmail.com> wrote: > > Currently, when allocating memory for data, size of tld_data_u->start > is not taken into account. This may cause OOB access. Fixed it by adding > the non-flexible array part of tld_datg_u. nit: seems to be a typo of tld_data_u? > > Besides, explicitly align tld_data_u->data to 8 bytes in case some > fields are added before data in the future. It could break the > assumption that every data field is 8 byte aligned and > sizeof(tld_data_u) will no longer be equal to > offsetof(struct tld_data_u, data), which we use interchangeably. > > Signed-off-by: Amery Hung <ameryhung@gmail.com> > --- > .../selftests/bpf/prog_tests/task_local_data.h | 12 +++++++----- > 1 file changed, 7 insertions(+), 5 deletions(-) > > diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_data.h b/tools/testing/selftests/bpf/prog_tests/task_local_data.h > index 7819f318b2fb..a52d8b549425 100644 > --- a/tools/testing/selftests/bpf/prog_tests/task_local_data.h > +++ b/tools/testing/selftests/bpf/prog_tests/task_local_data.h > @@ -90,7 +90,7 @@ typedef struct { > > struct tld_metadata { > char name[TLD_NAME_LEN]; > - _Atomic __u16 size; > + _Atomic __u16 size; /* size of tld_data_u->data */ > }; > > struct tld_meta_u { > @@ -101,7 +101,7 @@ struct tld_meta_u { > > struct tld_data_u { > __u64 start; /* offset of tld_data_u->data in a page */ > - char data[]; > + char data[] __attribute__((aligned(8))); > }; > > struct tld_map_value { > @@ -158,6 +158,7 @@ static int __tld_init_data_p(int map_fd) > struct tld_data_u *data; > void *data_alloc = NULL; > int err, tid_fd = -1; > + size_t size; > > tid_fd = syscall(SYS_pidfd_open, sys_gettid(), O_EXCL); > if (tid_fd < 0) { > @@ -173,9 +174,10 @@ static int __tld_init_data_p(int map_fd) > * tld_meta_p->size = TLD_DYN_DATA_SIZE + > * total size of TLDs defined via TLD_DEFINE_KEY() > */ > - data_alloc = (use_aligned_alloc || tld_meta_p->size * 2 >= TLD_PAGE_SIZE) ? > - aligned_alloc(TLD_PAGE_SIZE, tld_meta_p->size) : > - malloc(tld_meta_p->size * 2); > + size = tld_meta_p->size + sizeof(struct tld_data_u); > + data_alloc = (use_aligned_alloc || size * 2 >= TLD_PAGE_SIZE) ? > + aligned_alloc(TLD_PAGE_SIZE, size) : with `start` now included, `size` seems even less likely to be a multiple of TLD_PAGE_SIZE. Could aligned_alloc() fail here? > + malloc(size * 2); > if (!data_alloc) { > err = -ENOMEM; > goto out; > -- > 2.52.0 > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH bpf-next v1 1/3] selftests/bpf: Fix task_local_data data allocation size 2026-03-26 10:27 ` sun jian @ 2026-03-26 15:56 ` Amery Hung 0 siblings, 0 replies; 12+ messages in thread From: Amery Hung @ 2026-03-26 15:56 UTC (permalink / raw) To: sun jian Cc: bpf, alexei.starovoitov, andrii, daniel, eddyz87, memxor, kernel-team On Thu, Mar 26, 2026 at 3:27 AM sun jian <sun.jian.kdev@gmail.com> wrote: > > On Thu, Mar 26, 2026 at 1:25 PM Amery Hung <ameryhung@gmail.com> wrote: > > > > Currently, when allocating memory for data, size of tld_data_u->start > > is not taken into account. This may cause OOB access. Fixed it by adding > > the non-flexible array part of tld_datg_u. > nit: seems to be a typo of tld_data_u? Ah yes > > > > Besides, explicitly align tld_data_u->data to 8 bytes in case some > > fields are added before data in the future. It could break the > > assumption that every data field is 8 byte aligned and > > sizeof(tld_data_u) will no longer be equal to > > offsetof(struct tld_data_u, data), which we use interchangeably. > > > > Signed-off-by: Amery Hung <ameryhung@gmail.com> > > --- > > .../selftests/bpf/prog_tests/task_local_data.h | 12 +++++++----- > > 1 file changed, 7 insertions(+), 5 deletions(-) > > > > diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_data.h b/tools/testing/selftests/bpf/prog_tests/task_local_data.h > > index 7819f318b2fb..a52d8b549425 100644 > > --- a/tools/testing/selftests/bpf/prog_tests/task_local_data.h > > +++ b/tools/testing/selftests/bpf/prog_tests/task_local_data.h > > @@ -90,7 +90,7 @@ typedef struct { > > > > struct tld_metadata { > > char name[TLD_NAME_LEN]; > > - _Atomic __u16 size; > > + _Atomic __u16 size; /* size of tld_data_u->data */ > > }; > > > > struct tld_meta_u { > > @@ -101,7 +101,7 @@ struct tld_meta_u { > > > > struct tld_data_u { > > __u64 start; /* offset of tld_data_u->data in a page */ > > - char data[]; > > + char data[] __attribute__((aligned(8))); > > }; > > > > struct tld_map_value { > > @@ -158,6 +158,7 @@ static int __tld_init_data_p(int map_fd) > > struct tld_data_u *data; > > void *data_alloc = NULL; > > int err, tid_fd = -1; > > + size_t size; > > > > tid_fd = syscall(SYS_pidfd_open, sys_gettid(), O_EXCL); > > if (tid_fd < 0) { > > @@ -173,9 +174,10 @@ static int __tld_init_data_p(int map_fd) > > * tld_meta_p->size = TLD_DYN_DATA_SIZE + > > * total size of TLDs defined via TLD_DEFINE_KEY() > > */ > > - data_alloc = (use_aligned_alloc || tld_meta_p->size * 2 >= TLD_PAGE_SIZE) ? > > - aligned_alloc(TLD_PAGE_SIZE, tld_meta_p->size) : > > - malloc(tld_meta_p->size * 2); > > + size = tld_meta_p->size + sizeof(struct tld_data_u); > > + data_alloc = (use_aligned_alloc || size * 2 >= TLD_PAGE_SIZE) ? > > + aligned_alloc(TLD_PAGE_SIZE, size) : > with `start` now included, `size` seems even less likely to be a multiple of > TLD_PAGE_SIZE. Could aligned_alloc() fail here? Hi, Thanks for taking a look. This indeed could be a problem. Passing size not being an integral multiple of alignment can be undefined behavior for some memory allocators. This is okay for TLD_DATA_USE_ALIGNED_ALLOC as it should only be enabled when users know the memory allocator supports it. However, aligned_alloc() is always used when size * 2 > TLD_PAGE_SIZE, which may lead to UB. This issue is addressed in patch 2 when the allocation path is cleaned up. > > + malloc(size * 2); > > if (!data_alloc) { > > err = -ENOMEM; > > goto out; > > -- > > 2.52.0 > > > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH bpf-next v1 1/3] selftests/bpf: Fix task_local_data data allocation size 2026-03-26 5:24 ` [PATCH bpf-next v1 1/3] selftests/bpf: Fix task_local_data data allocation size Amery Hung 2026-03-26 10:27 ` sun jian @ 2026-03-26 16:17 ` Mykyta Yatsenko 2026-03-26 20:32 ` Amery Hung 1 sibling, 1 reply; 12+ messages in thread From: Mykyta Yatsenko @ 2026-03-26 16:17 UTC (permalink / raw) To: Amery Hung, bpf Cc: alexei.starovoitov, andrii, daniel, eddyz87, memxor, ameryhung, kernel-team Amery Hung <ameryhung@gmail.com> writes: > Currently, when allocating memory for data, size of tld_data_u->start > is not taken into account. This may cause OOB access. Fixed it by adding > the non-flexible array part of tld_datg_u. > > Besides, explicitly align tld_data_u->data to 8 bytes in case some > fields are added before data in the future. It could break the > assumption that every data field is 8 byte aligned and > sizeof(tld_data_u) will no longer be equal to > offsetof(struct tld_data_u, data), which we use interchangeably. > > Signed-off-by: Amery Hung <ameryhung@gmail.com> > --- > .../selftests/bpf/prog_tests/task_local_data.h | 12 +++++++----- > 1 file changed, 7 insertions(+), 5 deletions(-) > > diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_data.h b/tools/testing/selftests/bpf/prog_tests/task_local_data.h > index 7819f318b2fb..a52d8b549425 100644 > --- a/tools/testing/selftests/bpf/prog_tests/task_local_data.h > +++ b/tools/testing/selftests/bpf/prog_tests/task_local_data.h > @@ -90,7 +90,7 @@ typedef struct { > > struct tld_metadata { > char name[TLD_NAME_LEN]; > - _Atomic __u16 size; > + _Atomic __u16 size; /* size of tld_data_u->data */ > }; > > struct tld_meta_u { > @@ -101,7 +101,7 @@ struct tld_meta_u { > > struct tld_data_u { > __u64 start; /* offset of tld_data_u->data in a page */ > - char data[]; > + char data[] __attribute__((aligned(8))); > }; > > struct tld_map_value { > @@ -158,6 +158,7 @@ static int __tld_init_data_p(int map_fd) > struct tld_data_u *data; > void *data_alloc = NULL; > int err, tid_fd = -1; > + size_t size; > > tid_fd = syscall(SYS_pidfd_open, sys_gettid(), O_EXCL); > if (tid_fd < 0) { > @@ -173,9 +174,10 @@ static int __tld_init_data_p(int map_fd) > * tld_meta_p->size = TLD_DYN_DATA_SIZE + > * total size of TLDs defined via TLD_DEFINE_KEY() > */ > - data_alloc = (use_aligned_alloc || tld_meta_p->size * 2 >= TLD_PAGE_SIZE) ? > - aligned_alloc(TLD_PAGE_SIZE, tld_meta_p->size) : > - malloc(tld_meta_p->size * 2); > + size = tld_meta_p->size + sizeof(struct tld_data_u); > + data_alloc = (use_aligned_alloc || size * 2 >= TLD_PAGE_SIZE) ? > + aligned_alloc(TLD_PAGE_SIZE, size) : > + malloc(size * 2); It looks like there is no point having Patch 1 separately, most of it is overwritten in the Patch 2, I understand you want to have fix and simplification separately, but is it really worth it for this selftest. > if (!data_alloc) { > err = -ENOMEM; > goto out; > -- > 2.52.0 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH bpf-next v1 1/3] selftests/bpf: Fix task_local_data data allocation size 2026-03-26 16:17 ` Mykyta Yatsenko @ 2026-03-26 20:32 ` Amery Hung 0 siblings, 0 replies; 12+ messages in thread From: Amery Hung @ 2026-03-26 20:32 UTC (permalink / raw) To: Mykyta Yatsenko Cc: bpf, alexei.starovoitov, andrii, daniel, eddyz87, memxor, kernel-team On Thu, Mar 26, 2026 at 9:18 AM Mykyta Yatsenko <mykyta.yatsenko5@gmail.com> wrote: > > Amery Hung <ameryhung@gmail.com> writes: > > > Currently, when allocating memory for data, size of tld_data_u->start > > is not taken into account. This may cause OOB access. Fixed it by adding > > the non-flexible array part of tld_datg_u. > > > > Besides, explicitly align tld_data_u->data to 8 bytes in case some > > fields are added before data in the future. It could break the > > assumption that every data field is 8 byte aligned and > > sizeof(tld_data_u) will no longer be equal to > > offsetof(struct tld_data_u, data), which we use interchangeably. > > > > Signed-off-by: Amery Hung <ameryhung@gmail.com> > > --- > > .../selftests/bpf/prog_tests/task_local_data.h | 12 +++++++----- > > 1 file changed, 7 insertions(+), 5 deletions(-) > > > > diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_data.h b/tools/testing/selftests/bpf/prog_tests/task_local_data.h > > index 7819f318b2fb..a52d8b549425 100644 > > --- a/tools/testing/selftests/bpf/prog_tests/task_local_data.h > > +++ b/tools/testing/selftests/bpf/prog_tests/task_local_data.h > > @@ -90,7 +90,7 @@ typedef struct { > > > > struct tld_metadata { > > char name[TLD_NAME_LEN]; > > - _Atomic __u16 size; > > + _Atomic __u16 size; /* size of tld_data_u->data */ > > }; > > > > struct tld_meta_u { > > @@ -101,7 +101,7 @@ struct tld_meta_u { > > > > struct tld_data_u { > > __u64 start; /* offset of tld_data_u->data in a page */ > > - char data[]; > > + char data[] __attribute__((aligned(8))); > > }; > > > > struct tld_map_value { > > @@ -158,6 +158,7 @@ static int __tld_init_data_p(int map_fd) > > struct tld_data_u *data; > > void *data_alloc = NULL; > > int err, tid_fd = -1; > > + size_t size; > > > > tid_fd = syscall(SYS_pidfd_open, sys_gettid(), O_EXCL); > > if (tid_fd < 0) { > > @@ -173,9 +174,10 @@ static int __tld_init_data_p(int map_fd) > > * tld_meta_p->size = TLD_DYN_DATA_SIZE + > > * total size of TLDs defined via TLD_DEFINE_KEY() > > */ > > - data_alloc = (use_aligned_alloc || tld_meta_p->size * 2 >= TLD_PAGE_SIZE) ? > > - aligned_alloc(TLD_PAGE_SIZE, tld_meta_p->size) : > > - malloc(tld_meta_p->size * 2); > > + size = tld_meta_p->size + sizeof(struct tld_data_u); > > + data_alloc = (use_aligned_alloc || size * 2 >= TLD_PAGE_SIZE) ? > > + aligned_alloc(TLD_PAGE_SIZE, size) : > > + malloc(size * 2); > It looks like there is no point having Patch 1 separately, most of it is > overwritten in the Patch 2, I understand you want to have fix and > simplification separately, but is it really worth it for this selftest. The allocation path is already a bit puzzling, so I have this fix as a separate patch to make patch 2 easier to follow. That said, I can squash it if it is simpler and needs a respin. > > if (!data_alloc) { > > err = -ENOMEM; > > goto out; > > -- > > 2.52.0 ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH bpf-next v1 2/3] selftests/bpf: Simplify task_local_data memory allocation 2026-03-26 5:24 [PATCH bpf-next v1 0/3] Task local data bug fixes and improvement Amery Hung 2026-03-26 5:24 ` [PATCH bpf-next v1 1/3] selftests/bpf: Fix task_local_data data allocation size Amery Hung @ 2026-03-26 5:24 ` Amery Hung 2026-03-27 16:36 ` Mykyta Yatsenko 2026-03-26 5:24 ` [PATCH bpf-next v1 3/3] selftests/bpf: Make sure TLD_DEFINE_KEY runs first Amery Hung 2 siblings, 1 reply; 12+ messages in thread From: Amery Hung @ 2026-03-26 5:24 UTC (permalink / raw) To: bpf Cc: alexei.starovoitov, andrii, daniel, eddyz87, memxor, ameryhung, kernel-team Simplify data allocation by always using aligned_alloc() and passing size_pot, size rounded up to the closest power of two to alignment. Currently, aligned_alloc(page_size, size) is only intended to be used with memory allocators that can fulfill the request without rounding size up to page_size to conserve memory. This is enabled by defining TLD_DATA_USE_ALIGNED_ALLOC. The reason to align to page_size is due to the limitation of UPTR where only a page can be pinned to the kernel. Otherwise, malloc(size * 2) is used to allocate memory for data. However, we don't need to call aligned_alloc(page_size, size) to get a contiguous memory of size bytes within a page. aligned_alloc(size_pot, ...) will also do the trick. Therefore, just use aligned_alloc(size_pot, ...) universally. As for the size argument, create a new option, TLD_DONT_ROUND_UP_DATA_SIZE, to specify not rounding up the size. This preserves the current TLD_DATA_USE_ALIGNED_ALLOC behavior, allowing memory allocators with low overhead aligned_alloc() to not waste memory. To enable this, users need to make sure it is not an undefined behavior for the memory allocator to have size not being an integral multiple of alignment. Compared to the current implementation, !TLD_DATA_USE_ALIGNED_ALLOC used to always waste size-byte of memory due to malloc(size * 2). Now the worst case becomes size - 1 and the best case is 0 when the size is already a power of two. Signed-off-by: Amery Hung <ameryhung@gmail.com> --- .../bpf/prog_tests/task_local_data.h | 60 +++++++------------ 1 file changed, 22 insertions(+), 38 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_data.h b/tools/testing/selftests/bpf/prog_tests/task_local_data.h index a52d8b549425..366a6739c086 100644 --- a/tools/testing/selftests/bpf/prog_tests/task_local_data.h +++ b/tools/testing/selftests/bpf/prog_tests/task_local_data.h @@ -50,16 +50,13 @@ * TLD_MAX_DATA_CNT. * * - * TLD_DATA_USE_ALIGNED_ALLOC - Always use aligned_alloc() instead of malloc() + * TLD_DONT_ROUND_UP_DATA_SIZE - Don't round up memory size allocated for data if + * the memory allocator has low overhead aligned_alloc() implementation. * - * When allocating the memory for storing TLDs, we need to make sure there is a memory - * region of the X bytes within a page. This is due to the limit posed by UPTR: memory - * pinned to the kernel cannot exceed a page nor can it cross the page boundary. The - * library normally calls malloc(2*X) given X bytes of total TLDs, and only uses - * aligned_alloc(PAGE_SIZE, X) when X >= PAGE_SIZE / 2. This is to reduce memory wastage - * as not all memory allocator can use the exact amount of memory requested to fulfill - * aligned_alloc(). For example, some may round the size up to the alignment. Enable the - * option to always use aligned_alloc() if the implementation has low memory overhead. + * For some memory allocators, when calling aligned_alloc(alignment, size), size + * does not need to be an integral multiple of alignment and it can be fulfilled + * without using round_up(size, alignment) bytes of memory. Enable this option to + * reduce memory usage. */ #define TLD_PAGE_SIZE getpagesize() @@ -68,6 +65,8 @@ #define TLD_ROUND_MASK(x, y) ((__typeof__(x))((y) - 1)) #define TLD_ROUND_UP(x, y) ((((x) - 1) | TLD_ROUND_MASK(x, y)) + 1) +#define TLD_ROUND_UP_POWER_OF_TWO(x) (1UL << (sizeof(x) * 8 - __builtin_clzl(x - 1))) + #define TLD_READ_ONCE(x) (*(volatile typeof(x) *)&(x)) #ifndef TLD_DYN_DATA_SIZE @@ -111,7 +110,6 @@ struct tld_map_value { struct tld_meta_u * _Atomic tld_meta_p __attribute__((weak)); __thread struct tld_data_u *tld_data_p __attribute__((weak)); -__thread void *tld_data_alloc_p __attribute__((weak)); #ifdef TLD_FREE_DATA_ON_THREAD_EXIT pthread_key_t tld_pthread_key __attribute__((weak)); @@ -153,12 +151,10 @@ static int __tld_init_meta_p(void) static int __tld_init_data_p(int map_fd) { - bool use_aligned_alloc = false; struct tld_map_value map_val; struct tld_data_u *data; - void *data_alloc = NULL; int err, tid_fd = -1; - size_t size; + size_t size, size_pot; tid_fd = syscall(SYS_pidfd_open, sys_gettid(), O_EXCL); if (tid_fd < 0) { @@ -166,48 +162,37 @@ static int __tld_init_data_p(int map_fd) goto out; } -#ifdef TLD_DATA_USE_ALIGNED_ALLOC - use_aligned_alloc = true; -#endif - /* * tld_meta_p->size = TLD_DYN_DATA_SIZE + * total size of TLDs defined via TLD_DEFINE_KEY() */ size = tld_meta_p->size + sizeof(struct tld_data_u); - data_alloc = (use_aligned_alloc || size * 2 >= TLD_PAGE_SIZE) ? - aligned_alloc(TLD_PAGE_SIZE, size) : - malloc(size * 2); - if (!data_alloc) { + size_pot = TLD_ROUND_UP_POWER_OF_TWO(size); +#ifdef TLD_DONT_ROUND_UP_DATA_SIZE + data = (struct tld_data_u *)aligned_alloc(size_pot, size); +#else + data = (struct tld_data_u *)aligned_alloc(size_pot, size_pot); +#endif + if (!data) { err = -ENOMEM; goto out; } /* * Always pass a page-aligned address to UPTR since the size of tld_map_value::data - * is a page in BTF. If data_alloc spans across two pages, use the page that contains large - * enough memory. + * is a page in BTF. */ - if (TLD_PAGE_SIZE - (~TLD_PAGE_MASK & (intptr_t)data_alloc) >= tld_meta_p->size) { - map_val.data = (void *)(TLD_PAGE_MASK & (intptr_t)data_alloc); - data = data_alloc; - data->start = (~TLD_PAGE_MASK & (intptr_t)data_alloc) + - offsetof(struct tld_data_u, data); - } else { - map_val.data = (void *)(TLD_ROUND_UP((intptr_t)data_alloc, TLD_PAGE_SIZE)); - data = (void *)(TLD_ROUND_UP((intptr_t)data_alloc, TLD_PAGE_SIZE)); - data->start = offsetof(struct tld_data_u, data); - } + map_val.data = (void *)(TLD_PAGE_MASK & (intptr_t)data); + data->start = (~TLD_PAGE_MASK & (intptr_t)data) + sizeof(struct tld_data_u); map_val.meta = TLD_READ_ONCE(tld_meta_p); err = bpf_map_update_elem(map_fd, &tid_fd, &map_val, 0); if (err) { - free(data_alloc); + free(data); goto out; } tld_data_p = data; - tld_data_alloc_p = data_alloc; #ifdef TLD_FREE_DATA_ON_THREAD_EXIT pthread_setspecific(tld_pthread_key, (void *)1); #endif @@ -375,9 +360,8 @@ static void *tld_get_data(int map_fd, tld_key_t key) __attribute__((unused)) static void tld_free(void) { - if (tld_data_alloc_p) { - free(tld_data_alloc_p); - tld_data_alloc_p = NULL; + if (tld_data_p) { + free(tld_data_p); tld_data_p = NULL; } } -- 2.52.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH bpf-next v1 2/3] selftests/bpf: Simplify task_local_data memory allocation 2026-03-26 5:24 ` [PATCH bpf-next v1 2/3] selftests/bpf: Simplify task_local_data memory allocation Amery Hung @ 2026-03-27 16:36 ` Mykyta Yatsenko 2026-03-27 20:56 ` Amery Hung 0 siblings, 1 reply; 12+ messages in thread From: Mykyta Yatsenko @ 2026-03-27 16:36 UTC (permalink / raw) To: Amery Hung, bpf Cc: alexei.starovoitov, andrii, daniel, eddyz87, memxor, ameryhung, kernel-team Amery Hung <ameryhung@gmail.com> writes: > Simplify data allocation by always using aligned_alloc() and passing > size_pot, size rounded up to the closest power of two to alignment. > > Currently, aligned_alloc(page_size, size) is only intended to be used > with memory allocators that can fulfill the request without rounding > size up to page_size to conserve memory. This is enabled by defining > TLD_DATA_USE_ALIGNED_ALLOC. The reason to align to page_size is due to > the limitation of UPTR where only a page can be pinned to the kernel. > Otherwise, malloc(size * 2) is used to allocate memory for data. > > However, we don't need to call aligned_alloc(page_size, size) to get > a contiguous memory of size bytes within a page. aligned_alloc(size_pot, > ...) will also do the trick. Therefore, just use aligned_alloc(size_pot, > ...) universally. > > As for the size argument, create a new option, > TLD_DONT_ROUND_UP_DATA_SIZE, to specify not rounding up the size. > This preserves the current TLD_DATA_USE_ALIGNED_ALLOC behavior, allowing > memory allocators with low overhead aligned_alloc() to not waste memory. > To enable this, users need to make sure it is not an undefined behavior > for the memory allocator to have size not being an integral multiple of > alignment. Why not simplify this further and just mmap() a page? data = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); That gives you a page-aligned, page-sized buffer with no ambiguity about alignment, page boundary crossings, or aligned_alloc() POSIX compliance. munmap() on cleanup instead of free(). The whole power-of-two rounding and TLD_DONT_ROUND_UP_DATA_SIZE option adds complexity that's hard to justify for selftest infrastructure. How does a user decide when to define/undefine TLD_DONT_ROUND_UP? This is a selftest library -- the common case should just work with the simplest possible approach > > Compared to the current implementation, !TLD_DATA_USE_ALIGNED_ALLOC > used to always waste size-byte of memory due to malloc(size * 2). > Now the worst case becomes size - 1 and the best case is 0 when the size > is already a power of two. > > Signed-off-by: Amery Hung <ameryhung@gmail.com> > --- > .../bpf/prog_tests/task_local_data.h | 60 +++++++------------ > 1 file changed, 22 insertions(+), 38 deletions(-) > > diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_data.h b/tools/testing/selftests/bpf/prog_tests/task_local_data.h > index a52d8b549425..366a6739c086 100644 > --- a/tools/testing/selftests/bpf/prog_tests/task_local_data.h > +++ b/tools/testing/selftests/bpf/prog_tests/task_local_data.h > @@ -50,16 +50,13 @@ > * TLD_MAX_DATA_CNT. > * > * > - * TLD_DATA_USE_ALIGNED_ALLOC - Always use aligned_alloc() instead of malloc() > + * TLD_DONT_ROUND_UP_DATA_SIZE - Don't round up memory size allocated for data if > + * the memory allocator has low overhead aligned_alloc() implementation. > * > - * When allocating the memory for storing TLDs, we need to make sure there is a memory > - * region of the X bytes within a page. This is due to the limit posed by UPTR: memory > - * pinned to the kernel cannot exceed a page nor can it cross the page boundary. The > - * library normally calls malloc(2*X) given X bytes of total TLDs, and only uses > - * aligned_alloc(PAGE_SIZE, X) when X >= PAGE_SIZE / 2. This is to reduce memory wastage > - * as not all memory allocator can use the exact amount of memory requested to fulfill > - * aligned_alloc(). For example, some may round the size up to the alignment. Enable the > - * option to always use aligned_alloc() if the implementation has low memory overhead. > + * For some memory allocators, when calling aligned_alloc(alignment, size), size > + * does not need to be an integral multiple of alignment and it can be fulfilled > + * without using round_up(size, alignment) bytes of memory. Enable this option to > + * reduce memory usage. > */ > > #define TLD_PAGE_SIZE getpagesize() > @@ -68,6 +65,8 @@ > #define TLD_ROUND_MASK(x, y) ((__typeof__(x))((y) - 1)) > #define TLD_ROUND_UP(x, y) ((((x) - 1) | TLD_ROUND_MASK(x, y)) + 1) > > +#define TLD_ROUND_UP_POWER_OF_TWO(x) (1UL << (sizeof(x) * 8 - __builtin_clzl(x - 1))) > + > #define TLD_READ_ONCE(x) (*(volatile typeof(x) *)&(x)) > > #ifndef TLD_DYN_DATA_SIZE > @@ -111,7 +110,6 @@ struct tld_map_value { > > struct tld_meta_u * _Atomic tld_meta_p __attribute__((weak)); > __thread struct tld_data_u *tld_data_p __attribute__((weak)); > -__thread void *tld_data_alloc_p __attribute__((weak)); > > #ifdef TLD_FREE_DATA_ON_THREAD_EXIT > pthread_key_t tld_pthread_key __attribute__((weak)); > @@ -153,12 +151,10 @@ static int __tld_init_meta_p(void) > > static int __tld_init_data_p(int map_fd) > { > - bool use_aligned_alloc = false; > struct tld_map_value map_val; > struct tld_data_u *data; > - void *data_alloc = NULL; > int err, tid_fd = -1; > - size_t size; > + size_t size, size_pot; > > tid_fd = syscall(SYS_pidfd_open, sys_gettid(), O_EXCL); > if (tid_fd < 0) { > @@ -166,48 +162,37 @@ static int __tld_init_data_p(int map_fd) > goto out; > } > > -#ifdef TLD_DATA_USE_ALIGNED_ALLOC > - use_aligned_alloc = true; > -#endif > - > /* > * tld_meta_p->size = TLD_DYN_DATA_SIZE + > * total size of TLDs defined via TLD_DEFINE_KEY() > */ > size = tld_meta_p->size + sizeof(struct tld_data_u); > - data_alloc = (use_aligned_alloc || size * 2 >= TLD_PAGE_SIZE) ? > - aligned_alloc(TLD_PAGE_SIZE, size) : > - malloc(size * 2); > - if (!data_alloc) { > + size_pot = TLD_ROUND_UP_POWER_OF_TWO(size); > +#ifdef TLD_DONT_ROUND_UP_DATA_SIZE > + data = (struct tld_data_u *)aligned_alloc(size_pot, size); > +#else > + data = (struct tld_data_u *)aligned_alloc(size_pot, size_pot); > +#endif > + if (!data) { > err = -ENOMEM; > goto out; > } > > /* > * Always pass a page-aligned address to UPTR since the size of tld_map_value::data > - * is a page in BTF. If data_alloc spans across two pages, use the page that contains large > - * enough memory. > + * is a page in BTF. > */ > - if (TLD_PAGE_SIZE - (~TLD_PAGE_MASK & (intptr_t)data_alloc) >= tld_meta_p->size) { > - map_val.data = (void *)(TLD_PAGE_MASK & (intptr_t)data_alloc); > - data = data_alloc; > - data->start = (~TLD_PAGE_MASK & (intptr_t)data_alloc) + > - offsetof(struct tld_data_u, data); > - } else { > - map_val.data = (void *)(TLD_ROUND_UP((intptr_t)data_alloc, TLD_PAGE_SIZE)); > - data = (void *)(TLD_ROUND_UP((intptr_t)data_alloc, TLD_PAGE_SIZE)); > - data->start = offsetof(struct tld_data_u, data); > - } > + map_val.data = (void *)(TLD_PAGE_MASK & (intptr_t)data); > + data->start = (~TLD_PAGE_MASK & (intptr_t)data) + sizeof(struct tld_data_u); > map_val.meta = TLD_READ_ONCE(tld_meta_p); > > err = bpf_map_update_elem(map_fd, &tid_fd, &map_val, 0); > if (err) { > - free(data_alloc); > + free(data); > goto out; > } > > tld_data_p = data; > - tld_data_alloc_p = data_alloc; > #ifdef TLD_FREE_DATA_ON_THREAD_EXIT > pthread_setspecific(tld_pthread_key, (void *)1); > #endif > @@ -375,9 +360,8 @@ static void *tld_get_data(int map_fd, tld_key_t key) > __attribute__((unused)) > static void tld_free(void) > { > - if (tld_data_alloc_p) { > - free(tld_data_alloc_p); > - tld_data_alloc_p = NULL; > + if (tld_data_p) { > + free(tld_data_p); > tld_data_p = NULL; > } > } > -- > 2.52.0 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH bpf-next v1 2/3] selftests/bpf: Simplify task_local_data memory allocation 2026-03-27 16:36 ` Mykyta Yatsenko @ 2026-03-27 20:56 ` Amery Hung 2026-03-27 21:15 ` Mykyta Yatsenko 0 siblings, 1 reply; 12+ messages in thread From: Amery Hung @ 2026-03-27 20:56 UTC (permalink / raw) To: Mykyta Yatsenko Cc: bpf, alexei.starovoitov, andrii, daniel, eddyz87, memxor, kernel-team On Fri, Mar 27, 2026 at 9:36 AM Mykyta Yatsenko <mykyta.yatsenko5@gmail.com> wrote: > > Amery Hung <ameryhung@gmail.com> writes: > > > Simplify data allocation by always using aligned_alloc() and passing > > size_pot, size rounded up to the closest power of two to alignment. > > > > Currently, aligned_alloc(page_size, size) is only intended to be used > > with memory allocators that can fulfill the request without rounding > > size up to page_size to conserve memory. This is enabled by defining > > TLD_DATA_USE_ALIGNED_ALLOC. The reason to align to page_size is due to > > the limitation of UPTR where only a page can be pinned to the kernel. > > Otherwise, malloc(size * 2) is used to allocate memory for data. > > > > However, we don't need to call aligned_alloc(page_size, size) to get > > a contiguous memory of size bytes within a page. aligned_alloc(size_pot, > > ...) will also do the trick. Therefore, just use aligned_alloc(size_pot, > > ...) universally. > > > > As for the size argument, create a new option, > > TLD_DONT_ROUND_UP_DATA_SIZE, to specify not rounding up the size. > > This preserves the current TLD_DATA_USE_ALIGNED_ALLOC behavior, allowing > > memory allocators with low overhead aligned_alloc() to not waste memory. > > To enable this, users need to make sure it is not an undefined behavior > > for the memory allocator to have size not being an integral multiple of > > alignment. > Why not simplify this further and just mmap() a page? > > data = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE, > MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); > > That gives you a page-aligned, page-sized buffer with no ambiguity > about alignment, page boundary crossings, or aligned_alloc() POSIX > compliance. munmap() on cleanup instead of free(). > > The whole power-of-two rounding and TLD_DONT_ROUND_UP_DATA_SIZE > option adds complexity that's hard to justify for selftest > infrastructure. How does a user decide when to define/undefine > TLD_DONT_ROUND_UP? This is a selftest library -- the common case > should just work with the simplest possible approach This library, while residing in selftest, is designed to be used in production projects. sched_ext is in the process of adopting it. Therefore, simplicity is traded for efficiency. data here is allocated for every thread that calls tld_get_data(). So if we mmap a page for every thread (which could be in 100k order), it is going to waste a significant amount of memory. > > > > Compared to the current implementation, !TLD_DATA_USE_ALIGNED_ALLOC > > used to always waste size-byte of memory due to malloc(size * 2). > > Now the worst case becomes size - 1 and the best case is 0 when the size > > is already a power of two. > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH bpf-next v1 2/3] selftests/bpf: Simplify task_local_data memory allocation 2026-03-27 20:56 ` Amery Hung @ 2026-03-27 21:15 ` Mykyta Yatsenko 0 siblings, 0 replies; 12+ messages in thread From: Mykyta Yatsenko @ 2026-03-27 21:15 UTC (permalink / raw) To: Amery Hung Cc: bpf, alexei.starovoitov, andrii, daniel, eddyz87, memxor, kernel-team Amery Hung <ameryhung@gmail.com> writes: > On Fri, Mar 27, 2026 at 9:36 AM Mykyta Yatsenko > <mykyta.yatsenko5@gmail.com> wrote: >> >> Amery Hung <ameryhung@gmail.com> writes: >> >> > Simplify data allocation by always using aligned_alloc() and passing >> > size_pot, size rounded up to the closest power of two to alignment. >> > >> > Currently, aligned_alloc(page_size, size) is only intended to be used >> > with memory allocators that can fulfill the request without rounding >> > size up to page_size to conserve memory. This is enabled by defining >> > TLD_DATA_USE_ALIGNED_ALLOC. The reason to align to page_size is due to >> > the limitation of UPTR where only a page can be pinned to the kernel. >> > Otherwise, malloc(size * 2) is used to allocate memory for data. >> > >> > However, we don't need to call aligned_alloc(page_size, size) to get >> > a contiguous memory of size bytes within a page. aligned_alloc(size_pot, >> > ...) will also do the trick. Therefore, just use aligned_alloc(size_pot, >> > ...) universally. >> > >> > As for the size argument, create a new option, >> > TLD_DONT_ROUND_UP_DATA_SIZE, to specify not rounding up the size. >> > This preserves the current TLD_DATA_USE_ALIGNED_ALLOC behavior, allowing >> > memory allocators with low overhead aligned_alloc() to not waste memory. >> > To enable this, users need to make sure it is not an undefined behavior >> > for the memory allocator to have size not being an integral multiple of >> > alignment. >> Why not simplify this further and just mmap() a page? >> >> data = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE, >> MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); >> >> That gives you a page-aligned, page-sized buffer with no ambiguity >> about alignment, page boundary crossings, or aligned_alloc() POSIX >> compliance. munmap() on cleanup instead of free(). >> >> The whole power-of-two rounding and TLD_DONT_ROUND_UP_DATA_SIZE >> option adds complexity that's hard to justify for selftest >> infrastructure. How does a user decide when to define/undefine >> TLD_DONT_ROUND_UP? This is a selftest library -- the common case >> should just work with the simplest possible approach > > This library, while residing in selftest, is designed to be used in > production projects. sched_ext is in the process of adopting it. > Therefore, simplicity is traded for efficiency. > > data here is allocated for every thread that calls tld_get_data(). So > if we mmap a page for every thread (which could be in 100k order), it > is going to waste a significant amount of memory. > Thanks for providing this context, it makes sense now. >> > >> > Compared to the current implementation, !TLD_DATA_USE_ALIGNED_ALLOC >> > used to always waste size-byte of memory due to malloc(size * 2). >> > Now the worst case becomes size - 1 and the best case is 0 when the size >> > is already a power of two. >> > ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH bpf-next v1 3/3] selftests/bpf: Make sure TLD_DEFINE_KEY runs first 2026-03-26 5:24 [PATCH bpf-next v1 0/3] Task local data bug fixes and improvement Amery Hung 2026-03-26 5:24 ` [PATCH bpf-next v1 1/3] selftests/bpf: Fix task_local_data data allocation size Amery Hung 2026-03-26 5:24 ` [PATCH bpf-next v1 2/3] selftests/bpf: Simplify task_local_data memory allocation Amery Hung @ 2026-03-26 5:24 ` Amery Hung 2026-03-27 16:56 ` Mykyta Yatsenko 2 siblings, 1 reply; 12+ messages in thread From: Amery Hung @ 2026-03-26 5:24 UTC (permalink / raw) To: bpf Cc: alexei.starovoitov, andrii, daniel, eddyz87, memxor, ameryhung, kernel-team Without specifying constructor priority of the hidden constructor function defined by TLD_DEFINE_KEY, __tld_create_key(..., dyn_data = false) may run after tld_get_data() called from other constructors. Threads calling tld_get_data() before __tld_create_key(..., dyn_data = false) will not allocate enough memory for all TLDs and later result in OOB access. Therefore, set it to the lowest value available to users. Note that lower means higher priority and 0-100 is reserved to the compiler. Signed-off-by: Amery Hung <ameryhung@gmail.com> --- tools/testing/selftests/bpf/prog_tests/task_local_data.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_data.h b/tools/testing/selftests/bpf/prog_tests/task_local_data.h index 366a6739c086..e242c455ddae 100644 --- a/tools/testing/selftests/bpf/prog_tests/task_local_data.h +++ b/tools/testing/selftests/bpf/prog_tests/task_local_data.h @@ -278,7 +278,7 @@ static tld_key_t __tld_create_key(const char *name, size_t size, bool dyn_data) #define TLD_DEFINE_KEY(key, name, size) \ tld_key_t key; \ \ -__attribute__((constructor)) \ +__attribute__((constructor(101))) \ void __tld_define_key_##key(void) \ { \ key = __tld_create_key(name, size, false); \ -- 2.52.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH bpf-next v1 3/3] selftests/bpf: Make sure TLD_DEFINE_KEY runs first 2026-03-26 5:24 ` [PATCH bpf-next v1 3/3] selftests/bpf: Make sure TLD_DEFINE_KEY runs first Amery Hung @ 2026-03-27 16:56 ` Mykyta Yatsenko 0 siblings, 0 replies; 12+ messages in thread From: Mykyta Yatsenko @ 2026-03-27 16:56 UTC (permalink / raw) To: Amery Hung, bpf Cc: alexei.starovoitov, andrii, daniel, eddyz87, memxor, ameryhung, kernel-team Amery Hung <ameryhung@gmail.com> writes: > Without specifying constructor priority of the hidden constructor > function defined by TLD_DEFINE_KEY, __tld_create_key(..., dyn_data = > false) may run after tld_get_data() called from other constructors. > Threads calling tld_get_data() before __tld_create_key(..., dyn_data > = false) will not allocate enough memory for all TLDs and later result > in OOB access. Therefore, set it to the lowest value available to > users. Note that lower means higher priority and 0-100 is reserved to > the compiler. > > Signed-off-by: Amery Hung <ameryhung@gmail.com> > --- > tools/testing/selftests/bpf/prog_tests/task_local_data.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_data.h b/tools/testing/selftests/bpf/prog_tests/task_local_data.h > index 366a6739c086..e242c455ddae 100644 > --- a/tools/testing/selftests/bpf/prog_tests/task_local_data.h > +++ b/tools/testing/selftests/bpf/prog_tests/task_local_data.h > @@ -278,7 +278,7 @@ static tld_key_t __tld_create_key(const char *name, size_t size, bool dyn_data) > #define TLD_DEFINE_KEY(key, name, size) \ > tld_key_t key; \ > \ > -__attribute__((constructor)) \ > +__attribute__((constructor(101))) \ > void __tld_define_key_##key(void) \ > { \ > key = __tld_create_key(name, size, false); \ > -- > 2.52.0 Acked-by: Mykyta Yatsenko <yatsenko@meta.com> ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2026-03-27 21:15 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-03-26 5:24 [PATCH bpf-next v1 0/3] Task local data bug fixes and improvement Amery Hung 2026-03-26 5:24 ` [PATCH bpf-next v1 1/3] selftests/bpf: Fix task_local_data data allocation size Amery Hung 2026-03-26 10:27 ` sun jian 2026-03-26 15:56 ` Amery Hung 2026-03-26 16:17 ` Mykyta Yatsenko 2026-03-26 20:32 ` Amery Hung 2026-03-26 5:24 ` [PATCH bpf-next v1 2/3] selftests/bpf: Simplify task_local_data memory allocation Amery Hung 2026-03-27 16:36 ` Mykyta Yatsenko 2026-03-27 20:56 ` Amery Hung 2026-03-27 21:15 ` Mykyta Yatsenko 2026-03-26 5:24 ` [PATCH bpf-next v1 3/3] selftests/bpf: Make sure TLD_DEFINE_KEY runs first Amery Hung 2026-03-27 16:56 ` Mykyta Yatsenko
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox