* [PATCH bpf-next 0/2] Allow struct_ops maps with a large number of programs
@ 2024-02-16 18:28 thinker.li
2024-02-16 18:28 ` [PATCH bpf-next 1/2] bpf: struct_ops supports more than one page for trampolines thinker.li
2024-02-16 18:28 ` [PATCH bpf-next 2/2] selftests/bpf: Test struct_ops maps with a large number of program links thinker.li
0 siblings, 2 replies; 5+ messages in thread
From: thinker.li @ 2024-02-16 18:28 UTC (permalink / raw)
To: bpf, ast, martin.lau, song, kernel-team, andrii
Cc: sinquersw, kuifeng, Kui-Feng Lee
From: Kui-Feng Lee <thinker.li@gmail.com>
The BPF struct_ops previously only allowed for one page to be used for
the trampolines of all links in a map. However, we have recently run
out of space due to the large number of BPF program links. By
allocating additional pages when we exhaust an existing page, we can
accommodate more links in a single map.
The variable st_map->image has been changed to st_map->image_pages,
and its type has been changed to an array of pointers to buffers of
PAGE_SIZE. The array is dynamically resized and additional pages are
allocated when all existing pages are exhausted.
The test case loads a struct_ops map having 40 programs. Their
trampolines take 6.6k+ bytes, spanning more than 1.5 pages on x86.
Kui-Feng Lee (2):
bpf: struct_ops supports more than one page for trampolines.
selftests/bpf: Test struct_ops maps with a large number of program
links.
kernel/bpf/bpf_struct_ops.c | 99 +++++++++++++----
.../selftests/bpf/bpf_testmod/bpf_testmod.h | 44 ++++++++
.../prog_tests/test_struct_ops_multi_pages.c | 24 +++++
.../bpf/progs/struct_ops_multi_pages.c | 102 ++++++++++++++++++
4 files changed, 250 insertions(+), 19 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/test_struct_ops_multi_pages.c
create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_multi_pages.c
--
2.34.1
^ permalink raw reply [flat|nested] 5+ messages in thread* [PATCH bpf-next 1/2] bpf: struct_ops supports more than one page for trampolines. 2024-02-16 18:28 [PATCH bpf-next 0/2] Allow struct_ops maps with a large number of programs thinker.li @ 2024-02-16 18:28 ` thinker.li 2024-02-20 21:47 ` Martin KaFai Lau 2024-02-16 18:28 ` [PATCH bpf-next 2/2] selftests/bpf: Test struct_ops maps with a large number of program links thinker.li 1 sibling, 1 reply; 5+ messages in thread From: thinker.li @ 2024-02-16 18:28 UTC (permalink / raw) To: bpf, ast, martin.lau, song, kernel-team, andrii Cc: sinquersw, kuifeng, Kui-Feng Lee From: Kui-Feng Lee <thinker.li@gmail.com> The BPF struct_ops previously only allowed for one page to be used for the trampolines of all links in a map. However, we have recently run out of space due to the large number of BPF program links. By allocating additional pages when we exhaust an existing page, we can accommodate more links in a single map. The variable st_map->image has been changed to st_map->image_pages, and its type has been changed to an array of pointers to buffers of PAGE_SIZE. The array is dynamically resized and additional pages are allocated when all existing pages are exhausted. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> --- kernel/bpf/bpf_struct_ops.c | 99 ++++++++++++++++++++++++++++++------- 1 file changed, 80 insertions(+), 19 deletions(-) diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index 0d7be97a2411..bb7ae665006a 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -30,12 +30,11 @@ struct bpf_struct_ops_map { */ struct bpf_link **links; u32 links_cnt; - /* image is a page that has all the trampolines + u32 image_pages_cnt; + /* image_pages is an array of pages that has all the trampolines * that stores the func args before calling the bpf_prog. - * A PAGE_SIZE "image" is enough to store all trampoline for - * "links[]". 
*/ - void *image; + void **image_pages; /* The owner moduler's btf. */ struct btf *btf; /* uvalue->data stores the kernel struct @@ -535,7 +534,8 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, void *udata, *kdata; int prog_fd, err; void *image, *image_end; - u32 i; + void **image_pages; + u32 i, next_page = 0; if (flags) return -EINVAL; @@ -573,8 +573,8 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, udata = &uvalue->data; kdata = &kvalue->data; - image = st_map->image; - image_end = st_map->image + PAGE_SIZE; + image = st_map->image_pages[next_page++]; + image_end = image + PAGE_SIZE; module_type = btf_type_by_id(btf_vmlinux, st_ops_ids[IDX_MODULE_ID]); for_each_member(i, t, member) { @@ -657,6 +657,43 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, &st_ops->func_models[i], *(void **)(st_ops->cfi_stubs + moff), image, image_end); + if (err == -E2BIG) { + /* Use an additional page to try again. + * + * It may reuse pages allocated for the previous + * failed calls. 
+ */ + if (next_page >= st_map->image_pages_cnt) { + /* Allocate an additional page */ + image_pages = krealloc(st_map->image_pages, + (st_map->image_pages_cnt + 1) * sizeof(void *), + GFP_KERNEL); + if (!image_pages) { + err = -ENOMEM; + goto reset_unlock; + } + st_map->image_pages = image_pages; + + err = bpf_jit_charge_modmem(PAGE_SIZE); + if (err) + goto reset_unlock; + + image = arch_alloc_bpf_trampoline(PAGE_SIZE); + if (!image) { + bpf_jit_uncharge_modmem(PAGE_SIZE); + err = -ENOMEM; + goto reset_unlock; + } + st_map->image_pages[st_map->image_pages_cnt++] = image; + } + image = st_map->image_pages[next_page++]; + image_end = image + PAGE_SIZE; + + err = bpf_struct_ops_prepare_trampoline(tlinks, link, + &st_ops->func_models[i], + *(void **)(st_ops->cfi_stubs + moff), + image, image_end); + } if (err < 0) goto reset_unlock; @@ -667,6 +704,18 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, *(unsigned long *)(udata + moff) = prog->aux->id; } + while (next_page < st_map->image_pages_cnt) { + /* Free unused pages + * + * The value can not be updated anymore if the value is not + * rejected by st_ops->validate() or st_ops->reg(). So, + * there is no reason to keep the unused pages. + */ + bpf_jit_uncharge_modmem(PAGE_SIZE); + image = st_map->image_pages[--st_map->image_pages_cnt]; + arch_free_bpf_trampoline(image, PAGE_SIZE); + } + if (st_map->map.map_flags & BPF_F_LINK) { err = 0; if (st_ops->validate) { @@ -674,7 +723,9 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, if (err) goto reset_unlock; } - arch_protect_bpf_trampoline(st_map->image, PAGE_SIZE); + for (i = 0; i < next_page; i++) + arch_protect_bpf_trampoline(st_map->image_pages[i], + PAGE_SIZE); /* Let bpf_link handle registration & unregistration. * * Pair with smp_load_acquire() during lookup_elem(). 
@@ -683,7 +734,8 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, goto unlock; } - arch_protect_bpf_trampoline(st_map->image, PAGE_SIZE); + for (i = 0; i < next_page; i++) + arch_protect_bpf_trampoline(st_map->image_pages[i], PAGE_SIZE); err = st_ops->reg(kdata); if (likely(!err)) { /* This refcnt increment on the map here after @@ -706,7 +758,8 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, * there was a race in registering the struct_ops (under the same name) to * a sub-system through different struct_ops's maps. */ - arch_unprotect_bpf_trampoline(st_map->image, PAGE_SIZE); + for (i = 0; i < next_page; i++) + arch_unprotect_bpf_trampoline(st_map->image_pages[i], PAGE_SIZE); reset_unlock: bpf_struct_ops_map_put_progs(st_map); @@ -771,14 +824,15 @@ static void bpf_struct_ops_map_seq_show_elem(struct bpf_map *map, void *key, static void __bpf_struct_ops_map_free(struct bpf_map *map) { struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map; + int i; if (st_map->links) bpf_struct_ops_map_put_progs(st_map); bpf_map_area_free(st_map->links); - if (st_map->image) { - arch_free_bpf_trampoline(st_map->image, PAGE_SIZE); - bpf_jit_uncharge_modmem(PAGE_SIZE); - } + for (i = 0; i < st_map->image_pages_cnt; i++) + arch_free_bpf_trampoline(st_map->image_pages[i], PAGE_SIZE); + bpf_jit_uncharge_modmem(PAGE_SIZE * st_map->image_pages_cnt); + kfree(st_map->image_pages); bpf_map_area_free(st_map->uvalue); bpf_map_area_free(st_map); } @@ -888,20 +942,27 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) st_map->st_ops_desc = st_ops_desc; map = &st_map->map; + st_map->image_pages = kcalloc(1, sizeof(void *), GFP_KERNEL); + if (!st_map->image_pages) { + ret = -ENOMEM; + goto errout_free; + } + ret = bpf_jit_charge_modmem(PAGE_SIZE); if (ret) goto errout_free; - st_map->image = arch_alloc_bpf_trampoline(PAGE_SIZE); - if (!st_map->image) { - /* __bpf_struct_ops_map_free() uses st_map->image 
as flag - * for "charged or not". In this case, we need to unchange - * here. + st_map->image_pages[0] = arch_alloc_bpf_trampoline(PAGE_SIZE); + if (!st_map->image_pages[0]) { + /* __bpf_struct_ops_map_free() uses st_map->image_pages_cnt + * to for uncharging a number of pages. In this case, we + * need to uncharge here. */ bpf_jit_uncharge_modmem(PAGE_SIZE); ret = -ENOMEM; goto errout_free; } + st_map->image_pages_cnt = 1; st_map->uvalue = bpf_map_area_alloc(vt->size, NUMA_NO_NODE); st_map->links_cnt = btf_type_vlen(t); st_map->links = -- 2.34.1 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH bpf-next 1/2] bpf: struct_ops supports more than one page for trampolines. 2024-02-16 18:28 ` [PATCH bpf-next 1/2] bpf: struct_ops supports more than one page for trampolines thinker.li @ 2024-02-20 21:47 ` Martin KaFai Lau 2024-02-20 23:23 ` Kui-Feng Lee 0 siblings, 1 reply; 5+ messages in thread From: Martin KaFai Lau @ 2024-02-20 21:47 UTC (permalink / raw) To: thinker.li; +Cc: sinquersw, kuifeng, bpf, ast, song, kernel-team, andrii On 2/16/24 10:28 AM, thinker.li@gmail.com wrote: > From: Kui-Feng Lee <thinker.li@gmail.com> > > The BPF struct_ops previously only allowed for one page to be used for the > trampolines of all links in a map. However, we have recently run out of > space due to the large number of BPF program links. By allocating > additional pages when we exhaust an existing page, we can accommodate more > links in a single map. > > The variable st_map->image has been changed to st_map->image_pages, and its > type has been changed to an array of pointers to buffers of PAGE_SIZE. The > array is dynamically resized and additional pages are allocated when all > existing pages are exhausted. > > Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> > --- > kernel/bpf/bpf_struct_ops.c | 99 ++++++++++++++++++++++++++++++------- > 1 file changed, 80 insertions(+), 19 deletions(-) > > diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c > index 0d7be97a2411..bb7ae665006a 100644 > --- a/kernel/bpf/bpf_struct_ops.c > +++ b/kernel/bpf/bpf_struct_ops.c > @@ -30,12 +30,11 @@ struct bpf_struct_ops_map { > */ > struct bpf_link **links; > u32 links_cnt; > - /* image is a page that has all the trampolines > + u32 image_pages_cnt; > + /* image_pages is an array of pages that has all the trampolines > * that stores the func args before calling the bpf_prog. > - * A PAGE_SIZE "image" is enough to store all trampoline for > - * "links[]". > */ > - void *image; > + void **image_pages; > /* The owner moduler's btf. 
*/ > struct btf *btf; > /* uvalue->data stores the kernel struct > @@ -535,7 +534,8 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, > void *udata, *kdata; > int prog_fd, err; > void *image, *image_end; > - u32 i; > + void **image_pages; > + u32 i, next_page = 0; > > if (flags) > return -EINVAL; > @@ -573,8 +573,8 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, > > udata = &uvalue->data; > kdata = &kvalue->data; > - image = st_map->image; > - image_end = st_map->image + PAGE_SIZE; > + image = st_map->image_pages[next_page++]; > + image_end = image + PAGE_SIZE; > > module_type = btf_type_by_id(btf_vmlinux, st_ops_ids[IDX_MODULE_ID]); > for_each_member(i, t, member) { > @@ -657,6 +657,43 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, > &st_ops->func_models[i], > *(void **)(st_ops->cfi_stubs + moff), > image, image_end); > + if (err == -E2BIG) { > + /* Use an additional page to try again. > + * > + * It may reuse pages allocated for the previous > + * failed calls. > + */ > + if (next_page >= st_map->image_pages_cnt) { This check (more on this later) ... > + /* Allocate an additional page */ > + image_pages = krealloc(st_map->image_pages, > + (st_map->image_pages_cnt + 1) * sizeof(void *), > + GFP_KERNEL); From the patch 2 test, one page is enough for at least 20 ops. How about keep it simple and allow a max 8 pages which should be much more than enough for sane usage. (i.e. add "void *images[MAX_STRUCT_OPS_PAGES];" to "struct bpf_struct_ops_map"). 
> + if (!image_pages) { > + err = -ENOMEM; > + goto reset_unlock; > + } > + st_map->image_pages = image_pages; > + > + err = bpf_jit_charge_modmem(PAGE_SIZE); > + if (err) > + goto reset_unlock; > + > + image = arch_alloc_bpf_trampoline(PAGE_SIZE); > + if (!image) { > + bpf_jit_uncharge_modmem(PAGE_SIZE); > + err = -ENOMEM; > + goto reset_unlock; > + } > + st_map->image_pages[st_map->image_pages_cnt++] = image; > + } > + image = st_map->image_pages[next_page++]; > + image_end = image + PAGE_SIZE; > + > + err = bpf_struct_ops_prepare_trampoline(tlinks, link, > + &st_ops->func_models[i], > + *(void **)(st_ops->cfi_stubs + moff), > + image, image_end); > + } > if (err < 0) > goto reset_unlock; > > @@ -667,6 +704,18 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, > *(unsigned long *)(udata + moff) = prog->aux->id; > } > > + while (next_page < st_map->image_pages_cnt) { This check and the above "if (next_page >= st_map->image_pages_cnt)" should not happen for the common case? Together with the new comment after the above "if (err == -E2BIG)", is it trying to optimize to reuse the pages allocated in the previous error-ed out map_update_elem() call? How about keep it simple for the common case and always free all pages when map_update_elem() failed? Also, after this patch, the same calls are used in different places. arch_alloc_bpf_trampoline() is done in two different places, one in map_alloc() and one in map_update_elem(). How about do all the page alloc in map_update_elem()? bpf_struct_ops_prepare_trampoline() is also called in two different places within the same map_update_elem(). When looking inside the bpf_struct_ops_prepare_trampoline(), it does call arch_bpf_trampoline_size() to learn the required size first. bpf_struct_ops_prepare_trampoline() should be a better place to call arch_alloc_bpf_trampoline() when needed. Then there is no need to retry bpf_struct_ops_prepare_trampoline() in map_update_elem()? 
> + /* Free unused pages > + * > + * The value can not be updated anymore if the value is not > + * rejected by st_ops->validate() or st_ops->reg(). So, > + * there is no reason to keep the unused pages. > + */ > + bpf_jit_uncharge_modmem(PAGE_SIZE); > + image = st_map->image_pages[--st_map->image_pages_cnt]; > + arch_free_bpf_trampoline(image, PAGE_SIZE); > + } > + > if (st_map->map.map_flags & BPF_F_LINK) { > err = 0; > if (st_ops->validate) { > @@ -674,7 +723,9 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, > if (err) > goto reset_unlock; > } > - arch_protect_bpf_trampoline(st_map->image, PAGE_SIZE); > + for (i = 0; i < next_page; i++) > + arch_protect_bpf_trampoline(st_map->image_pages[i], > + PAGE_SIZE); arch_protect_bpf_trampoline() is called here for BPF_F_LINK. > /* Let bpf_link handle registration & unregistration. > * > * Pair with smp_load_acquire() during lookup_elem(). > @@ -683,7 +734,8 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, > goto unlock; > } > > - arch_protect_bpf_trampoline(st_map->image, PAGE_SIZE); > + for (i = 0; i < next_page; i++) > + arch_protect_bpf_trampoline(st_map->image_pages[i], PAGE_SIZE); arch_protect_bpf_trampoline() is called here also in the same function for non BPF_F_LINK. Can this be cleaned up a bit? For example, "st_ops->validate(kdata);" should not be specific to BPF_F_LINK which had been brought up earlier when making the "->validate" optional. It is a good time to clean this up also. ---- pw-bot: cr > err = st_ops->reg(kdata); > if (likely(!err)) { > /* This refcnt increment on the map here after > @@ -706,7 +758,8 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, > * there was a race in registering the struct_ops (under the same name) to > * a sub-system through different struct_ops's maps. 
> */ > - arch_unprotect_bpf_trampoline(st_map->image, PAGE_SIZE); > + for (i = 0; i < next_page; i++) > + arch_unprotect_bpf_trampoline(st_map->image_pages[i], PAGE_SIZE); > > reset_unlock: > bpf_struct_ops_map_put_progs(st_map); > @@ -771,14 +824,15 @@ static void bpf_struct_ops_map_seq_show_elem(struct bpf_map *map, void *key, > static void __bpf_struct_ops_map_free(struct bpf_map *map) > { > struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map; > + int i; > > if (st_map->links) > bpf_struct_ops_map_put_progs(st_map); > bpf_map_area_free(st_map->links); > - if (st_map->image) { > - arch_free_bpf_trampoline(st_map->image, PAGE_SIZE); > - bpf_jit_uncharge_modmem(PAGE_SIZE); > - } > + for (i = 0; i < st_map->image_pages_cnt; i++) > + arch_free_bpf_trampoline(st_map->image_pages[i], PAGE_SIZE); > + bpf_jit_uncharge_modmem(PAGE_SIZE * st_map->image_pages_cnt); > + kfree(st_map->image_pages); > bpf_map_area_free(st_map->uvalue); > bpf_map_area_free(st_map); > } > @@ -888,20 +942,27 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) > st_map->st_ops_desc = st_ops_desc; > map = &st_map->map; > > + st_map->image_pages = kcalloc(1, sizeof(void *), GFP_KERNEL); > + if (!st_map->image_pages) { > + ret = -ENOMEM; > + goto errout_free; > + } > + > ret = bpf_jit_charge_modmem(PAGE_SIZE); > if (ret) > goto errout_free; > > - st_map->image = arch_alloc_bpf_trampoline(PAGE_SIZE); > - if (!st_map->image) { > - /* __bpf_struct_ops_map_free() uses st_map->image as flag > - * for "charged or not". In this case, we need to unchange > - * here. > + st_map->image_pages[0] = arch_alloc_bpf_trampoline(PAGE_SIZE); > + if (!st_map->image_pages[0]) { > + /* __bpf_struct_ops_map_free() uses st_map->image_pages_cnt > + * to for uncharging a number of pages. In this case, we > + * need to uncharge here. 
> */ > bpf_jit_uncharge_modmem(PAGE_SIZE); > ret = -ENOMEM; > goto errout_free; > } > + st_map->image_pages_cnt = 1; > st_map->uvalue = bpf_map_area_alloc(vt->size, NUMA_NO_NODE); > st_map->links_cnt = btf_type_vlen(t); > st_map->links = ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH bpf-next 1/2] bpf: struct_ops supports more than one page for trampolines. 2024-02-20 21:47 ` Martin KaFai Lau @ 2024-02-20 23:23 ` Kui-Feng Lee 0 siblings, 0 replies; 5+ messages in thread From: Kui-Feng Lee @ 2024-02-20 23:23 UTC (permalink / raw) To: Martin KaFai Lau, thinker.li; +Cc: kuifeng, bpf, ast, song, kernel-team, andrii On 2/20/24 13:47, Martin KaFai Lau wrote: > On 2/16/24 10:28 AM, thinker.li@gmail.com wrote: >> From: Kui-Feng Lee <thinker.li@gmail.com> >> >> The BPF struct_ops previously only allowed for one page to be used for >> the >> trampolines of all links in a map. However, we have recently run out of >> space due to the large number of BPF program links. By allocating >> additional pages when we exhaust an existing page, we can accommodate >> more >> links in a single map. >> >> The variable st_map->image has been changed to st_map->image_pages, >> and its >> type has been changed to an array of pointers to buffers of PAGE_SIZE. >> The >> array is dynamically resized and additional pages are allocated when all >> existing pages are exhausted. >> >> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> >> --- >> kernel/bpf/bpf_struct_ops.c | 99 ++++++++++++++++++++++++++++++------- >> 1 file changed, 80 insertions(+), 19 deletions(-) >> >> diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c >> index 0d7be97a2411..bb7ae665006a 100644 >> --- a/kernel/bpf/bpf_struct_ops.c >> +++ b/kernel/bpf/bpf_struct_ops.c >> @@ -30,12 +30,11 @@ struct bpf_struct_ops_map { >> */ >> struct bpf_link **links; >> u32 links_cnt; >> - /* image is a page that has all the trampolines >> + u32 image_pages_cnt; >> + /* image_pages is an array of pages that has all the trampolines >> * that stores the func args before calling the bpf_prog. >> - * A PAGE_SIZE "image" is enough to store all trampoline for >> - * "links[]". >> */ >> - void *image; >> + void **image_pages; >> /* The owner moduler's btf. 
*/ >> struct btf *btf; >> /* uvalue->data stores the kernel struct >> @@ -535,7 +534,8 @@ static long bpf_struct_ops_map_update_elem(struct >> bpf_map *map, void *key, >> void *udata, *kdata; >> int prog_fd, err; >> void *image, *image_end; >> - u32 i; >> + void **image_pages; >> + u32 i, next_page = 0; >> if (flags) >> return -EINVAL; >> @@ -573,8 +573,8 @@ static long bpf_struct_ops_map_update_elem(struct >> bpf_map *map, void *key, >> udata = &uvalue->data; >> kdata = &kvalue->data; >> - image = st_map->image; >> - image_end = st_map->image + PAGE_SIZE; >> + image = st_map->image_pages[next_page++]; >> + image_end = image + PAGE_SIZE; >> module_type = btf_type_by_id(btf_vmlinux, >> st_ops_ids[IDX_MODULE_ID]); >> for_each_member(i, t, member) { >> @@ -657,6 +657,43 @@ static long bpf_struct_ops_map_update_elem(struct >> bpf_map *map, void *key, >> &st_ops->func_models[i], >> *(void **)(st_ops->cfi_stubs + moff), >> image, image_end); >> + if (err == -E2BIG) { >> + /* Use an additional page to try again. >> + * >> + * It may reuse pages allocated for the previous >> + * failed calls. >> + */ >> + if (next_page >= st_map->image_pages_cnt) { > > This check (more on this later) ... > >> + /* Allocate an additional page */ >> + image_pages = krealloc(st_map->image_pages, >> + (st_map->image_pages_cnt + 1) * >> sizeof(void *), >> + GFP_KERNEL); > > From the patch 2 test, one page is enough for at least 20 ops. How > about keep it simple and allow a max 8 pages which should be much more > than enough for sane usage. (i.e. add "void > *images[MAX_STRUCT_OPS_PAGES];" to "struct bpf_struct_ops_map"). Ok! 
> >> + if (!image_pages) { >> + err = -ENOMEM; >> + goto reset_unlock; >> + } >> + st_map->image_pages = image_pages; >> + >> + err = bpf_jit_charge_modmem(PAGE_SIZE); >> + if (err) >> + goto reset_unlock; >> + >> + image = arch_alloc_bpf_trampoline(PAGE_SIZE); >> + if (!image) { >> + bpf_jit_uncharge_modmem(PAGE_SIZE); >> + err = -ENOMEM; >> + goto reset_unlock; >> + } >> + st_map->image_pages[st_map->image_pages_cnt++] = image; >> + } >> + image = st_map->image_pages[next_page++]; >> + image_end = image + PAGE_SIZE; >> + >> + err = bpf_struct_ops_prepare_trampoline(tlinks, link, >> + &st_ops->func_models[i], >> + *(void **)(st_ops->cfi_stubs + moff), >> + image, image_end); >> + } >> if (err < 0) >> goto reset_unlock; >> @@ -667,6 +704,18 @@ static long bpf_struct_ops_map_update_elem(struct >> bpf_map *map, void *key, >> *(unsigned long *)(udata + moff) = prog->aux->id; >> } >> + while (next_page < st_map->image_pages_cnt) { > > This check and the above "if (next_page >= st_map->image_pages_cnt)" > should not happen for the common case? > > Together with the new comment after the above "if (err == -E2BIG)", is > it trying to optimize to reuse the pages allocated in the previous > error-ed out map_update_elem() call? > > How about keep it simple for the common case and always free all pages > when map_update_elem() failed? Yes, it is an optimization. So, together with max 8 pages and free all pages when fails, we can remove all these code. > > Also, after this patch, the same calls are used in different places. > > arch_alloc_bpf_trampoline() is done in two different places, one in > map_alloc() and one in map_update_elem(). How about do all the page > alloc in map_update_elem()? > > bpf_struct_ops_prepare_trampoline() is also called in two different > places within the same map_update_elem(). When looking inside the > bpf_struct_ops_prepare_trampoline(), it does call > arch_bpf_trampoline_size() to learn the required size first. 
> bpf_struct_ops_prepare_trampoline() should be a better place to call > arch_alloc_bpf_trampoline() when needed. Then there is no need to retry > bpf_struct_ops_prepare_trampoline() in map_update_elem()? Agree! > > >> + /* Free unused pages >> + * >> + * The value can not be updated anymore if the value is not >> + * rejected by st_ops->validate() or st_ops->reg(). So, >> + * there is no reason to keep the unused pages. >> + */ >> + bpf_jit_uncharge_modmem(PAGE_SIZE); >> + image = st_map->image_pages[--st_map->image_pages_cnt]; >> + arch_free_bpf_trampoline(image, PAGE_SIZE); >> + } >> + >> if (st_map->map.map_flags & BPF_F_LINK) { >> err = 0; >> if (st_ops->validate) { >> @@ -674,7 +723,9 @@ static long bpf_struct_ops_map_update_elem(struct >> bpf_map *map, void *key, >> if (err) >> goto reset_unlock; >> } >> - arch_protect_bpf_trampoline(st_map->image, PAGE_SIZE); >> + for (i = 0; i < next_page; i++) >> + arch_protect_bpf_trampoline(st_map->image_pages[i], >> + PAGE_SIZE); > > arch_protect_bpf_trampoline() is called here for BPF_F_LINK. > >> /* Let bpf_link handle registration & unregistration. >> * >> * Pair with smp_load_acquire() during lookup_elem(). >> @@ -683,7 +734,8 @@ static long bpf_struct_ops_map_update_elem(struct >> bpf_map *map, void *key, >> goto unlock; >> } >> - arch_protect_bpf_trampoline(st_map->image, PAGE_SIZE); >> + for (i = 0; i < next_page; i++) >> + arch_protect_bpf_trampoline(st_map->image_pages[i], PAGE_SIZE); > > arch_protect_bpf_trampoline() is called here also in the same function > for non BPF_F_LINK. > > Can this be cleaned up a bit? For example, "st_ops->validate(kdata);" > should not be specific to BPF_F_LINK which had been brought up earlier > when making the "->validate" optional. It is a good time to clean this > up also. 
Ok > > ---- > pw-bot: cr > >> err = st_ops->reg(kdata); >> if (likely(!err)) { >> /* This refcnt increment on the map here after >> @@ -706,7 +758,8 @@ static long bpf_struct_ops_map_update_elem(struct >> bpf_map *map, void *key, >> * there was a race in registering the struct_ops (under the >> same name) to >> * a sub-system through different struct_ops's maps. >> */ >> - arch_unprotect_bpf_trampoline(st_map->image, PAGE_SIZE); >> + for (i = 0; i < next_page; i++) >> + arch_unprotect_bpf_trampoline(st_map->image_pages[i], >> PAGE_SIZE); >> reset_unlock: >> bpf_struct_ops_map_put_progs(st_map); >> @@ -771,14 +824,15 @@ static void >> bpf_struct_ops_map_seq_show_elem(struct bpf_map *map, void *key, >> static void __bpf_struct_ops_map_free(struct bpf_map *map) >> { >> struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map >> *)map; >> + int i; >> if (st_map->links) >> bpf_struct_ops_map_put_progs(st_map); >> bpf_map_area_free(st_map->links); >> - if (st_map->image) { >> - arch_free_bpf_trampoline(st_map->image, PAGE_SIZE); >> - bpf_jit_uncharge_modmem(PAGE_SIZE); >> - } >> + for (i = 0; i < st_map->image_pages_cnt; i++) >> + arch_free_bpf_trampoline(st_map->image_pages[i], PAGE_SIZE); >> + bpf_jit_uncharge_modmem(PAGE_SIZE * st_map->image_pages_cnt); >> + kfree(st_map->image_pages); >> bpf_map_area_free(st_map->uvalue); >> bpf_map_area_free(st_map); >> } >> @@ -888,20 +942,27 @@ static struct bpf_map >> *bpf_struct_ops_map_alloc(union bpf_attr *attr) >> st_map->st_ops_desc = st_ops_desc; >> map = &st_map->map; >> + st_map->image_pages = kcalloc(1, sizeof(void *), GFP_KERNEL); >> + if (!st_map->image_pages) { >> + ret = -ENOMEM; >> + goto errout_free; >> + } >> + >> ret = bpf_jit_charge_modmem(PAGE_SIZE); >> if (ret) >> goto errout_free; >> - st_map->image = arch_alloc_bpf_trampoline(PAGE_SIZE); >> - if (!st_map->image) { >> - /* __bpf_struct_ops_map_free() uses st_map->image as flag >> - * for "charged or not". In this case, we need to unchange >> - * here. 
>> + st_map->image_pages[0] = arch_alloc_bpf_trampoline(PAGE_SIZE); >> + if (!st_map->image_pages[0]) { >> + /* __bpf_struct_ops_map_free() uses st_map->image_pages_cnt >> + * to for uncharging a number of pages. In this case, we >> + * need to uncharge here. >> */ >> bpf_jit_uncharge_modmem(PAGE_SIZE); >> ret = -ENOMEM; >> goto errout_free; >> } >> + st_map->image_pages_cnt = 1; >> st_map->uvalue = bpf_map_area_alloc(vt->size, NUMA_NO_NODE); >> st_map->links_cnt = btf_type_vlen(t); >> st_map->links = > ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH bpf-next 2/2] selftests/bpf: Test struct_ops maps with a large number of program links. 2024-02-16 18:28 [PATCH bpf-next 0/2] Allow struct_ops maps with a large number of programs thinker.li 2024-02-16 18:28 ` [PATCH bpf-next 1/2] bpf: struct_ops supports more than one page for trampolines thinker.li @ 2024-02-16 18:28 ` thinker.li 1 sibling, 0 replies; 5+ messages in thread From: thinker.li @ 2024-02-16 18:28 UTC (permalink / raw) To: bpf, ast, martin.lau, song, kernel-team, andrii Cc: sinquersw, kuifeng, Kui-Feng Lee From: Kui-Feng Lee <thinker.li@gmail.com> Create and load a struct_ops map with a large number of programs to generate trampolines taking a size over multiple pages. The map includes 40 programs. Their trampolines takes 6.6k+, more than 1.5 pages, on x86. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> --- .../selftests/bpf/bpf_testmod/bpf_testmod.h | 44 ++++++++ .../prog_tests/test_struct_ops_multi_pages.c | 24 +++++ .../bpf/progs/struct_ops_multi_pages.c | 102 ++++++++++++++++++ 3 files changed, 170 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/test_struct_ops_multi_pages.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_multi_pages.c diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h index c3b0cf788f9f..3e6481b2a50a 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h @@ -35,6 +35,50 @@ struct bpf_testmod_ops { void (*test_2)(int a, int b); /* Used to test nullable arguments. */ int (*test_maybe_null)(int dummy, struct task_struct *task); + + /* The following pointers are used to test the maps having multiple + * pages of trampolines. 
+ */ + int (*tramp_1)(int value); + int (*tramp_2)(int value); + int (*tramp_3)(int value); + int (*tramp_4)(int value); + int (*tramp_5)(int value); + int (*tramp_6)(int value); + int (*tramp_7)(int value); + int (*tramp_8)(int value); + int (*tramp_9)(int value); + int (*tramp_10)(int value); + int (*tramp_11)(int value); + int (*tramp_12)(int value); + int (*tramp_13)(int value); + int (*tramp_14)(int value); + int (*tramp_15)(int value); + int (*tramp_16)(int value); + int (*tramp_17)(int value); + int (*tramp_18)(int value); + int (*tramp_19)(int value); + int (*tramp_20)(int value); + int (*tramp_21)(int value); + int (*tramp_22)(int value); + int (*tramp_23)(int value); + int (*tramp_24)(int value); + int (*tramp_25)(int value); + int (*tramp_26)(int value); + int (*tramp_27)(int value); + int (*tramp_28)(int value); + int (*tramp_29)(int value); + int (*tramp_30)(int value); + int (*tramp_31)(int value); + int (*tramp_32)(int value); + int (*tramp_33)(int value); + int (*tramp_34)(int value); + int (*tramp_35)(int value); + int (*tramp_36)(int value); + int (*tramp_37)(int value); + int (*tramp_38)(int value); + int (*tramp_39)(int value); + int (*tramp_40)(int value); }; #endif /* _BPF_TESTMOD_H */ diff --git a/tools/testing/selftests/bpf/prog_tests/test_struct_ops_multi_pages.c b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_multi_pages.c new file mode 100644 index 000000000000..9495b21e1021 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_multi_pages.c @@ -0,0 +1,24 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ +#include <test_progs.h> + +#include "struct_ops_multi_pages.skel.h" + +void test_struct_ops_multi_pages(void) +{ + struct struct_ops_multi_pages *skel; + struct bpf_link *link; + + /* The size of all trampolines of skel->maps.multi_pages should be + * over 1 page (at least for x86). 
+ */ + skel = struct_ops_multi_pages__open_and_load(); + if (!ASSERT_OK_PTR(skel, "struct_ops_multi_pages_open_and_load")) + return; + + link = bpf_map__attach_struct_ops(skel->maps.multi_pages); + ASSERT_OK_PTR(link, "attach_multi_pages"); + + bpf_link__destroy(link); + struct_ops_multi_pages__destroy(skel); +} diff --git a/tools/testing/selftests/bpf/progs/struct_ops_multi_pages.c b/tools/testing/selftests/bpf/progs/struct_ops_multi_pages.c new file mode 100644 index 000000000000..9efcc6e4d356 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/struct_ops_multi_pages.c @@ -0,0 +1,102 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ +#include <vmlinux.h> +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_tracing.h> +#include "../bpf_testmod/bpf_testmod.h" + +char _license[] SEC("license") = "GPL"; + +#define TRAMP(x) \ + SEC("struct_ops/tramp_" #x) \ + int BPF_PROG(tramp_ ## x, int a) \ + { \ + return a; \ + } + +TRAMP(1) +TRAMP(2) +TRAMP(3) +TRAMP(4) +TRAMP(5) +TRAMP(6) +TRAMP(7) +TRAMP(8) +TRAMP(9) +TRAMP(10) +TRAMP(11) +TRAMP(12) +TRAMP(13) +TRAMP(14) +TRAMP(15) +TRAMP(16) +TRAMP(17) +TRAMP(18) +TRAMP(19) +TRAMP(20) +TRAMP(21) +TRAMP(22) +TRAMP(23) +TRAMP(24) +TRAMP(25) +TRAMP(26) +TRAMP(27) +TRAMP(28) +TRAMP(29) +TRAMP(30) +TRAMP(31) +TRAMP(32) +TRAMP(33) +TRAMP(34) +TRAMP(35) +TRAMP(36) +TRAMP(37) +TRAMP(38) +TRAMP(39) +TRAMP(40) + +#define F_TRAMP(x) .tramp_ ## x = (void *)tramp_ ## x + +SEC(".struct_ops.link") +struct bpf_testmod_ops multi_pages = { + F_TRAMP(1), + F_TRAMP(2), + F_TRAMP(3), + F_TRAMP(4), + F_TRAMP(5), + F_TRAMP(6), + F_TRAMP(7), + F_TRAMP(8), + F_TRAMP(9), + F_TRAMP(10), + F_TRAMP(11), + F_TRAMP(12), + F_TRAMP(13), + F_TRAMP(14), + F_TRAMP(15), + F_TRAMP(16), + F_TRAMP(17), + F_TRAMP(18), + F_TRAMP(19), + F_TRAMP(20), + F_TRAMP(21), + F_TRAMP(22), + F_TRAMP(23), + F_TRAMP(24), + F_TRAMP(25), + F_TRAMP(26), + F_TRAMP(27), + F_TRAMP(28), + F_TRAMP(29), + F_TRAMP(30), + F_TRAMP(31), + 
F_TRAMP(32), + F_TRAMP(33), + F_TRAMP(34), + F_TRAMP(35), + F_TRAMP(36), + F_TRAMP(37), + F_TRAMP(38), + F_TRAMP(39), + F_TRAMP(40), +}; -- 2.34.1 ^ permalink raw reply related [flat|nested] 5+ messages in thread