* [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator
@ 2024-04-11 13:11 Yafang Shao
2024-04-11 13:11 ` [PATCH bpf-next v6 1/2] bpf: Add " Yafang Shao
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Yafang Shao @ 2024-04-11 13:11 UTC (permalink / raw)
To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa
Cc: bpf, Yafang Shao
Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been
added for the new bpf_iter_bits functionality. These kfuncs enable the
iteration of the bits from a given address and a given number of bits.
- bpf_iter_bits_new
Initialize a new bits iterator for a given memory area. Due to the
limitation of bpf memalloc, the max number of bits to be iterated
over is (4096 * 8).
- bpf_iter_bits_next
Get the next bit in a bpf_iter_bits
- bpf_iter_bits_destroy
Destroy a bpf_iter_bits
The bits iterator can be used in any context and on any address.
Changes:
- v5->v6:
- Add positive tests (Andrii)
- v4->v5:
- Simplify test cases (Andrii)
- v3->v4:
- Fix endianness error on s390x (Andrii)
- zero-initialize kit->bits_copy and zero out nr_bits (Andrii)
- v2->v3:
- Optimization for u64/u32 mask (Andrii)
- v1->v2:
- Simplify the CPU number verification code to avoid the failure on s390x
(Eduard)
- bpf: Add bpf_iter_cpumask
https://lwn.net/Articles/961104/
- bpf: Add new bpf helper bpf_for_each_cpu
https://lwn.net/Articles/939939/
Yafang Shao (2):
bpf: Add bits iterator
selftests/bpf: Add selftest for bits iter
kernel/bpf/helpers.c | 120 +++++++++++++++++
.../selftests/bpf/prog_tests/verifier.c | 2 +
.../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++
3 files changed, 249 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c
--
2.39.1
^ permalink raw reply [flat|nested] 15+ messages in thread* [PATCH bpf-next v6 1/2] bpf: Add bits iterator 2024-04-11 13:11 [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator Yafang Shao @ 2024-04-11 13:11 ` Yafang Shao 2024-04-11 13:11 ` [PATCH bpf-next v6 2/2] selftests/bpf: Add selftest for bits iter Yafang Shao 2024-04-11 13:50 ` [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator Yafang Shao 2 siblings, 0 replies; 15+ messages in thread From: Yafang Shao @ 2024-04-11 13:11 UTC (permalink / raw) To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song, yonghong.song, kpsingh, sdf, haoluo, jolsa Cc: bpf, Yafang Shao Add three new kfuncs for the bits iterator: - bpf_iter_bits_new Initialize a new bits iterator for a given memory area. Due to the limitation of bpf memalloc, the max number of bits that can be iterated over is limited to (4096 * 8). - bpf_iter_bits_next Get the next bit in a bpf_iter_bits - bpf_iter_bits_destroy Destroy a bpf_iter_bits The bits iterator facilitates the iteration of the bits of a memory area, such as cpumask. It can be used in any context and on any address. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> --- kernel/bpf/helpers.c | 120 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 120 insertions(+) diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 8cde717137bd..421e42663736 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2549,6 +2549,123 @@ __bpf_kfunc void bpf_throw(u64 cookie) WARN(1, "A call to BPF exception callback should never return\n"); } +struct bpf_iter_bits { + __u64 __opaque[2]; +} __aligned(8); + +struct bpf_iter_bits_kern { + union { + unsigned long *bits; + unsigned long bits_copy; + }; + u32 nr_bits; + int bit; +} __aligned(8); + +/** + * bpf_iter_bits_new() - Initialize a new bits iterator for a given memory area + * @it: The new bpf_iter_bits to be created + * @unsafe_ptr__ign: A ponter pointing to a memory area to be iterated over + * @nr_bits: The number of bits to be iterated over. Due to the limitation of + * memalloc, it can't greater than (4096 * 8). + * + * This function initializes a new bpf_iter_bits structure for iterating over + * a memory area which is specified by the @unsafe_ptr__ign and @nr_bits. It + * copy the data of the memory area to the newly created bpf_iter_bits @it for + * subsequent iteration operations. + * + * On success, 0 is returned. On failure, ERR is returned. + */ +__bpf_kfunc int +bpf_iter_bits_new(struct bpf_iter_bits *it, const void *unsafe_ptr__ign, u32 nr_bits) +{ + u32 size = BITS_TO_LONGS(nr_bits) * sizeof(unsigned long); + struct bpf_iter_bits_kern *kit = (void *)it; + int err; + + BUILD_BUG_ON(sizeof(struct bpf_iter_bits_kern) != sizeof(struct bpf_iter_bits)); + BUILD_BUG_ON(__alignof__(struct bpf_iter_bits_kern) != + __alignof__(struct bpf_iter_bits)); + + if (!unsafe_ptr__ign || !nr_bits) { + kit->bits = NULL; + return -EINVAL; + } + + kit->nr_bits = 0; + kit->bits_copy = 0; + /* Optimization for u64/u32 mask */ + if (nr_bits <= 64) { + err = bpf_probe_read_kernel_common(&kit->bits_copy, size, unsafe_ptr__ign); + if (err) + return -EFAULT; + + kit->nr_bits = nr_bits; + kit->bit = -1; + return 0; + } + + /* Fallback to memalloc */ + kit->bits = bpf_mem_alloc(&bpf_global_ma, size); + if (!kit->bits) + return -ENOMEM; + + err = bpf_probe_read_kernel_common(kit->bits, size, unsafe_ptr__ign); + if (err) { + bpf_mem_free(&bpf_global_ma, kit->bits); + return err; + } + + kit->nr_bits = nr_bits; + kit->bit = -1; + return 0; +} + +/** + * bpf_iter_bits_next() - Get the next bit in a bpf_iter_bits + * @it: The bpf_iter_bits to be checked + * + * This function returns a pointer to a number representing the value of the + * next bit in the bits. + * + * If there are no further bit available, it returns NULL. + */ +__bpf_kfunc int *bpf_iter_bits_next(struct bpf_iter_bits *it) +{ + struct bpf_iter_bits_kern *kit = (void *)it; + u32 nr_bits = kit->nr_bits; + const unsigned long *bits; + int bit; + + if (nr_bits == 0) + return NULL; + + bits = nr_bits <= 64 ? &kit->bits_copy : kit->bits; + bit = find_next_bit(bits, nr_bits, kit->bit + 1); + if (bit >= nr_bits) { + kit->nr_bits = 0; + return NULL; + } + + kit->bit = bit; + return &kit->bit; +} + +/** + * bpf_iter_bits_destroy() - Destroy a bpf_iter_bits + * @it: The bpf_iter_bits to be destroyed + * + * Destroy the resource associated with the bpf_iter_bits. + */ +__bpf_kfunc void bpf_iter_bits_destroy(struct bpf_iter_bits *it) +{ + struct bpf_iter_bits_kern *kit = (void *)it; + + if (kit->nr_bits <= 64) + return; + bpf_mem_free(&bpf_global_ma, kit->bits); +} + __bpf_kfunc_end_defs(); BTF_KFUNCS_START(generic_btf_ids) @@ -2626,6 +2743,9 @@ BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly) BTF_ID_FLAGS(func, bpf_dynptr_size) BTF_ID_FLAGS(func, bpf_dynptr_clone) BTF_ID_FLAGS(func, bpf_modify_return_test_tp) +BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW) +BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY) BTF_KFUNCS_END(common_btf_ids) static const struct btf_kfunc_id_set common_kfunc_set = { -- 2.39.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH bpf-next v6 2/2] selftests/bpf: Add selftest for bits iter 2024-04-11 13:11 [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator Yafang Shao 2024-04-11 13:11 ` [PATCH bpf-next v6 1/2] bpf: Add " Yafang Shao @ 2024-04-11 13:11 ` Yafang Shao 2024-04-11 13:50 ` [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator Yafang Shao 2 siblings, 0 replies; 15+ messages in thread From: Yafang Shao @ 2024-04-11 13:11 UTC (permalink / raw) To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song, yonghong.song, kpsingh, sdf, haoluo, jolsa Cc: bpf, Yafang Shao Add test cases for the bits iter: - positive case - bit mask smaller than 8 bytes - a typical case of having 8-byte bit mask - another typical case where bit mask is > 8 bytes - the index of set bit - nagative cases - bpf_iter_bits_destroy() is required after calling bpf_iter_bits_new() - bpf_iter_bits_destroy() can only destroy an initialized iter - bpf_iter_bits_next() must use an initialized iter Signed-off-by: Yafang Shao <laoar.shao@gmail.com> --- .../selftests/bpf/prog_tests/verifier.c | 2 + .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ 2 files changed, 129 insertions(+) create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c diff --git a/tools/testing/selftests/bpf/prog_tests/verifier.c b/tools/testing/selftests/bpf/prog_tests/verifier.c index c4f9f306646e..7e04ecaaa20a 100644 --- a/tools/testing/selftests/bpf/prog_tests/verifier.c +++ b/tools/testing/selftests/bpf/prog_tests/verifier.c @@ -84,6 +84,7 @@ #include "verifier_xadd.skel.h" #include "verifier_xdp.skel.h" #include "verifier_xdp_direct_packet_access.skel.h" +#include "verifier_bits_iter.skel.h" #define MAX_ENTRIES 11 @@ -198,6 +199,7 @@ void test_verifier_var_off(void) { RUN(verifier_var_off); } void test_verifier_xadd(void) { RUN(verifier_xadd); } void test_verifier_xdp(void) { RUN(verifier_xdp); } void test_verifier_xdp_direct_packet_access(void) { RUN(verifier_xdp_direct_packet_access); } +void test_verifier_bits_iter(void) { RUN(verifier_bits_iter); } static int init_test_val_map(struct bpf_object *obj, char *map_name) { diff --git a/tools/testing/selftests/bpf/progs/verifier_bits_iter.c b/tools/testing/selftests/bpf/progs/verifier_bits_iter.c new file mode 100644 index 000000000000..2a02540cfd26 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/verifier_bits_iter.c @@ -0,0 +1,127 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (c) 2024 Yafang Shao <laoar.shao@gmail.com> */ + +#include "vmlinux.h" +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_tracing.h> + +#include "bpf_misc.h" +#include "task_kfunc_common.h" + +char _license[] SEC("license") = "GPL"; + +struct data_t { + u64 a; + u32 b; +}; + +int bpf_iter_bits_new(struct bpf_iter_bits *it, const void *unsafe_ptr__ign, + u32 nr_bits) __ksym __weak; +int *bpf_iter_bits_next(struct bpf_iter_bits *it) __ksym __weak; +void bpf_iter_bits_destroy(struct bpf_iter_bits *it) __ksym __weak; + +SEC("iter.s/cgroup") +__description("bits iter without destroy") +__failure __msg("Unreleased reference") +int BPF_PROG(no_destroy, struct bpf_iter_meta *meta, struct cgroup *cgrp) +{ + struct bpf_iter_bits it; + struct task_struct *p; + + p = bpf_task_from_pid(1); + if (!p) + return 1; + + bpf_iter_bits_new(&it, p->cpus_ptr, 8192); + + bpf_iter_bits_next(&it); + bpf_task_release(p); + return 0; +} + +SEC("iter/cgroup") +__description("bits iter with uninitialized iter in ->next()") +__failure __msg("expected an initialized iter_bits as arg #1") +int BPF_PROG(next_uninit, struct bpf_iter_meta *meta, struct cgroup *cgrp) +{ + struct bpf_iter_bits *it = NULL; + + bpf_iter_bits_next(it); + return 0; +} + +SEC("iter/cgroup") +__description("bits iter with uninitialized iter in ->destroy()") +__failure __msg("expected an initialized iter_bits as arg #1") +int BPF_PROG(destroy_uninit, struct bpf_iter_meta *meta, struct cgroup *cgrp) +{ + struct bpf_iter_bits it = {}; + + bpf_iter_bits_destroy(&it); + return 0; +} + +SEC("syscall") +__description("bits copy 32") +__success __retval(10) +int bits_copy32(void) +{ + /* 21 bits: --------------------- */ + u32 data = 0b11111101111101111100001000100101U; + int nr = 0; + int *bit; + + bpf_for_each(bits, bit, &data, 21) + nr++; + return nr; +} + +SEC("syscall") +__description("bits copy 64") +__success __retval(18) +int bits_copy64(void) +{ + /* 34 bits: ~-------- */ + u64 data = 0xffffefdf0f0f0f0fUL; + int nr = 0; + int *bit; + + bpf_for_each(bits, bit, &data, 34) + nr++; + return nr; +} + +SEC("syscall") +__description("bits memalloc") +__success __retval(56) +int bits_memalloc(void) +{ + struct data_t data = { + .a = 0xaaaaaaaaaaaaaaaaUL, /* 32 bits are set */ + .b = 0xbbbbbbbbU, /* 24 bits are set */ + }; + int nr = 0; + int *bit; + + bpf_for_each(bits, bit, &data, 96) + nr++; + return nr; +} + +SEC("syscall") +__description("bit index") +__success __retval(8) +int bit_index(void) +{ + u64 data = 0x100; + int bit_idx = 0; + int *bit; + + bpf_for_each(bits, bit, &data, 64) { + if (*bit == 0) + continue; + bit_idx = *bit; + } + return bit_idx; +} + -- 2.39.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator 2024-04-11 13:11 [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator Yafang Shao 2024-04-11 13:11 ` [PATCH bpf-next v6 1/2] bpf: Add " Yafang Shao 2024-04-11 13:11 ` [PATCH bpf-next v6 2/2] selftests/bpf: Add selftest for bits iter Yafang Shao @ 2024-04-11 13:50 ` Yafang Shao 2024-04-25 0:34 ` Andrii Nakryiko 2 siblings, 1 reply; 15+ messages in thread From: Yafang Shao @ 2024-04-11 13:50 UTC (permalink / raw) To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song, yonghong.song, kpsingh, sdf, haoluo, jolsa Cc: bpf On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been > added for the new bpf_iter_bits functionality. These kfuncs enable the > iteration of the bits from a given address and a given number of bits. > > - bpf_iter_bits_new > Initialize a new bits iterator for a given memory area. Due to the > limitation of bpf memalloc, the max number of bits to be iterated > over is (4096 * 8). > - bpf_iter_bits_next > Get the next bit in a bpf_iter_bits > - bpf_iter_bits_destroy > Destroy a bpf_iter_bits > > The bits iterator can be used in any context and on any address. > > Changes: > - v5->v6: > - Add positive tests (Andrii) > - v4->v5: > - Simplify test cases (Andrii) > - v3->v4: > - Fix endianness error on s390x (Andrii) > - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) > - v2->v3: > - Optimization for u64/u32 mask (Andrii) > - v1->v2: > - Simplify the CPU number verification code to avoid the failure on s390x > (Eduard) > - bpf: Add bpf_iter_cpumask > https://lwn.net/Articles/961104/ > - bpf: Add new bpf helper bpf_for_each_cpu > https://lwn.net/Articles/939939/ > > Yafang Shao (2): > bpf: Add bits iterator > selftests/bpf: Add selftest for bits iter > > kernel/bpf/helpers.c | 120 +++++++++++++++++ > .../selftests/bpf/prog_tests/verifier.c | 2 + > .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ > 3 files changed, 249 insertions(+) > create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c > > -- > 2.39.1 > It appears that the test case failed on s390x when the data is a u32 value because we need to set the higher 32 bits. will analyze it. -- Regards Yafang ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator 2024-04-11 13:50 ` [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator Yafang Shao @ 2024-04-25 0:34 ` Andrii Nakryiko 2024-04-25 5:36 ` Yafang Shao 0 siblings, 1 reply; 15+ messages in thread From: Andrii Nakryiko @ 2024-04-25 0:34 UTC (permalink / raw) To: Yafang Shao Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song, yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been > > added for the new bpf_iter_bits functionality. These kfuncs enable the > > iteration of the bits from a given address and a given number of bits. > > > > - bpf_iter_bits_new > > Initialize a new bits iterator for a given memory area. Due to the > > limitation of bpf memalloc, the max number of bits to be iterated > > over is (4096 * 8). > > - bpf_iter_bits_next > > Get the next bit in a bpf_iter_bits > > - bpf_iter_bits_destroy > > Destroy a bpf_iter_bits > > > > The bits iterator can be used in any context and on any address. > > > > Changes: > > - v5->v6: > > - Add positive tests (Andrii) > > - v4->v5: > > - Simplify test cases (Andrii) > > - v3->v4: > > - Fix endianness error on s390x (Andrii) > > - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) > > - v2->v3: > > - Optimization for u64/u32 mask (Andrii) > > - v1->v2: > > - Simplify the CPU number verification code to avoid the failure on s390x > > (Eduard) > > - bpf: Add bpf_iter_cpumask > > https://lwn.net/Articles/961104/ > > - bpf: Add new bpf helper bpf_for_each_cpu > > https://lwn.net/Articles/939939/ > > > > Yafang Shao (2): > > bpf: Add bits iterator > > selftests/bpf: Add selftest for bits iter > > > > kernel/bpf/helpers.c | 120 +++++++++++++++++ > > .../selftests/bpf/prog_tests/verifier.c | 2 + > > .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ > > 3 files changed, 249 insertions(+) > > create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c > > > > -- > > 2.39.1 > > > > It appears that the test case failed on s390x when the data is > a u32 value because we need to set the higher 32 bits. > will analyze it. > Hey Yafang, did you get a chance to debug and fix the issue? > > -- > Regards > Yafang ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator 2024-04-25 0:34 ` Andrii Nakryiko @ 2024-04-25 5:36 ` Yafang Shao 2024-04-25 6:05 ` Yonghong Song 2024-04-25 18:15 ` Andrii Nakryiko 0 siblings, 2 replies; 15+ messages in thread From: Yafang Shao @ 2024-04-25 5:36 UTC (permalink / raw) To: Andrii Nakryiko Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song, yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf [-- Attachment #1: Type: text/plain, Size: 3372 bytes --] On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been > > > added for the new bpf_iter_bits functionality. These kfuncs enable the > > > iteration of the bits from a given address and a given number of bits. > > > > > > - bpf_iter_bits_new > > > Initialize a new bits iterator for a given memory area. Due to the > > > limitation of bpf memalloc, the max number of bits to be iterated > > > over is (4096 * 8). > > > - bpf_iter_bits_next > > > Get the next bit in a bpf_iter_bits > > > - bpf_iter_bits_destroy > > > Destroy a bpf_iter_bits > > > > > > The bits iterator can be used in any context and on any address. > > > > > > Changes: > > > - v5->v6: > > > - Add positive tests (Andrii) > > > - v4->v5: > > > - Simplify test cases (Andrii) > > > - v3->v4: > > > - Fix endianness error on s390x (Andrii) > > > - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) > > > - v2->v3: > > > - Optimization for u64/u32 mask (Andrii) > > > - v1->v2: > > > - Simplify the CPU number verification code to avoid the failure on s390x > > > (Eduard) > > > - bpf: Add bpf_iter_cpumask > > > https://lwn.net/Articles/961104/ > > > - bpf: Add new bpf helper bpf_for_each_cpu > > > https://lwn.net/Articles/939939/ > > > > > > Yafang Shao (2): > > > bpf: Add bits iterator > > > selftests/bpf: Add selftest for bits iter > > > > > > kernel/bpf/helpers.c | 120 +++++++++++++++++ > > > .../selftests/bpf/prog_tests/verifier.c | 2 + > > > .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ > > > 3 files changed, 249 insertions(+) > > > create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c > > > > > > -- > > > 2.39.1 > > > > > > > It appears that the test case failed on s390x when the data is > > a u32 value because we need to set the higher 32 bits. > > will analyze it. > > > > Hey Yafang, did you get a chance to debug and fix the issue? > Hi Andrii, Apologies for the delay; I recently returned from an extended holiday. The issue stems from the limitations of bpf_probe_read_kernel() on s390 architecture. The attachment provides a straightforward example to illustrate this issue. The observed results are as follows: Error: #463/1 verifier_probe_read/probe read 4 bytes 8897 run_subtest:PASS:obj_open_mem 0 nsec 8898 run_subtest:PASS:unexpected_load_failure 0 nsec 8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec 8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512 Error: #463/2 verifier_probe_read/probe read 8 bytes 8903 run_subtest:PASS:obj_open_mem 0 nsec 8904 run_subtest:PASS:unexpected_load_failure 0 nsec 8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec 8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512 More details can be found at: https://github.com/kernel-patches/bpf/pull/6872 Should we consider this behavior of bpf_probe_read_kernel() as expected, or is it something that requires fixing? -- Regards Yafang [-- Attachment #2: 0001-selftests-bpf-Add-test-for-probe_read_kernel.patch --] [-- Type: application/octet-stream, Size: 2438 bytes --] From ba6bbbbc2648ac3741458a62f336694415d49d52 Mon Sep 17 00:00:00 2001 From: Yafang Shao <laoar.shao@gmail.com> Date: Thu, 25 Apr 2024 13:07:47 +0800 Subject: [PATCH] selftests/bpf: Add test for probe_read_kernel It will fail on s390 which is big-endian. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> --- .../selftests/bpf/prog_tests/verifier.c | 2 + .../selftests/bpf/progs/verifier_probe_read.c | 37 +++++++++++++++++++ 2 files changed, 39 insertions(+) create mode 100644 tools/testing/selftests/bpf/progs/verifier_probe_read.c diff --git a/tools/testing/selftests/bpf/prog_tests/verifier.c b/tools/testing/selftests/bpf/prog_tests/verifier.c index c4f9f306646e..edac67d43447 100644 --- a/tools/testing/selftests/bpf/prog_tests/verifier.c +++ b/tools/testing/selftests/bpf/prog_tests/verifier.c @@ -84,6 +84,7 @@ #include "verifier_xadd.skel.h" #include "verifier_xdp.skel.h" #include "verifier_xdp_direct_packet_access.skel.h" +#include "verifier_probe_read.skel.h" #define MAX_ENTRIES 11 @@ -198,6 +199,7 @@ void test_verifier_var_off(void) { RUN(verifier_var_off); } void test_verifier_xadd(void) { RUN(verifier_xadd); } void test_verifier_xdp(void) { RUN(verifier_xdp); } void test_verifier_xdp_direct_packet_access(void) { RUN(verifier_xdp_direct_packet_access); } +void test_verifier_probe_read(void) { RUN(verifier_probe_read); } static int init_test_val_map(struct bpf_object *obj, char *map_name) { diff --git a/tools/testing/selftests/bpf/progs/verifier_probe_read.c b/tools/testing/selftests/bpf/progs/verifier_probe_read.c new file mode 100644 index 000000000000..d0a7c9109960 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/verifier_probe_read.c @@ -0,0 +1,37 @@ +#include "vmlinux.h" +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_tracing.h> + +#include "bpf_misc.h" + +char _license[] SEC("license") = "GPL"; + +SEC("syscall") +__description("probe read 4 bytes") +__success __retval(0x200) +u64 probe_read_4(void) +{ + u32 data = 0x200; + u64 data_dst; + int err; + + err = bpf_probe_read_kernel(&data_dst, 4, &data); + if (err) + return 0; + return data_dst; +} + +SEC("syscall") +__description("probe read 8 bytes") +__success __retval(0x200) +u64 probe_read_8(void) +{ + u32 data = 0x200; + u64 data_dst; + int err; + + err = bpf_probe_read_kernel(&data_dst, 8, &data); + if (err) + return 0; + return data_dst; +} -- 2.39.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator 2024-04-25 5:36 ` Yafang Shao @ 2024-04-25 6:05 ` Yonghong Song 2024-04-25 8:03 ` Yafang Shao 2024-04-25 18:15 ` Andrii Nakryiko 1 sibling, 1 reply; 15+ messages in thread From: Yonghong Song @ 2024-04-25 6:05 UTC (permalink / raw) To: Yafang Shao, Andrii Nakryiko Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song, kpsingh, sdf, haoluo, jolsa, bpf On 4/24/24 10:36 PM, Yafang Shao wrote: > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: >> On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote: >>> On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: >>>> Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been >>>> added for the new bpf_iter_bits functionality. These kfuncs enable the >>>> iteration of the bits from a given address and a given number of bits. >>>> >>>> - bpf_iter_bits_new >>>> Initialize a new bits iterator for a given memory area. Due to the >>>> limitation of bpf memalloc, the max number of bits to be iterated >>>> over is (4096 * 8). >>>> - bpf_iter_bits_next >>>> Get the next bit in a bpf_iter_bits >>>> - bpf_iter_bits_destroy >>>> Destroy a bpf_iter_bits >>>> >>>> The bits iterator can be used in any context and on any address. >>>> >>>> Changes: >>>> - v5->v6: >>>> - Add positive tests (Andrii) >>>> - v4->v5: >>>> - Simplify test cases (Andrii) >>>> - v3->v4: >>>> - Fix endianness error on s390x (Andrii) >>>> - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) >>>> - v2->v3: >>>> - Optimization for u64/u32 mask (Andrii) >>>> - v1->v2: >>>> - Simplify the CPU number verification code to avoid the failure on s390x >>>> (Eduard) >>>> - bpf: Add bpf_iter_cpumask >>>> https://lwn.net/Articles/961104/ >>>> - bpf: Add new bpf helper bpf_for_each_cpu >>>> https://lwn.net/Articles/939939/ >>>> >>>> Yafang Shao (2): >>>> bpf: Add bits iterator >>>> selftests/bpf: Add selftest for bits iter >>>> >>>> kernel/bpf/helpers.c | 120 +++++++++++++++++ >>>> .../selftests/bpf/prog_tests/verifier.c | 2 + >>>> .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ >>>> 3 files changed, 249 insertions(+) >>>> create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c >>>> >>>> -- >>>> 2.39.1 >>>> >>> It appears that the test case failed on s390x when the data is >>> a u32 value because we need to set the higher 32 bits. >>> will analyze it. >>> >> Hey Yafang, did you get a chance to debug and fix the issue? >> > Hi Andrii, > > Apologies for the delay; I recently returned from an extended holiday. > > The issue stems from the limitations of bpf_probe_read_kernel() on > s390 architecture. The attachment provides a straightforward example > to illustrate this issue. The observed results are as follows: > > Error: #463/1 verifier_probe_read/probe read 4 bytes > 8897 run_subtest:PASS:obj_open_mem 0 nsec > 8898 run_subtest:PASS:unexpected_load_failure 0 nsec > 8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > 8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512 > > Error: #463/2 verifier_probe_read/probe read 8 bytes > 8903 run_subtest:PASS:obj_open_mem 0 nsec > 8904 run_subtest:PASS:unexpected_load_failure 0 nsec > 8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > 8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512 > > More details can be found at: https://github.com/kernel-patches/bpf/pull/6872 > > Should we consider this behavior of bpf_probe_read_kernel() as > expected, or is it something that requires fixing? Maybe to guard the result with macros like __s390x__ to differentiate s390 vs. arm64/x86_64? There are some examples in prog_tests/* having such guards. probe_user.c:#if defined(__s390x__) test_bpf_syscall_macro.c:#if defined(__aarch64__) || defined(__s390__) xdp_adjust_tail.c:#if defined(__s390x__) xdp_do_redirect.c:#if defined(__s390x__) ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator 2024-04-25 6:05 ` Yonghong Song @ 2024-04-25 8:03 ` Yafang Shao 0 siblings, 0 replies; 15+ messages in thread From: Yafang Shao @ 2024-04-25 8:03 UTC (permalink / raw) To: Yonghong Song Cc: Andrii Nakryiko, ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song, kpsingh, sdf, haoluo, jolsa, bpf On Thu, Apr 25, 2024 at 2:05 PM Yonghong Song <yonghong.song@linux.dev> wrote: > > > On 4/24/24 10:36 PM, Yafang Shao wrote: > > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko > > <andrii.nakryiko@gmail.com> wrote: > >> On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote: > >>> On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: > >>>> Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been > >>>> added for the new bpf_iter_bits functionality. These kfuncs enable the > >>>> iteration of the bits from a given address and a given number of bits. > >>>> > >>>> - bpf_iter_bits_new > >>>> Initialize a new bits iterator for a given memory area. Due to the > >>>> limitation of bpf memalloc, the max number of bits to be iterated > >>>> over is (4096 * 8). > >>>> - bpf_iter_bits_next > >>>> Get the next bit in a bpf_iter_bits > >>>> - bpf_iter_bits_destroy > >>>> Destroy a bpf_iter_bits > >>>> > >>>> The bits iterator can be used in any context and on any address. > >>>> > >>>> Changes: > >>>> - v5->v6: > >>>> - Add positive tests (Andrii) > >>>> - v4->v5: > >>>> - Simplify test cases (Andrii) > >>>> - v3->v4: > >>>> - Fix endianness error on s390x (Andrii) > >>>> - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) > >>>> - v2->v3: > >>>> - Optimization for u64/u32 mask (Andrii) > >>>> - v1->v2: > >>>> - Simplify the CPU number verification code to avoid the failure on s390x > >>>> (Eduard) > >>>> - bpf: Add bpf_iter_cpumask > >>>> https://lwn.net/Articles/961104/ > >>>> - bpf: Add new bpf helper bpf_for_each_cpu > >>>> https://lwn.net/Articles/939939/ > >>>> > >>>> Yafang Shao (2): > >>>> bpf: Add bits iterator > >>>> selftests/bpf: Add selftest for bits iter > >>>> > >>>> kernel/bpf/helpers.c | 120 +++++++++++++++++ > >>>> .../selftests/bpf/prog_tests/verifier.c | 2 + > >>>> .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ > >>>> 3 files changed, 249 insertions(+) > >>>> create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c > >>>> > >>>> -- > >>>> 2.39.1 > >>>> > >>> It appears that the test case failed on s390x when the data is > >>> a u32 value because we need to set the higher 32 bits. > >>> will analyze it. > >>> > >> Hey Yafang, did you get a chance to debug and fix the issue? > >> > > Hi Andrii, > > > > Apologies for the delay; I recently returned from an extended holiday. > > > > The issue stems from the limitations of bpf_probe_read_kernel() on > > s390 architecture. The attachment provides a straightforward example > > to illustrate this issue. The observed results are as follows: > > > > Error: #463/1 verifier_probe_read/probe read 4 bytes > > 8897 run_subtest:PASS:obj_open_mem 0 nsec > > 8898 run_subtest:PASS:unexpected_load_failure 0 nsec > > 8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > 8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512 > > > > Error: #463/2 verifier_probe_read/probe read 8 bytes > > 8903 run_subtest:PASS:obj_open_mem 0 nsec > > 8904 run_subtest:PASS:unexpected_load_failure 0 nsec > > 8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > 8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512 > > > > More details can be found at: https://github.com/kernel-patches/bpf/pull/6872 > > > > Should we consider this behavior of bpf_probe_read_kernel() as > > expected, or is it something that requires fixing? > > Maybe to guard the result with macros like __s390x__ to differentiate > s390 vs. arm64/x86_64? There are some examples in prog_tests/* having > such guards. probe_user.c:#if defined(__s390x__) > test_bpf_syscall_macro.c:#if defined(__aarch64__) || defined(__s390__) > xdp_adjust_tail.c:#if defined(__s390x__) xdp_do_redirect.c:#if > defined(__s390x__) > That's feasible. Thank you for your suggestion. I'll make the necessary changes. -- Regards Yafang ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator 2024-04-25 5:36 ` Yafang Shao 2024-04-25 6:05 ` Yonghong Song @ 2024-04-25 18:15 ` Andrii Nakryiko 2024-04-26 5:04 ` Yafang Shao 1 sibling, 1 reply; 15+ messages in thread From: Andrii Nakryiko @ 2024-04-25 18:15 UTC (permalink / raw) To: Yafang Shao Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song, yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf On Wed, Apr 24, 2024 at 10:37 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been > > > > added for the new bpf_iter_bits functionality. These kfuncs enable the > > > > iteration of the bits from a given address and a given number of bits. > > > > > > > > - bpf_iter_bits_new > > > > Initialize a new bits iterator for a given memory area. Due to the > > > > limitation of bpf memalloc, the max number of bits to be iterated > > > > over is (4096 * 8). > > > > - bpf_iter_bits_next > > > > Get the next bit in a bpf_iter_bits > > > > - bpf_iter_bits_destroy > > > > Destroy a bpf_iter_bits > > > > > > > > The bits iterator can be used in any context and on any address. > > > > > > > > Changes: > > > > - v5->v6: > > > > - Add positive tests (Andrii) > > > > - v4->v5: > > > > - Simplify test cases (Andrii) > > > > - v3->v4: > > > > - Fix endianness error on s390x (Andrii) > > > > - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) > > > > - v2->v3: > > > > - Optimization for u64/u32 mask (Andrii) > > > > - v1->v2: > > > > - Simplify the CPU number verification code to avoid the failure on s390x > > > > (Eduard) > > > > - bpf: Add bpf_iter_cpumask > > > > https://lwn.net/Articles/961104/ > > > > - bpf: Add new bpf helper bpf_for_each_cpu > > > > https://lwn.net/Articles/939939/ > > > > > > > > Yafang Shao (2): > > > > bpf: Add bits iterator > > > > selftests/bpf: Add selftest for bits iter > > > > > > > > kernel/bpf/helpers.c | 120 +++++++++++++++++ > > > > .../selftests/bpf/prog_tests/verifier.c | 2 + > > > > .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ > > > > 3 files changed, 249 insertions(+) > > > > create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c > > > > > > > > -- > > > > 2.39.1 > > > > > > > > > > It appears that the test case failed on s390x when the data is > > > a u32 value because we need to set the higher 32 bits. > > > will analyze it. > > > > > > > Hey Yafang, did you get a chance to debug and fix the issue? > > > > Hi Andrii, > > Apologies for the delay; I recently returned from an extended holiday. > > The issue stems from the limitations of bpf_probe_read_kernel() on > s390 architecture. The attachment provides a straightforward example > to illustrate this issue. The observed results are as follows: > > Error: #463/1 verifier_probe_read/probe read 4 bytes > 8897 run_subtest:PASS:obj_open_mem 0 nsec > 8898 run_subtest:PASS:unexpected_load_failure 0 nsec > 8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > 8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512 > > Error: #463/2 verifier_probe_read/probe read 8 bytes > 8903 run_subtest:PASS:obj_open_mem 0 nsec > 8904 run_subtest:PASS:unexpected_load_failure 0 nsec > 8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > 8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512 > > More details can be found at: https://github.com/kernel-patches/bpf/pull/6872 > > Should we consider this behavior of bpf_probe_read_kernel() as > expected, or is it something that requires fixing? > I might be missing something, but there is nothing wrong with bpf_probe_read_kernel() behavior. In "read 4" case you are overwriting only upper 4 bytes of u64, so lower 4 bytes are garbage. In "read 8" you are reading (upper) 4 bytes of garbage from uninitialized data_dst. So getting back to iter implementation. Make sure you are zero-initializing that u64 value you are reading into? > -- > Regards > Yafang ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator 2024-04-25 18:15 ` Andrii Nakryiko @ 2024-04-26 5:04 ` Yafang Shao 2024-04-26 16:50 ` Andrii Nakryiko 0 siblings, 1 reply; 15+ messages in thread From: Yafang Shao @ 2024-04-26 5:04 UTC (permalink / raw) To: Andrii Nakryiko Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song, yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf On Fri, Apr 26, 2024 at 2:15 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Wed, Apr 24, 2024 at 10:37 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko > > <andrii.nakryiko@gmail.com> wrote: > > > > > > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been > > > > > added for the new bpf_iter_bits functionality. These kfuncs enable the > > > > > iteration of the bits from a given address and a given number of bits. > > > > > > > > > > - bpf_iter_bits_new > > > > > Initialize a new bits iterator for a given memory area. Due to the > > > > > limitation of bpf memalloc, the max number of bits to be iterated > > > > > over is (4096 * 8). > > > > > - bpf_iter_bits_next > > > > > Get the next bit in a bpf_iter_bits > > > > > - bpf_iter_bits_destroy > > > > > Destroy a bpf_iter_bits > > > > > > > > > > The bits iterator can be used in any context and on any address. > > > > > > > > > > Changes: > > > > > - v5->v6: > > > > > - Add positive tests (Andrii) > > > > > - v4->v5: > > > > > - Simplify test cases (Andrii) > > > > > - v3->v4: > > > > > - Fix endianness error on s390x (Andrii) > > > > > - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) > > > > > - v2->v3: > > > > > - Optimization for u64/u32 mask (Andrii) > > > > > - v1->v2: > > > > > - Simplify the CPU number verification code to avoid the failure on s390x > > > > > (Eduard) > > > > > - bpf: Add bpf_iter_cpumask > > > > > https://lwn.net/Articles/961104/ > > > > > - bpf: Add new bpf helper bpf_for_each_cpu > > > > > https://lwn.net/Articles/939939/ > > > > > > > > > > Yafang Shao (2): > > > > > bpf: Add bits iterator > > > > > selftests/bpf: Add selftest for bits iter > > > > > > > > > > kernel/bpf/helpers.c | 120 +++++++++++++++++ > > > > > .../selftests/bpf/prog_tests/verifier.c | 2 + > > > > > .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ > > > > > 3 files changed, 249 insertions(+) > > > > > create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c > > > > > > > > > > -- > > > > > 2.39.1 > > > > > > > > > > > > > It appears that the test case failed on s390x when the data is > > > > a u32 value because we need to set the higher 32 bits. > > > > will analyze it. > > > > > > > > > > Hey Yafang, did you get a chance to debug and fix the issue? > > > > > > > Hi Andrii, > > > > Apologies for the delay; I recently returned from an extended holiday. > > > > The issue stems from the limitations of bpf_probe_read_kernel() on > > s390 architecture. The attachment provides a straightforward example > > to illustrate this issue. The observed results are as follows: > > > > Error: #463/1 verifier_probe_read/probe read 4 bytes > > 8897 run_subtest:PASS:obj_open_mem 0 nsec > > 8898 run_subtest:PASS:unexpected_load_failure 0 nsec > > 8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > 8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512 > > > > Error: #463/2 verifier_probe_read/probe read 8 bytes > > 8903 run_subtest:PASS:obj_open_mem 0 nsec > > 8904 run_subtest:PASS:unexpected_load_failure 0 nsec > > 8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > 8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512 > > > > More details can be found at: https://github.com/kernel-patches/bpf/pull/6872 > > > > Should we consider this behavior of bpf_probe_read_kernel() as > > expected, or is it something that requires fixing? > > > > I might be missing something, but there is nothing wrong with > bpf_probe_read_kernel() behavior. In "read 4" case you are overwriting > only upper 4 bytes of u64, so lower 4 bytes are garbage. In "read 8" > you are reading (upper) 4 bytes of garbage from uninitialized > data_dst. The issue doesn't lie with the dst but rather with the src. Even after initializing the destination, the operation still fails. You can find more details in the following link: https://github.com/kernel-patches/bpf/pull/6882. It appears that bpf_probe_read_kernel() encounters difficulties when dealing with non-long-aligned source addresses. > > So getting back to iter implementation. Make sure you are > zero-initializing that u64 value you are reading into? > It has been zero-initialized: + kit->nr_bits = 0; + kit->bits_copy = 0; -- Regards Yafang ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator 2024-04-26 5:04 ` Yafang Shao @ 2024-04-26 16:50 ` Andrii Nakryiko 2024-04-28 13:47 ` Yafang Shao 0 siblings, 1 reply; 15+ messages in thread From: Andrii Nakryiko @ 2024-04-26 16:50 UTC (permalink / raw) To: Yafang Shao Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song, yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf On Thu, Apr 25, 2024 at 10:05 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > On Fri, Apr 26, 2024 at 2:15 AM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > On Wed, Apr 24, 2024 at 10:37 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko > > > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been > > > > > > added for the new bpf_iter_bits functionality. These kfuncs enable the > > > > > > iteration of the bits from a given address and a given number of bits. > > > > > > > > > > > > - bpf_iter_bits_new > > > > > > Initialize a new bits iterator for a given memory area. Due to the > > > > > > limitation of bpf memalloc, the max number of bits to be iterated > > > > > > over is (4096 * 8). > > > > > > - bpf_iter_bits_next > > > > > > Get the next bit in a bpf_iter_bits > > > > > > - bpf_iter_bits_destroy > > > > > > Destroy a bpf_iter_bits > > > > > > > > > > > > The bits iterator can be used in any context and on any address. > > > > > > > > > > > > Changes: > > > > > > - v5->v6: > > > > > > - Add positive tests (Andrii) > > > > > > - v4->v5: > > > > > > - Simplify test cases (Andrii) > > > > > > - v3->v4: > > > > > > - Fix endianness error on s390x (Andrii) > > > > > > - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) > > > > > > - v2->v3: > > > > > > - Optimization for u64/u32 mask (Andrii) > > > > > > - v1->v2: > > > > > > - Simplify the CPU number verification code to avoid the failure on s390x > > > > > > (Eduard) > > > > > > - bpf: Add bpf_iter_cpumask > > > > > > https://lwn.net/Articles/961104/ > > > > > > - bpf: Add new bpf helper bpf_for_each_cpu > > > > > > https://lwn.net/Articles/939939/ > > > > > > > > > > > > Yafang Shao (2): > > > > > > bpf: Add bits iterator > > > > > > selftests/bpf: Add selftest for bits iter > > > > > > > > > > > > kernel/bpf/helpers.c | 120 +++++++++++++++++ > > > > > > .../selftests/bpf/prog_tests/verifier.c | 2 + > > > > > > .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ > > > > > > 3 files changed, 249 insertions(+) > > > > > > create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c > > > > > > > > > > > > -- > > > > > > 2.39.1 > > > > > > > > > > > > > > > > It appears that the test case failed on s390x when the data is > > > > > a u32 value because we need to set the higher 32 bits. > > > > > will analyze it. > > > > > > > > > > > > > Hey Yafang, did you get a chance to debug and fix the issue? > > > > > > > > > > Hi Andrii, > > > > > > Apologies for the delay; I recently returned from an extended holiday. > > > > > > The issue stems from the limitations of bpf_probe_read_kernel() on > > > s390 architecture. The attachment provides a straightforward example > > > to illustrate this issue. The observed results are as follows: > > > > > > Error: #463/1 verifier_probe_read/probe read 4 bytes > > > 8897 run_subtest:PASS:obj_open_mem 0 nsec > > > 8898 run_subtest:PASS:unexpected_load_failure 0 nsec > > > 8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > > 8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512 > > > > > > Error: #463/2 verifier_probe_read/probe read 8 bytes > > > 8903 run_subtest:PASS:obj_open_mem 0 nsec > > > 8904 run_subtest:PASS:unexpected_load_failure 0 nsec > > > 8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > > 8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512 > > > > > > More details can be found at: https://github.com/kernel-patches/bpf/pull/6872 > > > > > > Should we consider this behavior of bpf_probe_read_kernel() as > > > expected, or is it something that requires fixing? > > > > > > > I might be missing something, but there is nothing wrong with > > bpf_probe_read_kernel() behavior. In "read 4" case you are overwriting > > only upper 4 bytes of u64, so lower 4 bytes are garbage. In "read 8" > > you are reading (upper) 4 bytes of garbage from uninitialized > > data_dst. > > The issue doesn't lie with the dst but rather with the src. Even after > initializing the destination, the operation still fails. You can find Are you sure the operation "fails"? If it would fail, you'd get a negative error code, but you are getting zero. Which actually makes sense. I think you are just getting confused by big endianness of s390x, and there is nothing wrong with bpf_probe_read_kernel(). In both of your tests (I pasted your code below, it would be better if you did it in your initial emails) you end up with 0x200 in *upper* 32 bits (on big endian) and lower bits are zeros. And __retval thing is 32-bit (despite BPF program returning long), so this return value is truncated to *lower* 32-bits, which are, expectedly, zeroes. So I think everything works as expected, but your tests (at least) don't handle the big-endian arch well. __description("probe read 4 bytes") __success __retval(0x200) long probe_read_4(void) { int data = 0x200; long data_dst = 0; int err; err = bpf_probe_read_kernel(&data_dst, 4, &data); if (err) return err; return data_dst; } SEC("syscall") __description("probe read 8 bytes") __success __retval(0x200) long probe_read_8(void) { int data = 0x200; long data_dst = 0; int err; err = bpf_probe_read_kernel(&data_dst, 8, &data); if (err) return err; return data_dst; } > more details in the following link: > https://github.com/kernel-patches/bpf/pull/6882. It appears that > bpf_probe_read_kernel() encounters difficulties when dealing with > non-long-aligned source addresses. > > > > > So getting back to iter implementation. Make sure you are > > zero-initializing that u64 value you are reading into? > > > > It has been zero-initialized: > > + kit->nr_bits = 0; > + kit->bits_copy = 0; > ok, then the problem is somewhere else, but it doesn't seem to be in bpf_probe_read_kernel(). I'm forgetting what was the original test failure for your patch set, but please double check again, taking into account the big endianness of s390x. > -- > Regards > Yafang ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator 2024-04-26 16:50 ` Andrii Nakryiko @ 2024-04-28 13:47 ` Yafang Shao 2024-04-29 16:27 ` Andrii Nakryiko 0 siblings, 1 reply; 15+ messages in thread From: Yafang Shao @ 2024-04-28 13:47 UTC (permalink / raw) To: Andrii Nakryiko Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song, yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf On Sat, Apr 27, 2024 at 12:51 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Thu, Apr 25, 2024 at 10:05 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > On Fri, Apr 26, 2024 at 2:15 AM Andrii Nakryiko > > <andrii.nakryiko@gmail.com> wrote: > > > > > > On Wed, Apr 24, 2024 at 10:37 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko > > > > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > > > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been > > > > > > > added for the new bpf_iter_bits functionality. These kfuncs enable the > > > > > > > iteration of the bits from a given address and a given number of bits. > > > > > > > > > > > > > > - bpf_iter_bits_new > > > > > > > Initialize a new bits iterator for a given memory area. Due to the > > > > > > > limitation of bpf memalloc, the max number of bits to be iterated > > > > > > > over is (4096 * 8). > > > > > > > - bpf_iter_bits_next > > > > > > > Get the next bit in a bpf_iter_bits > > > > > > > - bpf_iter_bits_destroy > > > > > > > Destroy a bpf_iter_bits > > > > > > > > > > > > > > The bits iterator can be used in any context and on any address. > > > > > > > > > > > > > > Changes: > > > > > > > - v5->v6: > > > > > > > - Add positive tests (Andrii) > > > > > > > - v4->v5: > > > > > > > - Simplify test cases (Andrii) > > > > > > > - v3->v4: > > > > > > > - Fix endianness error on s390x (Andrii) > > > > > > > - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) > > > > > > > - v2->v3: > > > > > > > - Optimization for u64/u32 mask (Andrii) > > > > > > > - v1->v2: > > > > > > > - Simplify the CPU number verification code to avoid the failure on s390x > > > > > > > (Eduard) > > > > > > > - bpf: Add bpf_iter_cpumask > > > > > > > https://lwn.net/Articles/961104/ > > > > > > > - bpf: Add new bpf helper bpf_for_each_cpu > > > > > > > https://lwn.net/Articles/939939/ > > > > > > > > > > > > > > Yafang Shao (2): > > > > > > > bpf: Add bits iterator > > > > > > > selftests/bpf: Add selftest for bits iter > > > > > > > > > > > > > > kernel/bpf/helpers.c | 120 +++++++++++++++++ > > > > > > > .../selftests/bpf/prog_tests/verifier.c | 2 + > > > > > > > .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ > > > > > > > 3 files changed, 249 insertions(+) > > > > > > > create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c > > > > > > > > > > > > > > -- > > > > > > > 2.39.1 > > > > > > > > > > > > > > > > > > > It appears that the test case failed on s390x when the data is > > > > > > a u32 value because we need to set the higher 32 bits. > > > > > > will analyze it. > > > > > > > > > > > > > > > > Hey Yafang, did you get a chance to debug and fix the issue? > > > > > > > > > > > > > Hi Andrii, > > > > > > > > Apologies for the delay; I recently returned from an extended holiday. > > > > > > > > The issue stems from the limitations of bpf_probe_read_kernel() on > > > > s390 architecture. The attachment provides a straightforward example > > > > to illustrate this issue. The observed results are as follows: > > > > > > > > Error: #463/1 verifier_probe_read/probe read 4 bytes > > > > 8897 run_subtest:PASS:obj_open_mem 0 nsec > > > > 8898 run_subtest:PASS:unexpected_load_failure 0 nsec > > > > 8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > > > 8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512 > > > > > > > > Error: #463/2 verifier_probe_read/probe read 8 bytes > > > > 8903 run_subtest:PASS:obj_open_mem 0 nsec > > > > 8904 run_subtest:PASS:unexpected_load_failure 0 nsec > > > > 8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > > > 8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512 > > > > > > > > More details can be found at: https://github.com/kernel-patches/bpf/pull/6872 > > > > > > > > Should we consider this behavior of bpf_probe_read_kernel() as > > > > expected, or is it something that requires fixing? > > > > > > > > > > I might be missing something, but there is nothing wrong with > > > bpf_probe_read_kernel() behavior. In "read 4" case you are overwriting > > > only upper 4 bytes of u64, so lower 4 bytes are garbage. In "read 8" > > > you are reading (upper) 4 bytes of garbage from uninitialized > > > data_dst. > > > > The issue doesn't lie with the dst but rather with the src. Even after > > initializing the destination, the operation still fails. You can find > > Are you sure the operation "fails"? If it would fail, you'd get a > negative error code, but you are getting zero. Which actually makes > sense. > > I think you are just getting confused by big endianness of s390x, and > there is nothing wrong with bpf_probe_read_kernel(). > > In both of your tests (I pasted your code below, it would be better if > you did it in your initial emails) you end up with 0x200 in *upper* 32 > bits (on big endian) and lower bits are zeros. And __retval thing is > 32-bit (despite BPF program returning long), so this return value is > truncated to *lower* 32-bits, which are, expectedly, zeroes. Thank you for clarifying. The presence of the 32-bit __retval led to my misunderstanding :( > > So I think everything works as expected, but your tests (at least) > don't handle the big-endian arch well. The issue arises when the dst and src have different sizes, causing bpf_probe_read_kernel_common() to handle them poorly on big-endian machines. To address this, we need to calculate the offset for copying, as demonstrated by the following bpf_probe_read_kernel_common(&kit->bits_copy + offset, size, unsafe_ptr__ign); One might wonder why this calculation is not incorporated directly into the implementation of bpf_probe_read_kernel_common() ? > > __description("probe read 4 bytes") > __success __retval(0x200) > long probe_read_4(void) > { > int data = 0x200; > long data_dst = 0; > int err; > > err = bpf_probe_read_kernel(&data_dst, 4, &data); > if (err) > return err; > > return data_dst; > } > > SEC("syscall") > __description("probe read 8 bytes") > __success __retval(0x200) > long probe_read_8(void) > { > int data = 0x200; > long data_dst = 0; > int err; > > err = bpf_probe_read_kernel(&data_dst, 8, &data); > if (err) > return err; > > return data_dst; > > } > > > more details in the following link: > > https://github.com/kernel-patches/bpf/pull/6882. It appears that > > bpf_probe_read_kernel() encounters difficulties when dealing with > > non-long-aligned source addresses. > > > > > > > > So getting back to iter implementation. Make sure you are > > > zero-initializing that u64 value you are reading into? > > > > > > > It has been zero-initialized: > > > > + kit->nr_bits = 0; > > + kit->bits_copy = 0; > > > > ok, then the problem is somewhere else, but it doesn't seem to be in > bpf_probe_read_kernel(). I'm forgetting what was the original test > failure for your patch set, but please double check again, taking into > account the big endianness of s390x. > If we aim to make it compatible with s390, we need to introduce some constraints regarding the bits iteration. 1. We must replace nr_bits with size: bpf_iter_bits_new(struct bpf_iter_bits *it, const void *unsafe_ptr__ign, u32 size) 2. The size must adhere to alignment requirements: if (size <= sizeof(u64)) { int offset = IS_ENABLED(CONFIG_S390) ? sizeof(u64) - size : 0; switch (size) { case 1: case 2: case 4: case 8: break; default: return -EINVAL; } err = bpf_probe_read_kernel_common(((char *)&kit->bits_copy) + offset, size, unsafe_ptr__ign); if (err) return -EFAULT; kit->size = size; kit->bit = -1; return 0; } /* Not long-aligned */ if (size & (sizeof(unsigned long) - 1)) return -EINVAL; .... Does this meet your expectations? -- Regards Yafang ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator 2024-04-28 13:47 ` Yafang Shao @ 2024-04-29 16:27 ` Andrii Nakryiko 2024-05-01 2:12 ` Yafang Shao 0 siblings, 1 reply; 15+ messages in thread From: Andrii Nakryiko @ 2024-04-29 16:27 UTC (permalink / raw) To: Yafang Shao Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song, yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf On Sun, Apr 28, 2024 at 6:47 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > On Sat, Apr 27, 2024 at 12:51 AM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > On Thu, Apr 25, 2024 at 10:05 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > On Fri, Apr 26, 2024 at 2:15 AM Andrii Nakryiko > > > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > On Wed, Apr 24, 2024 at 10:37 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko > > > > > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > > > > > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > > > > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > > > > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been > > > > > > > > added for the new bpf_iter_bits functionality. These kfuncs enable the > > > > > > > > iteration of the bits from a given address and a given number of bits. > > > > > > > > > > > > > > > > - bpf_iter_bits_new > > > > > > > > Initialize a new bits iterator for a given memory area. Due to the > > > > > > > > limitation of bpf memalloc, the max number of bits to be iterated > > > > > > > > over is (4096 * 8). > > > > > > > > - bpf_iter_bits_next > > > > > > > > Get the next bit in a bpf_iter_bits > > > > > > > > - bpf_iter_bits_destroy > > > > > > > > Destroy a bpf_iter_bits > > > > > > > > > > > > > > > > The bits iterator can be used in any context and on any address. > > > > > > > > > > > > > > > > Changes: > > > > > > > > - v5->v6: > > > > > > > > - Add positive tests (Andrii) > > > > > > > > - v4->v5: > > > > > > > > - Simplify test cases (Andrii) > > > > > > > > - v3->v4: > > > > > > > > - Fix endianness error on s390x (Andrii) > > > > > > > > - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) > > > > > > > > - v2->v3: > > > > > > > > - Optimization for u64/u32 mask (Andrii) > > > > > > > > - v1->v2: > > > > > > > > - Simplify the CPU number verification code to avoid the failure on s390x > > > > > > > > (Eduard) > > > > > > > > - bpf: Add bpf_iter_cpumask > > > > > > > > https://lwn.net/Articles/961104/ > > > > > > > > - bpf: Add new bpf helper bpf_for_each_cpu > > > > > > > > https://lwn.net/Articles/939939/ > > > > > > > > > > > > > > > > Yafang Shao (2): > > > > > > > > bpf: Add bits iterator > > > > > > > > selftests/bpf: Add selftest for bits iter > > > > > > > > > > > > > > > > kernel/bpf/helpers.c | 120 +++++++++++++++++ > > > > > > > > .../selftests/bpf/prog_tests/verifier.c | 2 + > > > > > > > > .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ > > > > > > > > 3 files changed, 249 insertions(+) > > > > > > > > create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c > > > > > > > > > > > > > > > > -- > > > > > > > > 2.39.1 > > > > > > > > > > > > > > > > > > > > > > It appears that the test case failed on s390x when the data is > > > > > > > a u32 value because we need to set the higher 32 bits. > > > > > > > will analyze it. > > > > > > > > > > > > > > > > > > > Hey Yafang, did you get a chance to debug and fix the issue? > > > > > > > > > > > > > > > > Hi Andrii, > > > > > > > > > > Apologies for the delay; I recently returned from an extended holiday. > > > > > > > > > > The issue stems from the limitations of bpf_probe_read_kernel() on > > > > > s390 architecture. The attachment provides a straightforward example > > > > > to illustrate this issue. The observed results are as follows: > > > > > > > > > > Error: #463/1 verifier_probe_read/probe read 4 bytes > > > > > 8897 run_subtest:PASS:obj_open_mem 0 nsec > > > > > 8898 run_subtest:PASS:unexpected_load_failure 0 nsec > > > > > 8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > > > > 8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512 > > > > > > > > > > Error: #463/2 verifier_probe_read/probe read 8 bytes > > > > > 8903 run_subtest:PASS:obj_open_mem 0 nsec > > > > > 8904 run_subtest:PASS:unexpected_load_failure 0 nsec > > > > > 8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > > > > 8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512 > > > > > > > > > > More details can be found at: https://github.com/kernel-patches/bpf/pull/6872 > > > > > > > > > > Should we consider this behavior of bpf_probe_read_kernel() as > > > > > expected, or is it something that requires fixing? > > > > > > > > > > > > > I might be missing something, but there is nothing wrong with > > > > bpf_probe_read_kernel() behavior. In "read 4" case you are overwriting > > > > only upper 4 bytes of u64, so lower 4 bytes are garbage. In "read 8" > > > > you are reading (upper) 4 bytes of garbage from uninitialized > > > > data_dst. > > > > > > The issue doesn't lie with the dst but rather with the src. Even after > > > initializing the destination, the operation still fails. You can find > > > > Are you sure the operation "fails"? If it would fail, you'd get a > > negative error code, but you are getting zero. Which actually makes > > sense. > > > > I think you are just getting confused by big endianness of s390x, and > > there is nothing wrong with bpf_probe_read_kernel(). > > > > In both of your tests (I pasted your code below, it would be better if > > you did it in your initial emails) you end up with 0x200 in *upper* 32 > > bits (on big endian) and lower bits are zeros. And __retval thing is > > 32-bit (despite BPF program returning long), so this return value is > > truncated to *lower* 32-bits, which are, expectedly, zeroes. > > Thank you for clarifying. The presence of the 32-bit __retval led to > my misunderstanding :( > > > > > So I think everything works as expected, but your tests (at least) > > don't handle the big-endian arch well. > > The issue arises when the dst and src have different sizes, causing > bpf_probe_read_kernel_common() to handle them poorly on big-endian > machines. To address this, we need to calculate the offset for > copying, as demonstrated by the following > > bpf_probe_read_kernel_common(&kit->bits_copy + offset, size, > unsafe_ptr__ign); > > One might wonder why this calculation is not incorporated directly > into the implementation of bpf_probe_read_kernel_common() ? Let's stop talking about bpf_probe_read_kernel/bpf_probe_read_kernel_common doing anything wrong or not handling anything wrong. It's not, it's correct. Your code is *using* it wrong, and that's what we'll have to fix. The contract of bpf_probe_read_kernel is in terms of memory addresses of source/destination *bytes* and the *common* size of both source and destination. bpf_probe_read_kernel() cannot know that you are passing a pointer to int or long or whatever, it's just a void * pointer. So if you are writing bytes over long, you need to take care of adjusting offsets to accommodate big-endian. It's a distraction to talk about bpf_probe_read_kernel() here (but it's useful to understand its contract). > > > > > __description("probe read 4 bytes") > > __success __retval(0x200) > > long probe_read_4(void) > > { > > int data = 0x200; > > long data_dst = 0; > > int err; > > > > err = bpf_probe_read_kernel(&data_dst, 4, &data); > > if (err) > > return err; > > > > return data_dst; > > } > > > > SEC("syscall") > > __description("probe read 8 bytes") > > __success __retval(0x200) > > long probe_read_8(void) > > { > > int data = 0x200; > > long data_dst = 0; > > int err; > > > > err = bpf_probe_read_kernel(&data_dst, 8, &data); > > if (err) > > return err; > > > > return data_dst; > > > > } > > > > > more details in the following link: > > > https://github.com/kernel-patches/bpf/pull/6882. It appears that > > > bpf_probe_read_kernel() encounters difficulties when dealing with > > > non-long-aligned source addresses. > > > > > > > > > > > So getting back to iter implementation. Make sure you are > > > > zero-initializing that u64 value you are reading into? > > > > > > > > > > It has been zero-initialized: > > > > > > + kit->nr_bits = 0; > > > + kit->bits_copy = 0; > > > > > > > ok, then the problem is somewhere else, but it doesn't seem to be in > > bpf_probe_read_kernel(). I'm forgetting what was the original test > > failure for your patch set, but please double check again, taking into > > account the big endianness of s390x. > > > > If we aim to make it compatible with s390, we need to introduce some > constraints regarding the bits iteration. > > 1. We must replace nr_bits with size: > > bpf_iter_bits_new(struct bpf_iter_bits *it, const void > *unsafe_ptr__ign, u32 size) > > 2. The size must adhere to alignment requirements: > > if (size <= sizeof(u64)) { > int offset = IS_ENABLED(CONFIG_S390) ? sizeof(u64) - size : 0; > > switch (size) { > case 1: > case 2: > case 4: > case 8: > break; > default: > return -EINVAL; > } > > err = bpf_probe_read_kernel_common(((char > *)&kit->bits_copy) + offset, size, unsafe_ptr__ign); > if (err) > return -EFAULT; > > kit->size = size; > kit->bit = -1; > return 0; > } > > /* Not long-aligned */ > if (size & (sizeof(unsigned long) - 1)) > return -EINVAL; > > .... > > Does this meet your expectations? > I don't think you need to add any restrictions or change anything to be byte-sized. You just need to calculate byte offset and do a bit shift to place N bits from the mask into upper N bits of long on big-endian. You might need to do some masking even for little endian as well. Which is why I suggested investing in simple but thorough tests. Write down a few bit mask patterns of variable length (not just multiple-of-8 sizes) and think about the sequence of bits that should be returned. And then codify that. Then check that both little- and big-endian arches work correctly. This is all a bit tricky because kernel's find_next_bit() makes the assumption that bit masks are long-based, while this bits iterator protocol is based on bit sizes, which are not necessarily multiples of 8 bits. So after you copy memory as bytes, you might need to clear (and shift, for big endian) some bits. Use simple but non-symmetrical bit masks to test everything. > -- > Regards > > > > Yafang ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator 2024-04-29 16:27 ` Andrii Nakryiko @ 2024-05-01 2:12 ` Yafang Shao 2024-05-01 4:14 ` Andrii Nakryiko 0 siblings, 1 reply; 15+ messages in thread From: Yafang Shao @ 2024-05-01 2:12 UTC (permalink / raw) To: Andrii Nakryiko Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song, yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf On Tue, Apr 30, 2024 at 12:28 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Sun, Apr 28, 2024 at 6:47 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > On Sat, Apr 27, 2024 at 12:51 AM Andrii Nakryiko > > <andrii.nakryiko@gmail.com> wrote: > > > > > > On Thu, Apr 25, 2024 at 10:05 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > On Fri, Apr 26, 2024 at 2:15 AM Andrii Nakryiko > > > > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > > > On Wed, Apr 24, 2024 at 10:37 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > > > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko > > > > > > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > > > > > > > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > > > > > > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > > > > > > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been > > > > > > > > > added for the new bpf_iter_bits functionality. These kfuncs enable the > > > > > > > > > iteration of the bits from a given address and a given number of bits. > > > > > > > > > > > > > > > > > > - bpf_iter_bits_new > > > > > > > > > Initialize a new bits iterator for a given memory area. Due to the > > > > > > > > > limitation of bpf memalloc, the max number of bits to be iterated > > > > > > > > > over is (4096 * 8). > > > > > > > > > - bpf_iter_bits_next > > > > > > > > > Get the next bit in a bpf_iter_bits > > > > > > > > > - bpf_iter_bits_destroy > > > > > > > > > Destroy a bpf_iter_bits > > > > > > > > > > > > > > > > > > The bits iterator can be used in any context and on any address. > > > > > > > > > > > > > > > > > > Changes: > > > > > > > > > - v5->v6: > > > > > > > > > - Add positive tests (Andrii) > > > > > > > > > - v4->v5: > > > > > > > > > - Simplify test cases (Andrii) > > > > > > > > > - v3->v4: > > > > > > > > > - Fix endianness error on s390x (Andrii) > > > > > > > > > - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) > > > > > > > > > - v2->v3: > > > > > > > > > - Optimization for u64/u32 mask (Andrii) > > > > > > > > > - v1->v2: > > > > > > > > > - Simplify the CPU number verification code to avoid the failure on s390x > > > > > > > > > (Eduard) > > > > > > > > > - bpf: Add bpf_iter_cpumask > > > > > > > > > https://lwn.net/Articles/961104/ > > > > > > > > > - bpf: Add new bpf helper bpf_for_each_cpu > > > > > > > > > https://lwn.net/Articles/939939/ > > > > > > > > > > > > > > > > > > Yafang Shao (2): > > > > > > > > > bpf: Add bits iterator > > > > > > > > > selftests/bpf: Add selftest for bits iter > > > > > > > > > > > > > > > > > > kernel/bpf/helpers.c | 120 +++++++++++++++++ > > > > > > > > > .../selftests/bpf/prog_tests/verifier.c | 2 + > > > > > > > > > .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ > > > > > > > > > 3 files changed, 249 insertions(+) > > > > > > > > > create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c > > > > > > > > > > > > > > > > > > -- > > > > > > > > > 2.39.1 > > > > > > > > > > > > > > > > > > > > > > > > > It appears that the test case failed on s390x when the data is > > > > > > > > a u32 value because we need to set the higher 32 bits. > > > > > > > > will analyze it. > > > > > > > > > > > > > > > > > > > > > > Hey Yafang, did you get a chance to debug and fix the issue? > > > > > > > > > > > > > > > > > > > Hi Andrii, > > > > > > > > > > > > Apologies for the delay; I recently returned from an extended holiday. > > > > > > > > > > > > The issue stems from the limitations of bpf_probe_read_kernel() on > > > > > > s390 architecture. The attachment provides a straightforward example > > > > > > to illustrate this issue. The observed results are as follows: > > > > > > > > > > > > Error: #463/1 verifier_probe_read/probe read 4 bytes > > > > > > 8897 run_subtest:PASS:obj_open_mem 0 nsec > > > > > > 8898 run_subtest:PASS:unexpected_load_failure 0 nsec > > > > > > 8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > > > > > 8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512 > > > > > > > > > > > > Error: #463/2 verifier_probe_read/probe read 8 bytes > > > > > > 8903 run_subtest:PASS:obj_open_mem 0 nsec > > > > > > 8904 run_subtest:PASS:unexpected_load_failure 0 nsec > > > > > > 8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > > > > > 8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512 > > > > > > > > > > > > More details can be found at: https://github.com/kernel-patches/bpf/pull/6872 > > > > > > > > > > > > Should we consider this behavior of bpf_probe_read_kernel() as > > > > > > expected, or is it something that requires fixing? > > > > > > > > > > > > > > > > I might be missing something, but there is nothing wrong with > > > > > bpf_probe_read_kernel() behavior. In "read 4" case you are overwriting > > > > > only upper 4 bytes of u64, so lower 4 bytes are garbage. In "read 8" > > > > > you are reading (upper) 4 bytes of garbage from uninitialized > > > > > data_dst. > > > > > > > > The issue doesn't lie with the dst but rather with the src. Even after > > > > initializing the destination, the operation still fails. You can find > > > > > > Are you sure the operation "fails"? If it would fail, you'd get a > > > negative error code, but you are getting zero. Which actually makes > > > sense. > > > > > > I think you are just getting confused by big endianness of s390x, and > > > there is nothing wrong with bpf_probe_read_kernel(). > > > > > > In both of your tests (I pasted your code below, it would be better if > > > you did it in your initial emails) you end up with 0x200 in *upper* 32 > > > bits (on big endian) and lower bits are zeros. And __retval thing is > > > 32-bit (despite BPF program returning long), so this return value is > > > truncated to *lower* 32-bits, which are, expectedly, zeroes. > > > > Thank you for clarifying. The presence of the 32-bit __retval led to > > my misunderstanding :( > > > > > > > > So I think everything works as expected, but your tests (at least) > > > don't handle the big-endian arch well. > > > > The issue arises when the dst and src have different sizes, causing > > bpf_probe_read_kernel_common() to handle them poorly on big-endian > > machines. To address this, we need to calculate the offset for > > copying, as demonstrated by the following > > > > bpf_probe_read_kernel_common(&kit->bits_copy + offset, size, > > unsafe_ptr__ign); > > > > One might wonder why this calculation is not incorporated directly > > into the implementation of bpf_probe_read_kernel_common() ? > > Let's stop talking about > bpf_probe_read_kernel/bpf_probe_read_kernel_common doing anything > wrong or not handling anything wrong. It's not, it's correct. Your > code is *using* it wrong, and that's what we'll have to fix. The > contract of bpf_probe_read_kernel is in terms of memory addresses of > source/destination *bytes* and the *common* size of both source and > destination. bpf_probe_read_kernel() cannot know that you are passing > a pointer to int or long or whatever, it's just a void * pointer. So > if you are writing bytes over long, you need to take care of adjusting > offsets to accommodate big-endian. > > It's a distraction to talk about bpf_probe_read_kernel() here (but > it's useful to understand its contract). > > > > > > > > > __description("probe read 4 bytes") > > > __success __retval(0x200) > > > long probe_read_4(void) > > > { > > > int data = 0x200; > > > long data_dst = 0; > > > int err; > > > > > > err = bpf_probe_read_kernel(&data_dst, 4, &data); > > > if (err) > > > return err; > > > > > > return data_dst; > > > } > > > > > > SEC("syscall") > > > __description("probe read 8 bytes") > > > __success __retval(0x200) > > > long probe_read_8(void) > > > { > > > int data = 0x200; > > > long data_dst = 0; > > > int err; > > > > > > err = bpf_probe_read_kernel(&data_dst, 8, &data); > > > if (err) > > > return err; > > > > > > return data_dst; > > > > > > } > > > > > > > more details in the following link: > > > > https://github.com/kernel-patches/bpf/pull/6882. It appears that > > > > bpf_probe_read_kernel() encounters difficulties when dealing with > > > > non-long-aligned source addresses. > > > > > > > > > > > > > > So getting back to iter implementation. Make sure you are > > > > > zero-initializing that u64 value you are reading into? > > > > > > > > > > > > > It has been zero-initialized: > > > > > > > > + kit->nr_bits = 0; > > > > + kit->bits_copy = 0; > > > > > > > > > > ok, then the problem is somewhere else, but it doesn't seem to be in > > > bpf_probe_read_kernel(). I'm forgetting what was the original test > > > failure for your patch set, but please double check again, taking into > > > account the big endianness of s390x. > > > > > > > If we aim to make it compatible with s390, we need to introduce some > > constraints regarding the bits iteration. > > > > 1. We must replace nr_bits with size: > > > > bpf_iter_bits_new(struct bpf_iter_bits *it, const void > > *unsafe_ptr__ign, u32 size) > > > > 2. The size must adhere to alignment requirements: > > > > if (size <= sizeof(u64)) { > > int offset = IS_ENABLED(CONFIG_S390) ? sizeof(u64) - size : 0; > > > > switch (size) { > > case 1: > > case 2: > > case 4: > > case 8: > > break; > > default: > > return -EINVAL; > > } > > > > err = bpf_probe_read_kernel_common(((char > > *)&kit->bits_copy) + offset, size, unsafe_ptr__ign); > > if (err) > > return -EFAULT; > > > > kit->size = size; > > kit->bit = -1; > > return 0; > > } > > > > /* Not long-aligned */ > > if (size & (sizeof(unsigned long) - 1)) > > return -EINVAL; > > > > .... > > > > Does this meet your expectations? > > > > I don't think you need to add any restrictions or change anything to > be byte-sized. You just need to calculate byte offset and do a bit > shift to place N bits from the mask into upper N bits of long on > big-endian. You might need to do some masking even for little endian > as well. > > Which is why I suggested investing in simple but thorough tests. Write > down a few bit mask patterns of variable length (not just > multiple-of-8 sizes) and think about the sequence of bits that should > be returned. And then codify that. > > Then check that both little- and big-endian arches work correctly. > > This is all a bit tricky because kernel's find_next_bit() makes the > assumption that bit masks are long-based, while this bits iterator > protocol is based on bit sizes, which are not necessarily multiples of > 8 bits. So after you copy memory as bytes, you might need to clear > (and shift, for big endian) some bits. Use simple but non-symmetrical > bit masks to test everything. > The reason for using 'size' instead of 'nr_bits' lies in the nature of the pointer 'unsafe_ptr__ign', which is of type void *. Due to this ambiguity in the type, the 'size' parameter serves to indicate the size of the data being accessed. For instance, if the type is u32, then the 'size' parameter corresponds to sizeof(u32) u32 src = 0x100; bpf_for_each(bits, bit, &src, sizeof(u32)); This approach shields the user of the bits iterator from concerns about endianness, simplifying usage. Conversely, if we were to use 'nr_bits', the user would need to account for endianness themselves. In other words, the user would be responsible for calculating the offset of 'src'.For instance, on a big-endian machine, if the 'src' is of type u64 and the user only want to iterate over 32 bits, the code would appear as follows: u64 src = 0x100; bpf_for_each(bits, bit, ((void *)&src) + (sizeof(u64) - (32 + 7) / 8), 32); It may indeed be less user-friendly. However, I can make the adjustment as you suggested, given that it aligns with the pattern observed in bpf_probe_read_kernel_common(). -- Regards Yafang ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator 2024-05-01 2:12 ` Yafang Shao @ 2024-05-01 4:14 ` Andrii Nakryiko 0 siblings, 0 replies; 15+ messages in thread From: Andrii Nakryiko @ 2024-05-01 4:14 UTC (permalink / raw) To: Yafang Shao Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song, yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf On Tue, Apr 30, 2024 at 7:12 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > On Tue, Apr 30, 2024 at 12:28 AM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > On Sun, Apr 28, 2024 at 6:47 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > On Sat, Apr 27, 2024 at 12:51 AM Andrii Nakryiko > > > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > On Thu, Apr 25, 2024 at 10:05 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > On Fri, Apr 26, 2024 at 2:15 AM Andrii Nakryiko > > > > > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > > > > > On Wed, Apr 24, 2024 at 10:37 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > > > > > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko > > > > > > > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > > > > > > > > > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > > > > > > > > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been > > > > > > > > > > added for the new bpf_iter_bits functionality. These kfuncs enable the > > > > > > > > > > iteration of the bits from a given address and a given number of bits. > > > > > > > > > > > > > > > > > > > > - bpf_iter_bits_new > > > > > > > > > > Initialize a new bits iterator for a given memory area. Due to the > > > > > > > > > > limitation of bpf memalloc, the max number of bits to be iterated > > > > > > > > > > over is (4096 * 8). > > > > > > > > > > - bpf_iter_bits_next > > > > > > > > > > Get the next bit in a bpf_iter_bits > > > > > > > > > > - bpf_iter_bits_destroy > > > > > > > > > > Destroy a bpf_iter_bits > > > > > > > > > > > > > > > > > > > > The bits iterator can be used in any context and on any address. > > > > > > > > > > > > > > > > > > > > Changes: > > > > > > > > > > - v5->v6: > > > > > > > > > > - Add positive tests (Andrii) > > > > > > > > > > - v4->v5: > > > > > > > > > > - Simplify test cases (Andrii) > > > > > > > > > > - v3->v4: > > > > > > > > > > - Fix endianness error on s390x (Andrii) > > > > > > > > > > - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) > > > > > > > > > > - v2->v3: > > > > > > > > > > - Optimization for u64/u32 mask (Andrii) > > > > > > > > > > - v1->v2: > > > > > > > > > > - Simplify the CPU number verification code to avoid the failure on s390x > > > > > > > > > > (Eduard) > > > > > > > > > > - bpf: Add bpf_iter_cpumask > > > > > > > > > > https://lwn.net/Articles/961104/ > > > > > > > > > > - bpf: Add new bpf helper bpf_for_each_cpu > > > > > > > > > > https://lwn.net/Articles/939939/ > > > > > > > > > > > > > > > > > > > > Yafang Shao (2): > > > > > > > > > > bpf: Add bits iterator > > > > > > > > > > selftests/bpf: Add selftest for bits iter > > > > > > > > > > > > > > > > > > > > kernel/bpf/helpers.c | 120 +++++++++++++++++ > > > > > > > > > > .../selftests/bpf/prog_tests/verifier.c | 2 + > > > > > > > > > > .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ > > > > > > > > > > 3 files changed, 249 insertions(+) > > > > > > > > > > create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > 2.39.1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > It appears that the test case failed on s390x when the data is > > > > > > > > > a u32 value because we need to set the higher 32 bits. > > > > > > > > > will analyze it. > > > > > > > > > > > > > > > > > > > > > > > > > Hey Yafang, did you get a chance to debug and fix the issue? > > > > > > > > > > > > > > > > > > > > > > Hi Andrii, > > > > > > > > > > > > > > Apologies for the delay; I recently returned from an extended holiday. > > > > > > > > > > > > > > The issue stems from the limitations of bpf_probe_read_kernel() on > > > > > > > s390 architecture. The attachment provides a straightforward example > > > > > > > to illustrate this issue. The observed results are as follows: > > > > > > > > > > > > > > Error: #463/1 verifier_probe_read/probe read 4 bytes > > > > > > > 8897 run_subtest:PASS:obj_open_mem 0 nsec > > > > > > > 8898 run_subtest:PASS:unexpected_load_failure 0 nsec > > > > > > > 8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > > > > > > 8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512 > > > > > > > > > > > > > > Error: #463/2 verifier_probe_read/probe read 8 bytes > > > > > > > 8903 run_subtest:PASS:obj_open_mem 0 nsec > > > > > > > 8904 run_subtest:PASS:unexpected_load_failure 0 nsec > > > > > > > 8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > > > > > > 8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512 > > > > > > > > > > > > > > More details can be found at: https://github.com/kernel-patches/bpf/pull/6872 > > > > > > > > > > > > > > Should we consider this behavior of bpf_probe_read_kernel() as > > > > > > > expected, or is it something that requires fixing? > > > > > > > > > > > > > > > > > > > I might be missing something, but there is nothing wrong with > > > > > > bpf_probe_read_kernel() behavior. In "read 4" case you are overwriting > > > > > > only upper 4 bytes of u64, so lower 4 bytes are garbage. In "read 8" > > > > > > you are reading (upper) 4 bytes of garbage from uninitialized > > > > > > data_dst. > > > > > > > > > > The issue doesn't lie with the dst but rather with the src. Even after > > > > > initializing the destination, the operation still fails. You can find > > > > > > > > Are you sure the operation "fails"? If it would fail, you'd get a > > > > negative error code, but you are getting zero. Which actually makes > > > > sense. > > > > > > > > I think you are just getting confused by big endianness of s390x, and > > > > there is nothing wrong with bpf_probe_read_kernel(). > > > > > > > > In both of your tests (I pasted your code below, it would be better if > > > > you did it in your initial emails) you end up with 0x200 in *upper* 32 > > > > bits (on big endian) and lower bits are zeros. And __retval thing is > > > > 32-bit (despite BPF program returning long), so this return value is > > > > truncated to *lower* 32-bits, which are, expectedly, zeroes. > > > > > > Thank you for clarifying. The presence of the 32-bit __retval led to > > > my misunderstanding :( > > > > > > > > > > > So I think everything works as expected, but your tests (at least) > > > > don't handle the big-endian arch well. > > > > > > The issue arises when the dst and src have different sizes, causing > > > bpf_probe_read_kernel_common() to handle them poorly on big-endian > > > machines. To address this, we need to calculate the offset for > > > copying, as demonstrated by the following > > > > > > bpf_probe_read_kernel_common(&kit->bits_copy + offset, size, > > > unsafe_ptr__ign); > > > > > > One might wonder why this calculation is not incorporated directly > > > into the implementation of bpf_probe_read_kernel_common() ? > > > > Let's stop talking about > > bpf_probe_read_kernel/bpf_probe_read_kernel_common doing anything > > wrong or not handling anything wrong. It's not, it's correct. Your > > code is *using* it wrong, and that's what we'll have to fix. The > > contract of bpf_probe_read_kernel is in terms of memory addresses of > > source/destination *bytes* and the *common* size of both source and > > destination. bpf_probe_read_kernel() cannot know that you are passing > > a pointer to int or long or whatever, it's just a void * pointer. So > > if you are writing bytes over long, you need to take care of adjusting > > offsets to accommodate big-endian. > > > > It's a distraction to talk about bpf_probe_read_kernel() here (but > > it's useful to understand its contract). > > > > > > > > > > > > > __description("probe read 4 bytes") > > > > __success __retval(0x200) > > > > long probe_read_4(void) > > > > { > > > > int data = 0x200; > > > > long data_dst = 0; > > > > int err; > > > > > > > > err = bpf_probe_read_kernel(&data_dst, 4, &data); > > > > if (err) > > > > return err; > > > > > > > > return data_dst; > > > > } > > > > > > > > SEC("syscall") > > > > __description("probe read 8 bytes") > > > > __success __retval(0x200) > > > > long probe_read_8(void) > > > > { > > > > int data = 0x200; > > > > long data_dst = 0; > > > > int err; > > > > > > > > err = bpf_probe_read_kernel(&data_dst, 8, &data); > > > > if (err) > > > > return err; > > > > > > > > return data_dst; > > > > > > > > } > > > > > > > > > more details in the following link: > > > > > https://github.com/kernel-patches/bpf/pull/6882. It appears that > > > > > bpf_probe_read_kernel() encounters difficulties when dealing with > > > > > non-long-aligned source addresses. > > > > > > > > > > > > > > > > > So getting back to iter implementation. Make sure you are > > > > > > zero-initializing that u64 value you are reading into? > > > > > > > > > > > > > > > > It has been zero-initialized: > > > > > > > > > > + kit->nr_bits = 0; > > > > > + kit->bits_copy = 0; > > > > > > > > > > > > > ok, then the problem is somewhere else, but it doesn't seem to be in > > > > bpf_probe_read_kernel(). I'm forgetting what was the original test > > > > failure for your patch set, but please double check again, taking into > > > > account the big endianness of s390x. > > > > > > > > > > If we aim to make it compatible with s390, we need to introduce some > > > constraints regarding the bits iteration. > > > > > > 1. We must replace nr_bits with size: > > > > > > bpf_iter_bits_new(struct bpf_iter_bits *it, const void > > > *unsafe_ptr__ign, u32 size) > > > > > > 2. The size must adhere to alignment requirements: > > > > > > if (size <= sizeof(u64)) { > > > int offset = IS_ENABLED(CONFIG_S390) ? sizeof(u64) - size : 0; > > > > > > switch (size) { > > > case 1: > > > case 2: > > > case 4: > > > case 8: > > > break; > > > default: > > > return -EINVAL; > > > } > > > > > > err = bpf_probe_read_kernel_common(((char > > > *)&kit->bits_copy) + offset, size, unsafe_ptr__ign); > > > if (err) > > > return -EFAULT; > > > > > > kit->size = size; > > > kit->bit = -1; > > > return 0; > > > } > > > > > > /* Not long-aligned */ > > > if (size & (sizeof(unsigned long) - 1)) > > > return -EINVAL; > > > > > > .... > > > > > > Does this meet your expectations? > > > > > > > I don't think you need to add any restrictions or change anything to > > be byte-sized. You just need to calculate byte offset and do a bit > > shift to place N bits from the mask into upper N bits of long on > > big-endian. You might need to do some masking even for little endian > > as well. > > > > Which is why I suggested investing in simple but thorough tests. Write > > down a few bit mask patterns of variable length (not just > > multiple-of-8 sizes) and think about the sequence of bits that should > > be returned. And then codify that. > > > > Then check that both little- and big-endian arches work correctly. > > > > This is all a bit tricky because kernel's find_next_bit() makes the > > assumption that bit masks are long-based, while this bits iterator > > protocol is based on bit sizes, which are not necessarily multiples of > > 8 bits. So after you copy memory as bytes, you might need to clear > > (and shift, for big endian) some bits. Use simple but non-symmetrical > > bit masks to test everything. > > > > The reason for using 'size' instead of 'nr_bits' lies in the nature of > the pointer 'unsafe_ptr__ign', which is of type void *. Due to this > ambiguity in the type, the 'size' parameter serves to indicate the > size of the data being accessed. For instance, if the type is u32, > then the 'size' parameter corresponds to sizeof(u32) > > u32 src = 0x100; > bpf_for_each(bits, bit, &src, sizeof(u32)); > > This approach shields the user of the bits iterator from concerns > about endianness, simplifying usage. > > Conversely, if we were to use 'nr_bits', the user would need to > account for endianness themselves. In other words, the user would be Well, not exactly. Only if they try to cleverly use int/long for bitmask instead of sticking to an array of bytes. All this endianness comes into effect only when you are dealing with something else than pure bytes. And if they do, I think it's ok to expect them to deal with endianness correctly. (But also for little endian it will just work even if they don't think much about it). So I'd keep it as nr_bits and define an interface based on `char *` or `void *` to make it clear we are dealing with bits in arbitrary-sized byte arrays. But if someone has strong opinions otherwise, I'd be happy to hear some other arguments, of course. I just think it's confusing to be dealing with *bit*masks but in terms of byte sizes, tbh. And also, consider that in practice most users will be dealing with sizeof(u32) or sizeof(u64) (or larger, but multiple-of-4-or-8-bytes) bit masks. But the implementation has to be correct for all valid inputs and use cases. > responsible for calculating the offset of 'src'.For instance, on a > big-endian machine, if the 'src' is of type u64 and the user only want > to iterate over 32 bits, the code would appear as follows: > > u64 src = 0x100; > bpf_for_each(bits, bit, ((void *)&src) + (sizeof(u64) - (32 + 7) / 8), 32); > > It may indeed be less user-friendly. However, I can make the > adjustment as you suggested, given that it aligns with the pattern > observed in bpf_probe_read_kernel_common(). > > -- > Regards > Yafang ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2024-05-01 4:14 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-04-11 13:11 [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator Yafang Shao 2024-04-11 13:11 ` [PATCH bpf-next v6 1/2] bpf: Add " Yafang Shao 2024-04-11 13:11 ` [PATCH bpf-next v6 2/2] selftests/bpf: Add selftest for bits iter Yafang Shao 2024-04-11 13:50 ` [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator Yafang Shao 2024-04-25 0:34 ` Andrii Nakryiko 2024-04-25 5:36 ` Yafang Shao 2024-04-25 6:05 ` Yonghong Song 2024-04-25 8:03 ` Yafang Shao 2024-04-25 18:15 ` Andrii Nakryiko 2024-04-26 5:04 ` Yafang Shao 2024-04-26 16:50 ` Andrii Nakryiko 2024-04-28 13:47 ` Yafang Shao 2024-04-29 16:27 ` Andrii Nakryiko 2024-05-01 2:12 ` Yafang Shao 2024-05-01 4:14 ` Andrii Nakryiko
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox