[PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator

BPF List
 help / color / mirror / Atom feed

* [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator
@ 2024-04-11 13:11 Yafang Shao
  2024-04-11 13:11 ` [PATCH bpf-next v6 1/2] bpf: Add " Yafang Shao
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Yafang Shao @ 2024-04-11 13:11 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa
  Cc: bpf, Yafang Shao

Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been
added for the new bpf_iter_bits functionality. These kfuncs enable the
iteration of the bits from a given address and a given number of bits.

- bpf_iter_bits_new
  Initialize a new bits iterator for a given memory area. Due to the
  limitation of bpf memalloc, the max number of bits to be iterated
  over is (4096 * 8).
- bpf_iter_bits_next
  Get the next bit in a bpf_iter_bits
- bpf_iter_bits_destroy
  Destroy a bpf_iter_bits

The bits iterator can be used in any context and on any address.

Changes:
- v5->v6:
  - Add positive tests (Andrii)
- v4->v5:
  - Simplify test cases (Andrii)
- v3->v4:
  - Fix endianness error on s390x (Andrii)
  - zero-initialize kit->bits_copy and zero out nr_bits (Andrii)
- v2->v3:
  - Optimization for u64/u32 mask (Andrii)
- v1->v2:
  - Simplify the CPU number verification code to avoid the failure on s390x
    (Eduard)
- bpf: Add bpf_iter_cpumask
  https://lwn.net/Articles/961104/
- bpf: Add new bpf helper bpf_for_each_cpu
  https://lwn.net/Articles/939939/

Yafang Shao (2):
  bpf: Add bits iterator
  selftests/bpf: Add selftest for bits iter

 kernel/bpf/helpers.c                          | 120 +++++++++++++++++
 .../selftests/bpf/prog_tests/verifier.c       |   2 +
 .../selftests/bpf/progs/verifier_bits_iter.c  | 127 ++++++++++++++++++
 3 files changed, 249 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c

-- 
2.39.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH bpf-next v6 1/2] bpf: Add bits iterator
  2024-04-11 13:11 [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator Yafang Shao
@ 2024-04-11 13:11 ` Yafang Shao
  2024-04-11 13:11 ` [PATCH bpf-next v6 2/2] selftests/bpf: Add selftest for bits iter Yafang Shao
  2024-04-11 13:50 ` [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator Yafang Shao
  2 siblings, 0 replies; 15+ messages in thread
From: Yafang Shao @ 2024-04-11 13:11 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa
  Cc: bpf, Yafang Shao

Add three new kfuncs for the bits iterator:
- bpf_iter_bits_new
  Initialize a new bits iterator for a given memory area. Due to the
  limitation of bpf memalloc, the max number of bits that can be iterated
  over is limited to (4096 * 8).
- bpf_iter_bits_next
  Get the next bit in a bpf_iter_bits
- bpf_iter_bits_destroy
  Destroy a bpf_iter_bits

The bits iterator facilitates the iteration of the bits of a memory area,
such as cpumask. It can be used in any context and on any address.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/helpers.c | 120 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 120 insertions(+)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 8cde717137bd..421e42663736 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2549,6 +2549,123 @@ __bpf_kfunc void bpf_throw(u64 cookie)
 	WARN(1, "A call to BPF exception callback should never return\n");
 }
 
+struct bpf_iter_bits {
+	__u64 __opaque[2];
+} __aligned(8);
+
+struct bpf_iter_bits_kern {
+	union {
+		unsigned long *bits;
+		unsigned long bits_copy;
+	};
+	u32 nr_bits;
+	int bit;
+} __aligned(8);
+
+/**
+ * bpf_iter_bits_new() - Initialize a new bits iterator for a given memory area
+ * @it: The new bpf_iter_bits to be created
+ * @unsafe_ptr__ign: A ponter pointing to a memory area to be iterated over
+ * @nr_bits: The number of bits to be iterated over. Due to the limitation of
+ * memalloc, it can't greater than (4096 * 8).
+ *
+ * This function initializes a new bpf_iter_bits structure for iterating over
+ * a memory area which is specified by the @unsafe_ptr__ign and @nr_bits. It
+ * copy the data of the memory area to the newly created bpf_iter_bits @it for
+ * subsequent iteration operations.
+ *
+ * On success, 0 is returned. On failure, ERR is returned.
+ */
+__bpf_kfunc int
+bpf_iter_bits_new(struct bpf_iter_bits *it, const void *unsafe_ptr__ign, u32 nr_bits)
+{
+	u32 size = BITS_TO_LONGS(nr_bits) * sizeof(unsigned long);
+	struct bpf_iter_bits_kern *kit = (void *)it;
+	int err;
+
+	BUILD_BUG_ON(sizeof(struct bpf_iter_bits_kern) != sizeof(struct bpf_iter_bits));
+	BUILD_BUG_ON(__alignof__(struct bpf_iter_bits_kern) !=
+		     __alignof__(struct bpf_iter_bits));
+
+	if (!unsafe_ptr__ign || !nr_bits) {
+		kit->bits = NULL;
+		return -EINVAL;
+	}
+
+	kit->nr_bits = 0;
+	kit->bits_copy = 0;
+	/* Optimization for u64/u32 mask */
+	if (nr_bits <= 64) {
+		err = bpf_probe_read_kernel_common(&kit->bits_copy, size, unsafe_ptr__ign);
+		if (err)
+			return -EFAULT;
+
+		kit->nr_bits = nr_bits;
+		kit->bit = -1;
+		return 0;
+	}
+
+	/* Fallback to memalloc */
+	kit->bits = bpf_mem_alloc(&bpf_global_ma, size);
+	if (!kit->bits)
+		return -ENOMEM;
+
+	err = bpf_probe_read_kernel_common(kit->bits, size, unsafe_ptr__ign);
+	if (err) {
+		bpf_mem_free(&bpf_global_ma, kit->bits);
+		return err;
+	}
+
+	kit->nr_bits = nr_bits;
+	kit->bit = -1;
+	return 0;
+}
+
+/**
+ * bpf_iter_bits_next() - Get the next bit in a bpf_iter_bits
+ * @it: The bpf_iter_bits to be checked
+ *
+ * This function returns a pointer to a number representing the value of the
+ * next bit in the bits.
+ *
+ * If there are no further bit available, it returns NULL.
+ */
+__bpf_kfunc int *bpf_iter_bits_next(struct bpf_iter_bits *it)
+{
+	struct bpf_iter_bits_kern *kit = (void *)it;
+	u32 nr_bits = kit->nr_bits;
+	const unsigned long *bits;
+	int bit;
+
+	if (nr_bits == 0)
+		return NULL;
+
+	bits = nr_bits <= 64 ? &kit->bits_copy : kit->bits;
+	bit = find_next_bit(bits, nr_bits, kit->bit + 1);
+	if (bit >= nr_bits) {
+		kit->nr_bits = 0;
+		return NULL;
+	}
+
+	kit->bit = bit;
+	return &kit->bit;
+}
+
+/**
+ * bpf_iter_bits_destroy() - Destroy a bpf_iter_bits
+ * @it: The bpf_iter_bits to be destroyed
+ *
+ * Destroy the resource associated with the bpf_iter_bits.
+ */
+__bpf_kfunc void bpf_iter_bits_destroy(struct bpf_iter_bits *it)
+{
+	struct bpf_iter_bits_kern *kit = (void *)it;
+
+	if (kit->nr_bits <= 64)
+		return;
+	bpf_mem_free(&bpf_global_ma, kit->bits);
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(generic_btf_ids)
@@ -2626,6 +2743,9 @@ BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
 BTF_ID_FLAGS(func, bpf_dynptr_size)
 BTF_ID_FLAGS(func, bpf_dynptr_clone)
 BTF_ID_FLAGS(func, bpf_modify_return_test_tp)
+BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
+BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
 BTF_KFUNCS_END(common_btf_ids)
 
 static const struct btf_kfunc_id_set common_kfunc_set = {
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH bpf-next v6 2/2] selftests/bpf: Add selftest for bits iter
  2024-04-11 13:11 [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator Yafang Shao
  2024-04-11 13:11 ` [PATCH bpf-next v6 1/2] bpf: Add " Yafang Shao
@ 2024-04-11 13:11 ` Yafang Shao
  2024-04-11 13:50 ` [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator Yafang Shao
  2 siblings, 0 replies; 15+ messages in thread
From: Yafang Shao @ 2024-04-11 13:11 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa
  Cc: bpf, Yafang Shao

Add test cases for the bits iter:
- positive case
  - bit mask smaller than 8 bytes
  - a typical case of having 8-byte bit mask
  - another typical case where bit mask is > 8 bytes
  - the index of set bit

- nagative cases
  - bpf_iter_bits_destroy() is required after calling
    bpf_iter_bits_new()
  - bpf_iter_bits_destroy() can only destroy an initialized iter
  - bpf_iter_bits_next() must use an initialized iter

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 .../selftests/bpf/prog_tests/verifier.c       |   2 +
 .../selftests/bpf/progs/verifier_bits_iter.c  | 127 ++++++++++++++++++
 2 files changed, 129 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c

diff --git a/tools/testing/selftests/bpf/prog_tests/verifier.c b/tools/testing/selftests/bpf/prog_tests/verifier.c
index c4f9f306646e..7e04ecaaa20a 100644
--- a/tools/testing/selftests/bpf/prog_tests/verifier.c
+++ b/tools/testing/selftests/bpf/prog_tests/verifier.c
@@ -84,6 +84,7 @@
 #include "verifier_xadd.skel.h"
 #include "verifier_xdp.skel.h"
 #include "verifier_xdp_direct_packet_access.skel.h"
+#include "verifier_bits_iter.skel.h"
 
 #define MAX_ENTRIES 11
 
@@ -198,6 +199,7 @@ void test_verifier_var_off(void)              { RUN(verifier_var_off); }
 void test_verifier_xadd(void)                 { RUN(verifier_xadd); }
 void test_verifier_xdp(void)                  { RUN(verifier_xdp); }
 void test_verifier_xdp_direct_packet_access(void) { RUN(verifier_xdp_direct_packet_access); }
+void test_verifier_bits_iter(void) { RUN(verifier_bits_iter); }
 
 static int init_test_val_map(struct bpf_object *obj, char *map_name)
 {
diff --git a/tools/testing/selftests/bpf/progs/verifier_bits_iter.c b/tools/testing/selftests/bpf/progs/verifier_bits_iter.c
new file mode 100644
index 000000000000..2a02540cfd26
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/verifier_bits_iter.c
@@ -0,0 +1,127 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2024 Yafang Shao <laoar.shao@gmail.com> */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+#include "bpf_misc.h"
+#include "task_kfunc_common.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct data_t {
+	u64 a;
+	u32 b;
+};
+
+int bpf_iter_bits_new(struct bpf_iter_bits *it, const void *unsafe_ptr__ign,
+		      u32 nr_bits) __ksym __weak;
+int *bpf_iter_bits_next(struct bpf_iter_bits *it) __ksym __weak;
+void bpf_iter_bits_destroy(struct bpf_iter_bits *it) __ksym __weak;
+
+SEC("iter.s/cgroup")
+__description("bits iter without destroy")
+__failure __msg("Unreleased reference")
+int BPF_PROG(no_destroy, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+{
+	struct bpf_iter_bits it;
+	struct task_struct *p;
+
+	p = bpf_task_from_pid(1);
+	if (!p)
+		return 1;
+
+	bpf_iter_bits_new(&it, p->cpus_ptr, 8192);
+
+	bpf_iter_bits_next(&it);
+	bpf_task_release(p);
+	return 0;
+}
+
+SEC("iter/cgroup")
+__description("bits iter with uninitialized iter in ->next()")
+__failure __msg("expected an initialized iter_bits as arg #1")
+int BPF_PROG(next_uninit, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+{
+	struct bpf_iter_bits *it = NULL;
+
+	bpf_iter_bits_next(it);
+	return 0;
+}
+
+SEC("iter/cgroup")
+__description("bits iter with uninitialized iter in ->destroy()")
+__failure __msg("expected an initialized iter_bits as arg #1")
+int BPF_PROG(destroy_uninit, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+{
+	struct bpf_iter_bits it = {};
+
+	bpf_iter_bits_destroy(&it);
+	return 0;
+}
+
+SEC("syscall")
+__description("bits copy 32")
+__success __retval(10)
+int bits_copy32(void)
+{
+	/* 21 bits:             --------------------- */
+	u32 data = 0b11111101111101111100001000100101U;
+	int nr = 0;
+	int *bit;
+
+	bpf_for_each(bits, bit, &data, 21)
+		nr++;
+	return nr;
+}
+
+SEC("syscall")
+__description("bits copy 64")
+__success __retval(18)
+int bits_copy64(void)
+{
+	/* 34 bits:         ~-------- */
+	u64 data = 0xffffefdf0f0f0f0fUL;
+	int nr = 0;
+	int *bit;
+
+	bpf_for_each(bits, bit, &data, 34)
+		nr++;
+	return nr;
+}
+
+SEC("syscall")
+__description("bits memalloc")
+__success __retval(56)
+int bits_memalloc(void)
+{
+	struct data_t data = {
+		.a = 0xaaaaaaaaaaaaaaaaUL, /* 32 bits are set */
+		.b = 0xbbbbbbbbU, /* 24 bits are set */
+	};
+	int nr = 0;
+	int *bit;
+
+	bpf_for_each(bits, bit, &data, 96)
+		nr++;
+	return nr;
+}
+
+SEC("syscall")
+__description("bit index")
+__success __retval(8)
+int bit_index(void)
+{
+	u64 data = 0x100;
+	int bit_idx = 0;
+	int *bit;
+
+	bpf_for_each(bits, bit, &data, 64) {
+		if (*bit == 0)
+			continue;
+		bit_idx = *bit;
+	}
+	return bit_idx;
+}
+
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator
  2024-04-11 13:11 [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator Yafang Shao
  2024-04-11 13:11 ` [PATCH bpf-next v6 1/2] bpf: Add " Yafang Shao
  2024-04-11 13:11 ` [PATCH bpf-next v6 2/2] selftests/bpf: Add selftest for bits iter Yafang Shao
@ 2024-04-11 13:50 ` Yafang Shao
  2024-04-25  0:34   ` Andrii Nakryiko
  2 siblings, 1 reply; 15+ messages in thread
From: Yafang Shao @ 2024-04-11 13:50 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa
  Cc: bpf

On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been
> added for the new bpf_iter_bits functionality. These kfuncs enable the
> iteration of the bits from a given address and a given number of bits.
>
> - bpf_iter_bits_new
>   Initialize a new bits iterator for a given memory area. Due to the
>   limitation of bpf memalloc, the max number of bits to be iterated
>   over is (4096 * 8).
> - bpf_iter_bits_next
>   Get the next bit in a bpf_iter_bits
> - bpf_iter_bits_destroy
>   Destroy a bpf_iter_bits
>
> The bits iterator can be used in any context and on any address.
>
> Changes:
> - v5->v6:
>   - Add positive tests (Andrii)
> - v4->v5:
>   - Simplify test cases (Andrii)
> - v3->v4:
>   - Fix endianness error on s390x (Andrii)
>   - zero-initialize kit->bits_copy and zero out nr_bits (Andrii)
> - v2->v3:
>   - Optimization for u64/u32 mask (Andrii)
> - v1->v2:
>   - Simplify the CPU number verification code to avoid the failure on s390x
>     (Eduard)
> - bpf: Add bpf_iter_cpumask
>   https://lwn.net/Articles/961104/
> - bpf: Add new bpf helper bpf_for_each_cpu
>   https://lwn.net/Articles/939939/
>
> Yafang Shao (2):
>   bpf: Add bits iterator
>   selftests/bpf: Add selftest for bits iter
>
>  kernel/bpf/helpers.c                          | 120 +++++++++++++++++
>  .../selftests/bpf/prog_tests/verifier.c       |   2 +
>  .../selftests/bpf/progs/verifier_bits_iter.c  | 127 ++++++++++++++++++
>  3 files changed, 249 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c
>
> --
> 2.39.1
>

It appears that the test case failed on s390x when the data is
a u32 value because we need to set the higher 32 bits.
will analyze it.


--
Regards
Yafang

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator
  2024-04-11 13:50 ` [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator Yafang Shao
@ 2024-04-25  0:34   ` Andrii Nakryiko
  2024-04-25  5:36     ` Yafang Shao
  0 siblings, 1 reply; 15+ messages in thread
From: Andrii Nakryiko @ 2024-04-25  0:34 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf

On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been
> > added for the new bpf_iter_bits functionality. These kfuncs enable the
> > iteration of the bits from a given address and a given number of bits.
> >
> > - bpf_iter_bits_new
> >   Initialize a new bits iterator for a given memory area. Due to the
> >   limitation of bpf memalloc, the max number of bits to be iterated
> >   over is (4096 * 8).
> > - bpf_iter_bits_next
> >   Get the next bit in a bpf_iter_bits
> > - bpf_iter_bits_destroy
> >   Destroy a bpf_iter_bits
> >
> > The bits iterator can be used in any context and on any address.
> >
> > Changes:
> > - v5->v6:
> >   - Add positive tests (Andrii)
> > - v4->v5:
> >   - Simplify test cases (Andrii)
> > - v3->v4:
> >   - Fix endianness error on s390x (Andrii)
> >   - zero-initialize kit->bits_copy and zero out nr_bits (Andrii)
> > - v2->v3:
> >   - Optimization for u64/u32 mask (Andrii)
> > - v1->v2:
> >   - Simplify the CPU number verification code to avoid the failure on s390x
> >     (Eduard)
> > - bpf: Add bpf_iter_cpumask
> >   https://lwn.net/Articles/961104/
> > - bpf: Add new bpf helper bpf_for_each_cpu
> >   https://lwn.net/Articles/939939/
> >
> > Yafang Shao (2):
> >   bpf: Add bits iterator
> >   selftests/bpf: Add selftest for bits iter
> >
> >  kernel/bpf/helpers.c                          | 120 +++++++++++++++++
> >  .../selftests/bpf/prog_tests/verifier.c       |   2 +
> >  .../selftests/bpf/progs/verifier_bits_iter.c  | 127 ++++++++++++++++++
> >  3 files changed, 249 insertions(+)
> >  create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c
> >
> > --
> > 2.39.1
> >
>
> It appears that the test case failed on s390x when the data is
> a u32 value because we need to set the higher 32 bits.
> will analyze it.
>

Hey Yafang, did you get a chance to debug and fix the issue?

>
> --
> Regards
> Yafang

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator
  2024-04-25  0:34   ` Andrii Nakryiko
@ 2024-04-25  5:36     ` Yafang Shao
  2024-04-25  6:05       ` Yonghong Song
  2024-04-25 18:15       ` Andrii Nakryiko
  0 siblings, 2 replies; 15+ messages in thread
From: Yafang Shao @ 2024-04-25  5:36 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf

[-- Attachment #1: Type: text/plain, Size: 3372 bytes --]

On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been
> > > added for the new bpf_iter_bits functionality. These kfuncs enable the
> > > iteration of the bits from a given address and a given number of bits.
> > >
> > > - bpf_iter_bits_new
> > >   Initialize a new bits iterator for a given memory area. Due to the
> > >   limitation of bpf memalloc, the max number of bits to be iterated
> > >   over is (4096 * 8).
> > > - bpf_iter_bits_next
> > >   Get the next bit in a bpf_iter_bits
> > > - bpf_iter_bits_destroy
> > >   Destroy a bpf_iter_bits
> > >
> > > The bits iterator can be used in any context and on any address.
> > >
> > > Changes:
> > > - v5->v6:
> > >   - Add positive tests (Andrii)
> > > - v4->v5:
> > >   - Simplify test cases (Andrii)
> > > - v3->v4:
> > >   - Fix endianness error on s390x (Andrii)
> > >   - zero-initialize kit->bits_copy and zero out nr_bits (Andrii)
> > > - v2->v3:
> > >   - Optimization for u64/u32 mask (Andrii)
> > > - v1->v2:
> > >   - Simplify the CPU number verification code to avoid the failure on s390x
> > >     (Eduard)
> > > - bpf: Add bpf_iter_cpumask
> > >   https://lwn.net/Articles/961104/
> > > - bpf: Add new bpf helper bpf_for_each_cpu
> > >   https://lwn.net/Articles/939939/
> > >
> > > Yafang Shao (2):
> > >   bpf: Add bits iterator
> > >   selftests/bpf: Add selftest for bits iter
> > >
> > >  kernel/bpf/helpers.c                          | 120 +++++++++++++++++
> > >  .../selftests/bpf/prog_tests/verifier.c       |   2 +
> > >  .../selftests/bpf/progs/verifier_bits_iter.c  | 127 ++++++++++++++++++
> > >  3 files changed, 249 insertions(+)
> > >  create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c
> > >
> > > --
> > > 2.39.1
> > >
> >
> > It appears that the test case failed on s390x when the data is
> > a u32 value because we need to set the higher 32 bits.
> > will analyze it.
> >
>
> Hey Yafang, did you get a chance to debug and fix the issue?
>

Hi Andrii,

Apologies for the delay; I recently returned from an extended holiday.

The issue stems from the limitations of bpf_probe_read_kernel() on
s390 architecture. The attachment provides a straightforward example
to illustrate this issue. The observed results are as follows:

    Error: #463/1 verifier_probe_read/probe read 4 bytes
    8897 run_subtest:PASS:obj_open_mem 0 nsec
    8898 run_subtest:PASS:unexpected_load_failure 0 nsec
    8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec
    8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512

    Error: #463/2 verifier_probe_read/probe read 8 bytes
    8903 run_subtest:PASS:obj_open_mem 0 nsec
    8904 run_subtest:PASS:unexpected_load_failure 0 nsec
    8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec
    8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512

More details can be found at:  https://github.com/kernel-patches/bpf/pull/6872

Should we consider this behavior of bpf_probe_read_kernel() as
expected, or is it something that requires fixing?

-- 
Regards
Yafang

[-- Attachment #2: 0001-selftests-bpf-Add-test-for-probe_read_kernel.patch --]
[-- Type: application/octet-stream, Size: 2438 bytes --]

From ba6bbbbc2648ac3741458a62f336694415d49d52 Mon Sep 17 00:00:00 2001
From: Yafang Shao <laoar.shao@gmail.com>
Date: Thu, 25 Apr 2024 13:07:47 +0800
Subject: [PATCH] selftests/bpf: Add test for probe_read_kernel

It will fail on s390 which is big-endian.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 .../selftests/bpf/prog_tests/verifier.c       |  2 +
 .../selftests/bpf/progs/verifier_probe_read.c | 37 +++++++++++++++++++
 2 files changed, 39 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_probe_read.c

diff --git a/tools/testing/selftests/bpf/prog_tests/verifier.c b/tools/testing/selftests/bpf/prog_tests/verifier.c
index c4f9f306646e..edac67d43447 100644
--- a/tools/testing/selftests/bpf/prog_tests/verifier.c
+++ b/tools/testing/selftests/bpf/prog_tests/verifier.c
@@ -84,6 +84,7 @@
 #include "verifier_xadd.skel.h"
 #include "verifier_xdp.skel.h"
 #include "verifier_xdp_direct_packet_access.skel.h"
+#include "verifier_probe_read.skel.h"
 
 #define MAX_ENTRIES 11
 
@@ -198,6 +199,7 @@ void test_verifier_var_off(void)              { RUN(verifier_var_off); }
 void test_verifier_xadd(void)                 { RUN(verifier_xadd); }
 void test_verifier_xdp(void)                  { RUN(verifier_xdp); }
 void test_verifier_xdp_direct_packet_access(void) { RUN(verifier_xdp_direct_packet_access); }
+void test_verifier_probe_read(void) { RUN(verifier_probe_read); }
 
 static int init_test_val_map(struct bpf_object *obj, char *map_name)
 {
diff --git a/tools/testing/selftests/bpf/progs/verifier_probe_read.c b/tools/testing/selftests/bpf/progs/verifier_probe_read.c
new file mode 100644
index 000000000000..d0a7c9109960
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/verifier_probe_read.c
@@ -0,0 +1,37 @@
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+SEC("syscall")
+__description("probe read 4 bytes")
+__success __retval(0x200)
+u64 probe_read_4(void)
+{
+	u32 data = 0x200;
+	u64 data_dst;
+	int err;
+
+	err = bpf_probe_read_kernel(&data_dst, 4, &data);
+	if (err)
+		return 0;
+	return data_dst;
+}
+
+SEC("syscall")
+__description("probe read 8 bytes")
+__success __retval(0x200)
+u64 probe_read_8(void)
+{
+	u32 data = 0x200;
+	u64 data_dst;
+	int err;
+
+	err = bpf_probe_read_kernel(&data_dst, 8, &data);
+	if (err)
+		return 0;
+	return data_dst;
+}
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator
  2024-04-25  5:36     ` Yafang Shao
@ 2024-04-25  6:05       ` Yonghong Song
  2024-04-25  8:03         ` Yafang Shao
  2024-04-25 18:15       ` Andrii Nakryiko
  1 sibling, 1 reply; 15+ messages in thread
From: Yonghong Song @ 2024-04-25  6:05 UTC (permalink / raw)
  To: Yafang Shao, Andrii Nakryiko
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	kpsingh, sdf, haoluo, jolsa, bpf


On 4/24/24 10:36 PM, Yafang Shao wrote:
> On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
>> On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>>> On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>>>> Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been
>>>> added for the new bpf_iter_bits functionality. These kfuncs enable the
>>>> iteration of the bits from a given address and a given number of bits.
>>>>
>>>> - bpf_iter_bits_new
>>>>    Initialize a new bits iterator for a given memory area. Due to the
>>>>    limitation of bpf memalloc, the max number of bits to be iterated
>>>>    over is (4096 * 8).
>>>> - bpf_iter_bits_next
>>>>    Get the next bit in a bpf_iter_bits
>>>> - bpf_iter_bits_destroy
>>>>    Destroy a bpf_iter_bits
>>>>
>>>> The bits iterator can be used in any context and on any address.
>>>>
>>>> Changes:
>>>> - v5->v6:
>>>>    - Add positive tests (Andrii)
>>>> - v4->v5:
>>>>    - Simplify test cases (Andrii)
>>>> - v3->v4:
>>>>    - Fix endianness error on s390x (Andrii)
>>>>    - zero-initialize kit->bits_copy and zero out nr_bits (Andrii)
>>>> - v2->v3:
>>>>    - Optimization for u64/u32 mask (Andrii)
>>>> - v1->v2:
>>>>    - Simplify the CPU number verification code to avoid the failure on s390x
>>>>      (Eduard)
>>>> - bpf: Add bpf_iter_cpumask
>>>>    https://lwn.net/Articles/961104/
>>>> - bpf: Add new bpf helper bpf_for_each_cpu
>>>>    https://lwn.net/Articles/939939/
>>>>
>>>> Yafang Shao (2):
>>>>    bpf: Add bits iterator
>>>>    selftests/bpf: Add selftest for bits iter
>>>>
>>>>   kernel/bpf/helpers.c                          | 120 +++++++++++++++++
>>>>   .../selftests/bpf/prog_tests/verifier.c       |   2 +
>>>>   .../selftests/bpf/progs/verifier_bits_iter.c  | 127 ++++++++++++++++++
>>>>   3 files changed, 249 insertions(+)
>>>>   create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c
>>>>
>>>> --
>>>> 2.39.1
>>>>
>>> It appears that the test case failed on s390x when the data is
>>> a u32 value because we need to set the higher 32 bits.
>>> will analyze it.
>>>
>> Hey Yafang, did you get a chance to debug and fix the issue?
>>
> Hi Andrii,
>
> Apologies for the delay; I recently returned from an extended holiday.
>
> The issue stems from the limitations of bpf_probe_read_kernel() on
> s390 architecture. The attachment provides a straightforward example
> to illustrate this issue. The observed results are as follows:
>
>      Error: #463/1 verifier_probe_read/probe read 4 bytes
>      8897 run_subtest:PASS:obj_open_mem 0 nsec
>      8898 run_subtest:PASS:unexpected_load_failure 0 nsec
>      8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec
>      8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512
>
>      Error: #463/2 verifier_probe_read/probe read 8 bytes
>      8903 run_subtest:PASS:obj_open_mem 0 nsec
>      8904 run_subtest:PASS:unexpected_load_failure 0 nsec
>      8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec
>      8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512
>
> More details can be found at:  https://github.com/kernel-patches/bpf/pull/6872
>
> Should we consider this behavior of bpf_probe_read_kernel() as
> expected, or is it something that requires fixing?

Maybe to guard the result with macros like __s390x__ to differentiate 
s390 vs. arm64/x86_64? There are some examples in prog_tests/* having 
such guards. probe_user.c:#if defined(__s390x__) 
test_bpf_syscall_macro.c:#if defined(__aarch64__) || defined(__s390__) 
xdp_adjust_tail.c:#if defined(__s390x__) xdp_do_redirect.c:#if 
defined(__s390x__)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator
  2024-04-25  6:05       ` Yonghong Song
@ 2024-04-25  8:03         ` Yafang Shao
  0 siblings, 0 replies; 15+ messages in thread
From: Yafang Shao @ 2024-04-25  8:03 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrii Nakryiko, ast, daniel, john.fastabend, andrii, martin.lau,
	eddyz87, song, kpsingh, sdf, haoluo, jolsa, bpf

On Thu, Apr 25, 2024 at 2:05 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
> On 4/24/24 10:36 PM, Yafang Shao wrote:
> > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> >> On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >>> On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >>>> Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been
> >>>> added for the new bpf_iter_bits functionality. These kfuncs enable the
> >>>> iteration of the bits from a given address and a given number of bits.
> >>>>
> >>>> - bpf_iter_bits_new
> >>>>    Initialize a new bits iterator for a given memory area. Due to the
> >>>>    limitation of bpf memalloc, the max number of bits to be iterated
> >>>>    over is (4096 * 8).
> >>>> - bpf_iter_bits_next
> >>>>    Get the next bit in a bpf_iter_bits
> >>>> - bpf_iter_bits_destroy
> >>>>    Destroy a bpf_iter_bits
> >>>>
> >>>> The bits iterator can be used in any context and on any address.
> >>>>
> >>>> Changes:
> >>>> - v5->v6:
> >>>>    - Add positive tests (Andrii)
> >>>> - v4->v5:
> >>>>    - Simplify test cases (Andrii)
> >>>> - v3->v4:
> >>>>    - Fix endianness error on s390x (Andrii)
> >>>>    - zero-initialize kit->bits_copy and zero out nr_bits (Andrii)
> >>>> - v2->v3:
> >>>>    - Optimization for u64/u32 mask (Andrii)
> >>>> - v1->v2:
> >>>>    - Simplify the CPU number verification code to avoid the failure on s390x
> >>>>      (Eduard)
> >>>> - bpf: Add bpf_iter_cpumask
> >>>>    https://lwn.net/Articles/961104/
> >>>> - bpf: Add new bpf helper bpf_for_each_cpu
> >>>>    https://lwn.net/Articles/939939/
> >>>>
> >>>> Yafang Shao (2):
> >>>>    bpf: Add bits iterator
> >>>>    selftests/bpf: Add selftest for bits iter
> >>>>
> >>>>   kernel/bpf/helpers.c                          | 120 +++++++++++++++++
> >>>>   .../selftests/bpf/prog_tests/verifier.c       |   2 +
> >>>>   .../selftests/bpf/progs/verifier_bits_iter.c  | 127 ++++++++++++++++++
> >>>>   3 files changed, 249 insertions(+)
> >>>>   create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c
> >>>>
> >>>> --
> >>>> 2.39.1
> >>>>
> >>> It appears that the test case failed on s390x when the data is
> >>> a u32 value because we need to set the higher 32 bits.
> >>> will analyze it.
> >>>
> >> Hey Yafang, did you get a chance to debug and fix the issue?
> >>
> > Hi Andrii,
> >
> > Apologies for the delay; I recently returned from an extended holiday.
> >
> > The issue stems from the limitations of bpf_probe_read_kernel() on
> > s390 architecture. The attachment provides a straightforward example
> > to illustrate this issue. The observed results are as follows:
> >
> >      Error: #463/1 verifier_probe_read/probe read 4 bytes
> >      8897 run_subtest:PASS:obj_open_mem 0 nsec
> >      8898 run_subtest:PASS:unexpected_load_failure 0 nsec
> >      8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec
> >      8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512
> >
> >      Error: #463/2 verifier_probe_read/probe read 8 bytes
> >      8903 run_subtest:PASS:obj_open_mem 0 nsec
> >      8904 run_subtest:PASS:unexpected_load_failure 0 nsec
> >      8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec
> >      8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512
> >
> > More details can be found at:  https://github.com/kernel-patches/bpf/pull/6872
> >
> > Should we consider this behavior of bpf_probe_read_kernel() as
> > expected, or is it something that requires fixing?
>
> Maybe to guard the result with macros like __s390x__ to differentiate
> s390 vs. arm64/x86_64? There are some examples in prog_tests/* having
> such guards. probe_user.c:#if defined(__s390x__)
> test_bpf_syscall_macro.c:#if defined(__aarch64__) || defined(__s390__)
> xdp_adjust_tail.c:#if defined(__s390x__) xdp_do_redirect.c:#if
> defined(__s390x__)
>

That's feasible. Thank you for your suggestion. I'll make the necessary changes.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator
  2024-04-25  5:36     ` Yafang Shao
  2024-04-25  6:05       ` Yonghong Song
@ 2024-04-25 18:15       ` Andrii Nakryiko
  2024-04-26  5:04         ` Yafang Shao
  1 sibling, 1 reply; 15+ messages in thread
From: Andrii Nakryiko @ 2024-04-25 18:15 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf

On Wed, Apr 24, 2024 at 10:37 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > >
> > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been
> > > > added for the new bpf_iter_bits functionality. These kfuncs enable the
> > > > iteration of the bits from a given address and a given number of bits.
> > > >
> > > > - bpf_iter_bits_new
> > > >   Initialize a new bits iterator for a given memory area. Due to the
> > > >   limitation of bpf memalloc, the max number of bits to be iterated
> > > >   over is (4096 * 8).
> > > > - bpf_iter_bits_next
> > > >   Get the next bit in a bpf_iter_bits
> > > > - bpf_iter_bits_destroy
> > > >   Destroy a bpf_iter_bits
> > > >
> > > > The bits iterator can be used in any context and on any address.
> > > >
> > > > Changes:
> > > > - v5->v6:
> > > >   - Add positive tests (Andrii)
> > > > - v4->v5:
> > > >   - Simplify test cases (Andrii)
> > > > - v3->v4:
> > > >   - Fix endianness error on s390x (Andrii)
> > > >   - zero-initialize kit->bits_copy and zero out nr_bits (Andrii)
> > > > - v2->v3:
> > > >   - Optimization for u64/u32 mask (Andrii)
> > > > - v1->v2:
> > > >   - Simplify the CPU number verification code to avoid the failure on s390x
> > > >     (Eduard)
> > > > - bpf: Add bpf_iter_cpumask
> > > >   https://lwn.net/Articles/961104/
> > > > - bpf: Add new bpf helper bpf_for_each_cpu
> > > >   https://lwn.net/Articles/939939/
> > > >
> > > > Yafang Shao (2):
> > > >   bpf: Add bits iterator
> > > >   selftests/bpf: Add selftest for bits iter
> > > >
> > > >  kernel/bpf/helpers.c                          | 120 +++++++++++++++++
> > > >  .../selftests/bpf/prog_tests/verifier.c       |   2 +
> > > >  .../selftests/bpf/progs/verifier_bits_iter.c  | 127 ++++++++++++++++++
> > > >  3 files changed, 249 insertions(+)
> > > >  create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c
> > > >
> > > > --
> > > > 2.39.1
> > > >
> > >
> > > It appears that the test case failed on s390x when the data is
> > > a u32 value because we need to set the higher 32 bits.
> > > will analyze it.
> > >
> >
> > Hey Yafang, did you get a chance to debug and fix the issue?
> >
>
> Hi Andrii,
>
> Apologies for the delay; I recently returned from an extended holiday.
>
> The issue stems from the limitations of bpf_probe_read_kernel() on
> s390 architecture. The attachment provides a straightforward example
> to illustrate this issue. The observed results are as follows:
>
>     Error: #463/1 verifier_probe_read/probe read 4 bytes
>     8897 run_subtest:PASS:obj_open_mem 0 nsec
>     8898 run_subtest:PASS:unexpected_load_failure 0 nsec
>     8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec
>     8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512
>
>     Error: #463/2 verifier_probe_read/probe read 8 bytes
>     8903 run_subtest:PASS:obj_open_mem 0 nsec
>     8904 run_subtest:PASS:unexpected_load_failure 0 nsec
>     8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec
>     8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512
>
> More details can be found at:  https://github.com/kernel-patches/bpf/pull/6872
>
> Should we consider this behavior of bpf_probe_read_kernel() as
> expected, or is it something that requires fixing?
>

I might be missing something, but there is nothing wrong with
bpf_probe_read_kernel() behavior. In "read 4" case you are overwriting
only upper 4 bytes of u64, so lower 4 bytes are garbage. In "read 8"
you are reading (upper) 4 bytes of garbage from uninitialized
data_dst.

So getting back to iter implementation. Make sure you are
zero-initializing that u64 value you are reading into?


> --
> Regards
> Yafang

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator
  2024-04-25 18:15       ` Andrii Nakryiko
@ 2024-04-26  5:04         ` Yafang Shao
  2024-04-26 16:50           ` Andrii Nakryiko
  0 siblings, 1 reply; 15+ messages in thread
From: Yafang Shao @ 2024-04-26  5:04 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf

On Fri, Apr 26, 2024 at 2:15 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Apr 24, 2024 at 10:37 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > >
> > > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > >
> > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been
> > > > > added for the new bpf_iter_bits functionality. These kfuncs enable the
> > > > > iteration of the bits from a given address and a given number of bits.
> > > > >
> > > > > - bpf_iter_bits_new
> > > > >   Initialize a new bits iterator for a given memory area. Due to the
> > > > >   limitation of bpf memalloc, the max number of bits to be iterated
> > > > >   over is (4096 * 8).
> > > > > - bpf_iter_bits_next
> > > > >   Get the next bit in a bpf_iter_bits
> > > > > - bpf_iter_bits_destroy
> > > > >   Destroy a bpf_iter_bits
> > > > >
> > > > > The bits iterator can be used in any context and on any address.
> > > > >
> > > > > Changes:
> > > > > - v5->v6:
> > > > >   - Add positive tests (Andrii)
> > > > > - v4->v5:
> > > > >   - Simplify test cases (Andrii)
> > > > > - v3->v4:
> > > > >   - Fix endianness error on s390x (Andrii)
> > > > >   - zero-initialize kit->bits_copy and zero out nr_bits (Andrii)
> > > > > - v2->v3:
> > > > >   - Optimization for u64/u32 mask (Andrii)
> > > > > - v1->v2:
> > > > >   - Simplify the CPU number verification code to avoid the failure on s390x
> > > > >     (Eduard)
> > > > > - bpf: Add bpf_iter_cpumask
> > > > >   https://lwn.net/Articles/961104/
> > > > > - bpf: Add new bpf helper bpf_for_each_cpu
> > > > >   https://lwn.net/Articles/939939/
> > > > >
> > > > > Yafang Shao (2):
> > > > >   bpf: Add bits iterator
> > > > >   selftests/bpf: Add selftest for bits iter
> > > > >
> > > > >  kernel/bpf/helpers.c                          | 120 +++++++++++++++++
> > > > >  .../selftests/bpf/prog_tests/verifier.c       |   2 +
> > > > >  .../selftests/bpf/progs/verifier_bits_iter.c  | 127 ++++++++++++++++++
> > > > >  3 files changed, 249 insertions(+)
> > > > >  create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c
> > > > >
> > > > > --
> > > > > 2.39.1
> > > > >
> > > >
> > > > It appears that the test case failed on s390x when the data is
> > > > a u32 value because we need to set the higher 32 bits.
> > > > will analyze it.
> > > >
> > >
> > > Hey Yafang, did you get a chance to debug and fix the issue?
> > >
> >
> > Hi Andrii,
> >
> > Apologies for the delay; I recently returned from an extended holiday.
> >
> > The issue stems from the limitations of bpf_probe_read_kernel() on
> > s390 architecture. The attachment provides a straightforward example
> > to illustrate this issue. The observed results are as follows:
> >
> >     Error: #463/1 verifier_probe_read/probe read 4 bytes
> >     8897 run_subtest:PASS:obj_open_mem 0 nsec
> >     8898 run_subtest:PASS:unexpected_load_failure 0 nsec
> >     8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec
> >     8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512
> >
> >     Error: #463/2 verifier_probe_read/probe read 8 bytes
> >     8903 run_subtest:PASS:obj_open_mem 0 nsec
> >     8904 run_subtest:PASS:unexpected_load_failure 0 nsec
> >     8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec
> >     8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512
> >
> > More details can be found at:  https://github.com/kernel-patches/bpf/pull/6872
> >
> > Should we consider this behavior of bpf_probe_read_kernel() as
> > expected, or is it something that requires fixing?
> >
>
> I might be missing something, but there is nothing wrong with
> bpf_probe_read_kernel() behavior. In "read 4" case you are overwriting
> only upper 4 bytes of u64, so lower 4 bytes are garbage. In "read 8"
> you are reading (upper) 4 bytes of garbage from uninitialized
> data_dst.

The issue doesn't lie with the dst but rather with the src. Even after
initializing the destination, the operation still fails. You can find
more details in the following link:
https://github.com/kernel-patches/bpf/pull/6882. It appears that
bpf_probe_read_kernel() encounters difficulties when dealing with
non-long-aligned source addresses.

>
> So getting back to iter implementation. Make sure you are
> zero-initializing that u64 value you are reading into?
>

It has been zero-initialized:

+ kit->nr_bits = 0;
+ kit->bits_copy = 0;

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator
  2024-04-26  5:04         ` Yafang Shao
@ 2024-04-26 16:50           ` Andrii Nakryiko
  2024-04-28 13:47             ` Yafang Shao
  0 siblings, 1 reply; 15+ messages in thread
From: Andrii Nakryiko @ 2024-04-26 16:50 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf

On Thu, Apr 25, 2024 at 10:05 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Fri, Apr 26, 2024 at 2:15 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, Apr 24, 2024 at 10:37 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > >
> > > > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > >
> > > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been
> > > > > > added for the new bpf_iter_bits functionality. These kfuncs enable the
> > > > > > iteration of the bits from a given address and a given number of bits.
> > > > > >
> > > > > > - bpf_iter_bits_new
> > > > > >   Initialize a new bits iterator for a given memory area. Due to the
> > > > > >   limitation of bpf memalloc, the max number of bits to be iterated
> > > > > >   over is (4096 * 8).
> > > > > > - bpf_iter_bits_next
> > > > > >   Get the next bit in a bpf_iter_bits
> > > > > > - bpf_iter_bits_destroy
> > > > > >   Destroy a bpf_iter_bits
> > > > > >
> > > > > > The bits iterator can be used in any context and on any address.
> > > > > >
> > > > > > Changes:
> > > > > > - v5->v6:
> > > > > >   - Add positive tests (Andrii)
> > > > > > - v4->v5:
> > > > > >   - Simplify test cases (Andrii)
> > > > > > - v3->v4:
> > > > > >   - Fix endianness error on s390x (Andrii)
> > > > > >   - zero-initialize kit->bits_copy and zero out nr_bits (Andrii)
> > > > > > - v2->v3:
> > > > > >   - Optimization for u64/u32 mask (Andrii)
> > > > > > - v1->v2:
> > > > > >   - Simplify the CPU number verification code to avoid the failure on s390x
> > > > > >     (Eduard)
> > > > > > - bpf: Add bpf_iter_cpumask
> > > > > >   https://lwn.net/Articles/961104/
> > > > > > - bpf: Add new bpf helper bpf_for_each_cpu
> > > > > >   https://lwn.net/Articles/939939/
> > > > > >
> > > > > > Yafang Shao (2):
> > > > > >   bpf: Add bits iterator
> > > > > >   selftests/bpf: Add selftest for bits iter
> > > > > >
> > > > > >  kernel/bpf/helpers.c                          | 120 +++++++++++++++++
> > > > > >  .../selftests/bpf/prog_tests/verifier.c       |   2 +
> > > > > >  .../selftests/bpf/progs/verifier_bits_iter.c  | 127 ++++++++++++++++++
> > > > > >  3 files changed, 249 insertions(+)
> > > > > >  create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c
> > > > > >
> > > > > > --
> > > > > > 2.39.1
> > > > > >
> > > > >
> > > > > It appears that the test case failed on s390x when the data is
> > > > > a u32 value because we need to set the higher 32 bits.
> > > > > will analyze it.
> > > > >
> > > >
> > > > Hey Yafang, did you get a chance to debug and fix the issue?
> > > >
> > >
> > > Hi Andrii,
> > >
> > > Apologies for the delay; I recently returned from an extended holiday.
> > >
> > > The issue stems from the limitations of bpf_probe_read_kernel() on
> > > s390 architecture. The attachment provides a straightforward example
> > > to illustrate this issue. The observed results are as follows:
> > >
> > >     Error: #463/1 verifier_probe_read/probe read 4 bytes
> > >     8897 run_subtest:PASS:obj_open_mem 0 nsec
> > >     8898 run_subtest:PASS:unexpected_load_failure 0 nsec
> > >     8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec
> > >     8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512
> > >
> > >     Error: #463/2 verifier_probe_read/probe read 8 bytes
> > >     8903 run_subtest:PASS:obj_open_mem 0 nsec
> > >     8904 run_subtest:PASS:unexpected_load_failure 0 nsec
> > >     8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec
> > >     8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512
> > >
> > > More details can be found at:  https://github.com/kernel-patches/bpf/pull/6872
> > >
> > > Should we consider this behavior of bpf_probe_read_kernel() as
> > > expected, or is it something that requires fixing?
> > >
> >
> > I might be missing something, but there is nothing wrong with
> > bpf_probe_read_kernel() behavior. In "read 4" case you are overwriting
> > only upper 4 bytes of u64, so lower 4 bytes are garbage. In "read 8"
> > you are reading (upper) 4 bytes of garbage from uninitialized
> > data_dst.
>
> The issue doesn't lie with the dst but rather with the src. Even after
> initializing the destination, the operation still fails. You can find

Are you sure the operation "fails"? If it would fail, you'd get a
negative error code, but you are getting zero. Which actually makes
sense.

I think you are just getting confused by big endianness of s390x, and
there is nothing wrong with bpf_probe_read_kernel().

In both of your tests (I pasted your code below, it would be better if
you did it in your initial emails) you end up with 0x200 in *upper* 32
bits (on big endian) and lower bits are zeros. And __retval thing is
32-bit (despite BPF program returning long), so this return value is
truncated to *lower* 32-bits, which are, expectedly, zeroes.

So I think everything works as expected, but your tests (at least)
don't handle the big-endian arch well.

__description("probe read 4 bytes")
__success __retval(0x200)
long probe_read_4(void)
{
    int data = 0x200;
    long data_dst = 0;
    int err;

    err = bpf_probe_read_kernel(&data_dst, 4, &data);
    if (err)
        return err;

    return data_dst;
}

SEC("syscall")
__description("probe read 8 bytes")
__success __retval(0x200)
long probe_read_8(void)
{
    int data = 0x200;
    long data_dst = 0;
    int err;

    err = bpf_probe_read_kernel(&data_dst, 8, &data);
    if (err)
        return err;

    return data_dst;

}

> more details in the following link:
> https://github.com/kernel-patches/bpf/pull/6882. It appears that
> bpf_probe_read_kernel() encounters difficulties when dealing with
> non-long-aligned source addresses.
>
> >
> > So getting back to iter implementation. Make sure you are
> > zero-initializing that u64 value you are reading into?
> >
>
> It has been zero-initialized:
>
> + kit->nr_bits = 0;
> + kit->bits_copy = 0;
>

ok, then the problem is somewhere else, but it doesn't seem to be in
bpf_probe_read_kernel(). I'm forgetting what was the original test
failure for your patch set, but please double check again, taking into
account the big endianness of s390x.

> --
> Regards
> Yafang

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator
  2024-04-26 16:50           ` Andrii Nakryiko
@ 2024-04-28 13:47             ` Yafang Shao
  2024-04-29 16:27               ` Andrii Nakryiko
  0 siblings, 1 reply; 15+ messages in thread
From: Yafang Shao @ 2024-04-28 13:47 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf

On Sat, Apr 27, 2024 at 12:51 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Apr 25, 2024 at 10:05 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Fri, Apr 26, 2024 at 2:15 AM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Wed, Apr 24, 2024 at 10:37 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > >
> > > > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko
> > > > <andrii.nakryiko@gmail.com> wrote:
> > > > >
> > > > > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > > >
> > > > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been
> > > > > > > added for the new bpf_iter_bits functionality. These kfuncs enable the
> > > > > > > iteration of the bits from a given address and a given number of bits.
> > > > > > >
> > > > > > > - bpf_iter_bits_new
> > > > > > >   Initialize a new bits iterator for a given memory area. Due to the
> > > > > > >   limitation of bpf memalloc, the max number of bits to be iterated
> > > > > > >   over is (4096 * 8).
> > > > > > > - bpf_iter_bits_next
> > > > > > >   Get the next bit in a bpf_iter_bits
> > > > > > > - bpf_iter_bits_destroy
> > > > > > >   Destroy a bpf_iter_bits
> > > > > > >
> > > > > > > The bits iterator can be used in any context and on any address.
> > > > > > >
> > > > > > > Changes:
> > > > > > > - v5->v6:
> > > > > > >   - Add positive tests (Andrii)
> > > > > > > - v4->v5:
> > > > > > >   - Simplify test cases (Andrii)
> > > > > > > - v3->v4:
> > > > > > >   - Fix endianness error on s390x (Andrii)
> > > > > > >   - zero-initialize kit->bits_copy and zero out nr_bits (Andrii)
> > > > > > > - v2->v3:
> > > > > > >   - Optimization for u64/u32 mask (Andrii)
> > > > > > > - v1->v2:
> > > > > > >   - Simplify the CPU number verification code to avoid the failure on s390x
> > > > > > >     (Eduard)
> > > > > > > - bpf: Add bpf_iter_cpumask
> > > > > > >   https://lwn.net/Articles/961104/
> > > > > > > - bpf: Add new bpf helper bpf_for_each_cpu
> > > > > > >   https://lwn.net/Articles/939939/
> > > > > > >
> > > > > > > Yafang Shao (2):
> > > > > > >   bpf: Add bits iterator
> > > > > > >   selftests/bpf: Add selftest for bits iter
> > > > > > >
> > > > > > >  kernel/bpf/helpers.c                          | 120 +++++++++++++++++
> > > > > > >  .../selftests/bpf/prog_tests/verifier.c       |   2 +
> > > > > > >  .../selftests/bpf/progs/verifier_bits_iter.c  | 127 ++++++++++++++++++
> > > > > > >  3 files changed, 249 insertions(+)
> > > > > > >  create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c
> > > > > > >
> > > > > > > --
> > > > > > > 2.39.1
> > > > > > >
> > > > > >
> > > > > > It appears that the test case failed on s390x when the data is
> > > > > > a u32 value because we need to set the higher 32 bits.
> > > > > > will analyze it.
> > > > > >
> > > > >
> > > > > Hey Yafang, did you get a chance to debug and fix the issue?
> > > > >
> > > >
> > > > Hi Andrii,
> > > >
> > > > Apologies for the delay; I recently returned from an extended holiday.
> > > >
> > > > The issue stems from the limitations of bpf_probe_read_kernel() on
> > > > s390 architecture. The attachment provides a straightforward example
> > > > to illustrate this issue. The observed results are as follows:
> > > >
> > > >     Error: #463/1 verifier_probe_read/probe read 4 bytes
> > > >     8897 run_subtest:PASS:obj_open_mem 0 nsec
> > > >     8898 run_subtest:PASS:unexpected_load_failure 0 nsec
> > > >     8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec
> > > >     8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512
> > > >
> > > >     Error: #463/2 verifier_probe_read/probe read 8 bytes
> > > >     8903 run_subtest:PASS:obj_open_mem 0 nsec
> > > >     8904 run_subtest:PASS:unexpected_load_failure 0 nsec
> > > >     8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec
> > > >     8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512
> > > >
> > > > More details can be found at:  https://github.com/kernel-patches/bpf/pull/6872
> > > >
> > > > Should we consider this behavior of bpf_probe_read_kernel() as
> > > > expected, or is it something that requires fixing?
> > > >
> > >
> > > I might be missing something, but there is nothing wrong with
> > > bpf_probe_read_kernel() behavior. In "read 4" case you are overwriting
> > > only upper 4 bytes of u64, so lower 4 bytes are garbage. In "read 8"
> > > you are reading (upper) 4 bytes of garbage from uninitialized
> > > data_dst.
> >
> > The issue doesn't lie with the dst but rather with the src. Even after
> > initializing the destination, the operation still fails. You can find
>
> Are you sure the operation "fails"? If it would fail, you'd get a
> negative error code, but you are getting zero. Which actually makes
> sense.
>
> I think you are just getting confused by big endianness of s390x, and
> there is nothing wrong with bpf_probe_read_kernel().
>
> In both of your tests (I pasted your code below, it would be better if
> you did it in your initial emails) you end up with 0x200 in *upper* 32
> bits (on big endian) and lower bits are zeros. And __retval thing is
> 32-bit (despite BPF program returning long), so this return value is
> truncated to *lower* 32-bits, which are, expectedly, zeroes.

Thank you for clarifying. The presence of the 32-bit __retval led to
my misunderstanding :(

>
> So I think everything works as expected, but your tests (at least)
> don't handle the big-endian arch well.

The issue arises when the dst and src have different sizes, causing
bpf_probe_read_kernel_common() to handle them poorly on big-endian
machines. To address this, we need to calculate the offset for
copying, as demonstrated by the following

   bpf_probe_read_kernel_common(&kit->bits_copy + offset, size,
unsafe_ptr__ign);

One might wonder why this calculation is not incorporated directly
into the implementation of bpf_probe_read_kernel_common() ?

>
> __description("probe read 4 bytes")
> __success __retval(0x200)
> long probe_read_4(void)
> {
>     int data = 0x200;
>     long data_dst = 0;
>     int err;
>
>     err = bpf_probe_read_kernel(&data_dst, 4, &data);
>     if (err)
>         return err;
>
>     return data_dst;
> }
>
> SEC("syscall")
> __description("probe read 8 bytes")
> __success __retval(0x200)
> long probe_read_8(void)
> {
>     int data = 0x200;
>     long data_dst = 0;
>     int err;
>
>     err = bpf_probe_read_kernel(&data_dst, 8, &data);
>     if (err)
>         return err;
>
>     return data_dst;
>
> }
>
> > more details in the following link:
> > https://github.com/kernel-patches/bpf/pull/6882. It appears that
> > bpf_probe_read_kernel() encounters difficulties when dealing with
> > non-long-aligned source addresses.
> >
> > >
> > > So getting back to iter implementation. Make sure you are
> > > zero-initializing that u64 value you are reading into?
> > >
> >
> > It has been zero-initialized:
> >
> > + kit->nr_bits = 0;
> > + kit->bits_copy = 0;
> >
>
> ok, then the problem is somewhere else, but it doesn't seem to be in
> bpf_probe_read_kernel(). I'm forgetting what was the original test
> failure for your patch set, but please double check again, taking into
> account the big endianness of s390x.
>

If we aim to make it compatible with s390, we need to introduce some
constraints regarding the bits iteration.

1. We must replace nr_bits with size:

  bpf_iter_bits_new(struct bpf_iter_bits *it, const void
*unsafe_ptr__ign, u32 size)

2. The size must adhere to alignment requirements:

        if (size <= sizeof(u64)) {
                int offset = IS_ENABLED(CONFIG_S390) ? sizeof(u64) - size : 0;

                switch (size) {
                case 1:
                case 2:
                case 4:
                case 8:
                        break;
                default:
                        return -EINVAL;
                }

                err = bpf_probe_read_kernel_common(((char
*)&kit->bits_copy) + offset, size, unsafe_ptr__ign);
                if (err)
                        return -EFAULT;

                kit->size = size;
                kit->bit = -1;
                return 0;
        }

        /* Not long-aligned */
        if (size & (sizeof(unsigned long) - 1))
                return -EINVAL;

        ....

Does this meet your expectations?

--
Regards



Yafang

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator
  2024-04-28 13:47             ` Yafang Shao
@ 2024-04-29 16:27               ` Andrii Nakryiko
  2024-05-01  2:12                 ` Yafang Shao
  0 siblings, 1 reply; 15+ messages in thread
From: Andrii Nakryiko @ 2024-04-29 16:27 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf

On Sun, Apr 28, 2024 at 6:47 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Sat, Apr 27, 2024 at 12:51 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Apr 25, 2024 at 10:05 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > On Fri, Apr 26, 2024 at 2:15 AM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Wed, Apr 24, 2024 at 10:37 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > >
> > > > > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko
> > > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > > >
> > > > > > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been
> > > > > > > > added for the new bpf_iter_bits functionality. These kfuncs enable the
> > > > > > > > iteration of the bits from a given address and a given number of bits.
> > > > > > > >
> > > > > > > > - bpf_iter_bits_new
> > > > > > > >   Initialize a new bits iterator for a given memory area. Due to the
> > > > > > > >   limitation of bpf memalloc, the max number of bits to be iterated
> > > > > > > >   over is (4096 * 8).
> > > > > > > > - bpf_iter_bits_next
> > > > > > > >   Get the next bit in a bpf_iter_bits
> > > > > > > > - bpf_iter_bits_destroy
> > > > > > > >   Destroy a bpf_iter_bits
> > > > > > > >
> > > > > > > > The bits iterator can be used in any context and on any address.
> > > > > > > >
> > > > > > > > Changes:
> > > > > > > > - v5->v6:
> > > > > > > >   - Add positive tests (Andrii)
> > > > > > > > - v4->v5:
> > > > > > > >   - Simplify test cases (Andrii)
> > > > > > > > - v3->v4:
> > > > > > > >   - Fix endianness error on s390x (Andrii)
> > > > > > > >   - zero-initialize kit->bits_copy and zero out nr_bits (Andrii)
> > > > > > > > - v2->v3:
> > > > > > > >   - Optimization for u64/u32 mask (Andrii)
> > > > > > > > - v1->v2:
> > > > > > > >   - Simplify the CPU number verification code to avoid the failure on s390x
> > > > > > > >     (Eduard)
> > > > > > > > - bpf: Add bpf_iter_cpumask
> > > > > > > >   https://lwn.net/Articles/961104/
> > > > > > > > - bpf: Add new bpf helper bpf_for_each_cpu
> > > > > > > >   https://lwn.net/Articles/939939/
> > > > > > > >
> > > > > > > > Yafang Shao (2):
> > > > > > > >   bpf: Add bits iterator
> > > > > > > >   selftests/bpf: Add selftest for bits iter
> > > > > > > >
> > > > > > > >  kernel/bpf/helpers.c                          | 120 +++++++++++++++++
> > > > > > > >  .../selftests/bpf/prog_tests/verifier.c       |   2 +
> > > > > > > >  .../selftests/bpf/progs/verifier_bits_iter.c  | 127 ++++++++++++++++++
> > > > > > > >  3 files changed, 249 insertions(+)
> > > > > > > >  create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c
> > > > > > > >
> > > > > > > > --
> > > > > > > > 2.39.1
> > > > > > > >
> > > > > > >
> > > > > > > It appears that the test case failed on s390x when the data is
> > > > > > > a u32 value because we need to set the higher 32 bits.
> > > > > > > will analyze it.
> > > > > > >
> > > > > >
> > > > > > Hey Yafang, did you get a chance to debug and fix the issue?
> > > > > >
> > > > >
> > > > > Hi Andrii,
> > > > >
> > > > > Apologies for the delay; I recently returned from an extended holiday.
> > > > >
> > > > > The issue stems from the limitations of bpf_probe_read_kernel() on
> > > > > s390 architecture. The attachment provides a straightforward example
> > > > > to illustrate this issue. The observed results are as follows:
> > > > >
> > > > >     Error: #463/1 verifier_probe_read/probe read 4 bytes
> > > > >     8897 run_subtest:PASS:obj_open_mem 0 nsec
> > > > >     8898 run_subtest:PASS:unexpected_load_failure 0 nsec
> > > > >     8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec
> > > > >     8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512
> > > > >
> > > > >     Error: #463/2 verifier_probe_read/probe read 8 bytes
> > > > >     8903 run_subtest:PASS:obj_open_mem 0 nsec
> > > > >     8904 run_subtest:PASS:unexpected_load_failure 0 nsec
> > > > >     8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec
> > > > >     8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512
> > > > >
> > > > > More details can be found at:  https://github.com/kernel-patches/bpf/pull/6872
> > > > >
> > > > > Should we consider this behavior of bpf_probe_read_kernel() as
> > > > > expected, or is it something that requires fixing?
> > > > >
> > > >
> > > > I might be missing something, but there is nothing wrong with
> > > > bpf_probe_read_kernel() behavior. In "read 4" case you are overwriting
> > > > only upper 4 bytes of u64, so lower 4 bytes are garbage. In "read 8"
> > > > you are reading (upper) 4 bytes of garbage from uninitialized
> > > > data_dst.
> > >
> > > The issue doesn't lie with the dst but rather with the src. Even after
> > > initializing the destination, the operation still fails. You can find
> >
> > Are you sure the operation "fails"? If it would fail, you'd get a
> > negative error code, but you are getting zero. Which actually makes
> > sense.
> >
> > I think you are just getting confused by big endianness of s390x, and
> > there is nothing wrong with bpf_probe_read_kernel().
> >
> > In both of your tests (I pasted your code below, it would be better if
> > you did it in your initial emails) you end up with 0x200 in *upper* 32
> > bits (on big endian) and lower bits are zeros. And __retval thing is
> > 32-bit (despite BPF program returning long), so this return value is
> > truncated to *lower* 32-bits, which are, expectedly, zeroes.
>
> Thank you for clarifying. The presence of the 32-bit __retval led to
> my misunderstanding :(
>
> >
> > So I think everything works as expected, but your tests (at least)
> > don't handle the big-endian arch well.
>
> The issue arises when the dst and src have different sizes, causing
> bpf_probe_read_kernel_common() to handle them poorly on big-endian
> machines. To address this, we need to calculate the offset for
> copying, as demonstrated by the following
>
>    bpf_probe_read_kernel_common(&kit->bits_copy + offset, size,
> unsafe_ptr__ign);
>
> One might wonder why this calculation is not incorporated directly
> into the implementation of bpf_probe_read_kernel_common() ?

Let's stop talking about
bpf_probe_read_kernel/bpf_probe_read_kernel_common doing anything
wrong or not handling anything wrong. It's not, it's correct. Your
code is *using* it wrong, and that's what we'll have to fix. The
contract of bpf_probe_read_kernel is in terms of memory addresses of
source/destination *bytes* and the *common* size of both source and
destination. bpf_probe_read_kernel() cannot know that you are passing
a pointer to int or long or whatever, it's just a void * pointer. So
if you are writing bytes over long, you need to take care of adjusting
offsets to accommodate big-endian.

It's a distraction to talk about bpf_probe_read_kernel() here (but
it's useful to understand its contract).

>
> >
> > __description("probe read 4 bytes")
> > __success __retval(0x200)
> > long probe_read_4(void)
> > {
> >     int data = 0x200;
> >     long data_dst = 0;
> >     int err;
> >
> >     err = bpf_probe_read_kernel(&data_dst, 4, &data);
> >     if (err)
> >         return err;
> >
> >     return data_dst;
> > }
> >
> > SEC("syscall")
> > __description("probe read 8 bytes")
> > __success __retval(0x200)
> > long probe_read_8(void)
> > {
> >     int data = 0x200;
> >     long data_dst = 0;
> >     int err;
> >
> >     err = bpf_probe_read_kernel(&data_dst, 8, &data);
> >     if (err)
> >         return err;
> >
> >     return data_dst;
> >
> > }
> >
> > > more details in the following link:
> > > https://github.com/kernel-patches/bpf/pull/6882. It appears that
> > > bpf_probe_read_kernel() encounters difficulties when dealing with
> > > non-long-aligned source addresses.
> > >
> > > >
> > > > So getting back to iter implementation. Make sure you are
> > > > zero-initializing that u64 value you are reading into?
> > > >
> > >
> > > It has been zero-initialized:
> > >
> > > + kit->nr_bits = 0;
> > > + kit->bits_copy = 0;
> > >
> >
> > ok, then the problem is somewhere else, but it doesn't seem to be in
> > bpf_probe_read_kernel(). I'm forgetting what was the original test
> > failure for your patch set, but please double check again, taking into
> > account the big endianness of s390x.
> >
>
> If we aim to make it compatible with s390, we need to introduce some
> constraints regarding the bits iteration.
>
> 1. We must replace nr_bits with size:
>
>   bpf_iter_bits_new(struct bpf_iter_bits *it, const void
> *unsafe_ptr__ign, u32 size)
>
> 2. The size must adhere to alignment requirements:
>
>         if (size <= sizeof(u64)) {
>                 int offset = IS_ENABLED(CONFIG_S390) ? sizeof(u64) - size : 0;
>
>                 switch (size) {
>                 case 1:
>                 case 2:
>                 case 4:
>                 case 8:
>                         break;
>                 default:
>                         return -EINVAL;
>                 }
>
>                 err = bpf_probe_read_kernel_common(((char
> *)&kit->bits_copy) + offset, size, unsafe_ptr__ign);
>                 if (err)
>                         return -EFAULT;
>
>                 kit->size = size;
>                 kit->bit = -1;
>                 return 0;
>         }
>
>         /* Not long-aligned */
>         if (size & (sizeof(unsigned long) - 1))
>                 return -EINVAL;
>
>         ....
>
> Does this meet your expectations?
>

I don't think you need to add any restrictions or change anything to
be byte-sized. You just need to calculate byte offset and do a bit
shift to place N bits from the mask into upper N bits of long on
big-endian. You might need to do some masking even for little endian
as well.

Which is why I suggested investing in simple but thorough tests. Write
down a few bit mask patterns of variable length (not just
multiple-of-8 sizes) and think about the sequence of bits that should
be returned. And then codify that.

Then check that both little- and big-endian arches work correctly.

This is all a bit tricky because kernel's find_next_bit() makes the
assumption that bit masks are long-based, while this bits iterator
protocol is based on bit sizes, which are not necessarily multiples of
8 bits. So after you copy memory as bytes, you might need to clear
(and shift, for big endian) some bits. Use simple but non-symmetrical
bit masks to test everything.

> --
> Regards
>
>
>
> Yafang

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator
  2024-04-29 16:27               ` Andrii Nakryiko
@ 2024-05-01  2:12                 ` Yafang Shao
  2024-05-01  4:14                   ` Andrii Nakryiko
  0 siblings, 1 reply; 15+ messages in thread
From: Yafang Shao @ 2024-05-01  2:12 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf

On Tue, Apr 30, 2024 at 12:28 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Sun, Apr 28, 2024 at 6:47 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Sat, Apr 27, 2024 at 12:51 AM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Thu, Apr 25, 2024 at 10:05 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > >
> > > > On Fri, Apr 26, 2024 at 2:15 AM Andrii Nakryiko
> > > > <andrii.nakryiko@gmail.com> wrote:
> > > > >
> > > > > On Wed, Apr 24, 2024 at 10:37 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko
> > > > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > > >
> > > > > > > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been
> > > > > > > > > added for the new bpf_iter_bits functionality. These kfuncs enable the
> > > > > > > > > iteration of the bits from a given address and a given number of bits.
> > > > > > > > >
> > > > > > > > > - bpf_iter_bits_new
> > > > > > > > >   Initialize a new bits iterator for a given memory area. Due to the
> > > > > > > > >   limitation of bpf memalloc, the max number of bits to be iterated
> > > > > > > > >   over is (4096 * 8).
> > > > > > > > > - bpf_iter_bits_next
> > > > > > > > >   Get the next bit in a bpf_iter_bits
> > > > > > > > > - bpf_iter_bits_destroy
> > > > > > > > >   Destroy a bpf_iter_bits
> > > > > > > > >
> > > > > > > > > The bits iterator can be used in any context and on any address.
> > > > > > > > >
> > > > > > > > > Changes:
> > > > > > > > > - v5->v6:
> > > > > > > > >   - Add positive tests (Andrii)
> > > > > > > > > - v4->v5:
> > > > > > > > >   - Simplify test cases (Andrii)
> > > > > > > > > - v3->v4:
> > > > > > > > >   - Fix endianness error on s390x (Andrii)
> > > > > > > > >   - zero-initialize kit->bits_copy and zero out nr_bits (Andrii)
> > > > > > > > > - v2->v3:
> > > > > > > > >   - Optimization for u64/u32 mask (Andrii)
> > > > > > > > > - v1->v2:
> > > > > > > > >   - Simplify the CPU number verification code to avoid the failure on s390x
> > > > > > > > >     (Eduard)
> > > > > > > > > - bpf: Add bpf_iter_cpumask
> > > > > > > > >   https://lwn.net/Articles/961104/
> > > > > > > > > - bpf: Add new bpf helper bpf_for_each_cpu
> > > > > > > > >   https://lwn.net/Articles/939939/
> > > > > > > > >
> > > > > > > > > Yafang Shao (2):
> > > > > > > > >   bpf: Add bits iterator
> > > > > > > > >   selftests/bpf: Add selftest for bits iter
> > > > > > > > >
> > > > > > > > >  kernel/bpf/helpers.c                          | 120 +++++++++++++++++
> > > > > > > > >  .../selftests/bpf/prog_tests/verifier.c       |   2 +
> > > > > > > > >  .../selftests/bpf/progs/verifier_bits_iter.c  | 127 ++++++++++++++++++
> > > > > > > > >  3 files changed, 249 insertions(+)
> > > > > > > > >  create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > 2.39.1
> > > > > > > > >
> > > > > > > >
> > > > > > > > It appears that the test case failed on s390x when the data is
> > > > > > > > a u32 value because we need to set the higher 32 bits.
> > > > > > > > will analyze it.
> > > > > > > >
> > > > > > >
> > > > > > > Hey Yafang, did you get a chance to debug and fix the issue?
> > > > > > >
> > > > > >
> > > > > > Hi Andrii,
> > > > > >
> > > > > > Apologies for the delay; I recently returned from an extended holiday.
> > > > > >
> > > > > > The issue stems from the limitations of bpf_probe_read_kernel() on
> > > > > > s390 architecture. The attachment provides a straightforward example
> > > > > > to illustrate this issue. The observed results are as follows:
> > > > > >
> > > > > >     Error: #463/1 verifier_probe_read/probe read 4 bytes
> > > > > >     8897 run_subtest:PASS:obj_open_mem 0 nsec
> > > > > >     8898 run_subtest:PASS:unexpected_load_failure 0 nsec
> > > > > >     8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec
> > > > > >     8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512
> > > > > >
> > > > > >     Error: #463/2 verifier_probe_read/probe read 8 bytes
> > > > > >     8903 run_subtest:PASS:obj_open_mem 0 nsec
> > > > > >     8904 run_subtest:PASS:unexpected_load_failure 0 nsec
> > > > > >     8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec
> > > > > >     8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512
> > > > > >
> > > > > > More details can be found at:  https://github.com/kernel-patches/bpf/pull/6872
> > > > > >
> > > > > > Should we consider this behavior of bpf_probe_read_kernel() as
> > > > > > expected, or is it something that requires fixing?
> > > > > >
> > > > >
> > > > > I might be missing something, but there is nothing wrong with
> > > > > bpf_probe_read_kernel() behavior. In "read 4" case you are overwriting
> > > > > only upper 4 bytes of u64, so lower 4 bytes are garbage. In "read 8"
> > > > > you are reading (upper) 4 bytes of garbage from uninitialized
> > > > > data_dst.
> > > >
> > > > The issue doesn't lie with the dst but rather with the src. Even after
> > > > initializing the destination, the operation still fails. You can find
> > >
> > > Are you sure the operation "fails"? If it would fail, you'd get a
> > > negative error code, but you are getting zero. Which actually makes
> > > sense.
> > >
> > > I think you are just getting confused by big endianness of s390x, and
> > > there is nothing wrong with bpf_probe_read_kernel().
> > >
> > > In both of your tests (I pasted your code below, it would be better if
> > > you did it in your initial emails) you end up with 0x200 in *upper* 32
> > > bits (on big endian) and lower bits are zeros. And __retval thing is
> > > 32-bit (despite BPF program returning long), so this return value is
> > > truncated to *lower* 32-bits, which are, expectedly, zeroes.
> >
> > Thank you for clarifying. The presence of the 32-bit __retval led to
> > my misunderstanding :(
> >
> > >
> > > So I think everything works as expected, but your tests (at least)
> > > don't handle the big-endian arch well.
> >
> > The issue arises when the dst and src have different sizes, causing
> > bpf_probe_read_kernel_common() to handle them poorly on big-endian
> > machines. To address this, we need to calculate the offset for
> > copying, as demonstrated by the following
> >
> >    bpf_probe_read_kernel_common(&kit->bits_copy + offset, size,
> > unsafe_ptr__ign);
> >
> > One might wonder why this calculation is not incorporated directly
> > into the implementation of bpf_probe_read_kernel_common() ?
>
> Let's stop talking about
> bpf_probe_read_kernel/bpf_probe_read_kernel_common doing anything
> wrong or not handling anything wrong. It's not, it's correct. Your
> code is *using* it wrong, and that's what we'll have to fix. The
> contract of bpf_probe_read_kernel is in terms of memory addresses of
> source/destination *bytes* and the *common* size of both source and
> destination. bpf_probe_read_kernel() cannot know that you are passing
> a pointer to int or long or whatever, it's just a void * pointer. So
> if you are writing bytes over long, you need to take care of adjusting
> offsets to accommodate big-endian.
>
> It's a distraction to talk about bpf_probe_read_kernel() here (but
> it's useful to understand its contract).
>
> >
> > >
> > > __description("probe read 4 bytes")
> > > __success __retval(0x200)
> > > long probe_read_4(void)
> > > {
> > >     int data = 0x200;
> > >     long data_dst = 0;
> > >     int err;
> > >
> > >     err = bpf_probe_read_kernel(&data_dst, 4, &data);
> > >     if (err)
> > >         return err;
> > >
> > >     return data_dst;
> > > }
> > >
> > > SEC("syscall")
> > > __description("probe read 8 bytes")
> > > __success __retval(0x200)
> > > long probe_read_8(void)
> > > {
> > >     int data = 0x200;
> > >     long data_dst = 0;
> > >     int err;
> > >
> > >     err = bpf_probe_read_kernel(&data_dst, 8, &data);
> > >     if (err)
> > >         return err;
> > >
> > >     return data_dst;
> > >
> > > }
> > >
> > > > more details in the following link:
> > > > https://github.com/kernel-patches/bpf/pull/6882. It appears that
> > > > bpf_probe_read_kernel() encounters difficulties when dealing with
> > > > non-long-aligned source addresses.
> > > >
> > > > >
> > > > > So getting back to iter implementation. Make sure you are
> > > > > zero-initializing that u64 value you are reading into?
> > > > >
> > > >
> > > > It has been zero-initialized:
> > > >
> > > > + kit->nr_bits = 0;
> > > > + kit->bits_copy = 0;
> > > >
> > >
> > > ok, then the problem is somewhere else, but it doesn't seem to be in
> > > bpf_probe_read_kernel(). I'm forgetting what was the original test
> > > failure for your patch set, but please double check again, taking into
> > > account the big endianness of s390x.
> > >
> >
> > If we aim to make it compatible with s390, we need to introduce some
> > constraints regarding the bits iteration.
> >
> > 1. We must replace nr_bits with size:
> >
> >   bpf_iter_bits_new(struct bpf_iter_bits *it, const void
> > *unsafe_ptr__ign, u32 size)
> >
> > 2. The size must adhere to alignment requirements:
> >
> >         if (size <= sizeof(u64)) {
> >                 int offset = IS_ENABLED(CONFIG_S390) ? sizeof(u64) - size : 0;
> >
> >                 switch (size) {
> >                 case 1:
> >                 case 2:
> >                 case 4:
> >                 case 8:
> >                         break;
> >                 default:
> >                         return -EINVAL;
> >                 }
> >
> >                 err = bpf_probe_read_kernel_common(((char
> > *)&kit->bits_copy) + offset, size, unsafe_ptr__ign);
> >                 if (err)
> >                         return -EFAULT;
> >
> >                 kit->size = size;
> >                 kit->bit = -1;
> >                 return 0;
> >         }
> >
> >         /* Not long-aligned */
> >         if (size & (sizeof(unsigned long) - 1))
> >                 return -EINVAL;
> >
> >         ....
> >
> > Does this meet your expectations?
> >
>
> I don't think you need to add any restrictions or change anything to
> be byte-sized. You just need to calculate byte offset and do a bit
> shift to place N bits from the mask into upper N bits of long on
> big-endian. You might need to do some masking even for little endian
> as well.
>
> Which is why I suggested investing in simple but thorough tests. Write
> down a few bit mask patterns of variable length (not just
> multiple-of-8 sizes) and think about the sequence of bits that should
> be returned. And then codify that.
>
> Then check that both little- and big-endian arches work correctly.
>
> This is all a bit tricky because kernel's find_next_bit() makes the
> assumption that bit masks are long-based, while this bits iterator
> protocol is based on bit sizes, which are not necessarily multiples of
> 8 bits. So after you copy memory as bytes, you might need to clear
> (and shift, for big endian) some bits. Use simple but non-symmetrical
> bit masks to test everything.
>

The reason for using 'size' instead of 'nr_bits' lies in the nature of
the pointer 'unsafe_ptr__ign', which is of type void *. Due to this
ambiguity in the type, the 'size' parameter serves to indicate the
size of the data being accessed. For instance, if the type is u32,
then the 'size' parameter corresponds to sizeof(u32)

    u32 src = 0x100;
    bpf_for_each(bits, bit, &src, sizeof(u32));

This approach shields the user of the bits iterator from concerns
about endianness, simplifying usage.

Conversely, if we were to use 'nr_bits', the user would need to
account for endianness themselves. In other words, the user would be
responsible for calculating the offset of 'src'.For instance, on a
big-endian machine, if the 'src' is of type u64 and the user only want
to iterate over 32 bits, the code would appear as follows:

    u64 src = 0x100;
    bpf_for_each(bits, bit, ((void *)&src) + (sizeof(u64) - (32 + 7) / 8), 32);

It may indeed be less user-friendly. However, I can make the
adjustment as you suggested, given that it aligns with the pattern
observed in bpf_probe_read_kernel_common().

--
Regards
Yafang

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator
  2024-05-01  2:12                 ` Yafang Shao
@ 2024-05-01  4:14                   ` Andrii Nakryiko
  0 siblings, 0 replies; 15+ messages in thread
From: Andrii Nakryiko @ 2024-05-01  4:14 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, bpf

On Tue, Apr 30, 2024 at 7:12 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Tue, Apr 30, 2024 at 12:28 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Sun, Apr 28, 2024 at 6:47 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > On Sat, Apr 27, 2024 at 12:51 AM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Thu, Apr 25, 2024 at 10:05 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > >
> > > > > On Fri, Apr 26, 2024 at 2:15 AM Andrii Nakryiko
> > > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > >
> > > > > > On Wed, Apr 24, 2024 at 10:37 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > > >
> > > > > > > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko
> > > > > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been
> > > > > > > > > > added for the new bpf_iter_bits functionality. These kfuncs enable the
> > > > > > > > > > iteration of the bits from a given address and a given number of bits.
> > > > > > > > > >
> > > > > > > > > > - bpf_iter_bits_new
> > > > > > > > > >   Initialize a new bits iterator for a given memory area. Due to the
> > > > > > > > > >   limitation of bpf memalloc, the max number of bits to be iterated
> > > > > > > > > >   over is (4096 * 8).
> > > > > > > > > > - bpf_iter_bits_next
> > > > > > > > > >   Get the next bit in a bpf_iter_bits
> > > > > > > > > > - bpf_iter_bits_destroy
> > > > > > > > > >   Destroy a bpf_iter_bits
> > > > > > > > > >
> > > > > > > > > > The bits iterator can be used in any context and on any address.
> > > > > > > > > >
> > > > > > > > > > Changes:
> > > > > > > > > > - v5->v6:
> > > > > > > > > >   - Add positive tests (Andrii)
> > > > > > > > > > - v4->v5:
> > > > > > > > > >   - Simplify test cases (Andrii)
> > > > > > > > > > - v3->v4:
> > > > > > > > > >   - Fix endianness error on s390x (Andrii)
> > > > > > > > > >   - zero-initialize kit->bits_copy and zero out nr_bits (Andrii)
> > > > > > > > > > - v2->v3:
> > > > > > > > > >   - Optimization for u64/u32 mask (Andrii)
> > > > > > > > > > - v1->v2:
> > > > > > > > > >   - Simplify the CPU number verification code to avoid the failure on s390x
> > > > > > > > > >     (Eduard)
> > > > > > > > > > - bpf: Add bpf_iter_cpumask
> > > > > > > > > >   https://lwn.net/Articles/961104/
> > > > > > > > > > - bpf: Add new bpf helper bpf_for_each_cpu
> > > > > > > > > >   https://lwn.net/Articles/939939/
> > > > > > > > > >
> > > > > > > > > > Yafang Shao (2):
> > > > > > > > > >   bpf: Add bits iterator
> > > > > > > > > >   selftests/bpf: Add selftest for bits iter
> > > > > > > > > >
> > > > > > > > > >  kernel/bpf/helpers.c                          | 120 +++++++++++++++++
> > > > > > > > > >  .../selftests/bpf/prog_tests/verifier.c       |   2 +
> > > > > > > > > >  .../selftests/bpf/progs/verifier_bits_iter.c  | 127 ++++++++++++++++++
> > > > > > > > > >  3 files changed, 249 insertions(+)
> > > > > > > > > >  create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > 2.39.1
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > It appears that the test case failed on s390x when the data is
> > > > > > > > > a u32 value because we need to set the higher 32 bits.
> > > > > > > > > will analyze it.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Hey Yafang, did you get a chance to debug and fix the issue?
> > > > > > > >
> > > > > > >
> > > > > > > Hi Andrii,
> > > > > > >
> > > > > > > Apologies for the delay; I recently returned from an extended holiday.
> > > > > > >
> > > > > > > The issue stems from the limitations of bpf_probe_read_kernel() on
> > > > > > > s390 architecture. The attachment provides a straightforward example
> > > > > > > to illustrate this issue. The observed results are as follows:
> > > > > > >
> > > > > > >     Error: #463/1 verifier_probe_read/probe read 4 bytes
> > > > > > >     8897 run_subtest:PASS:obj_open_mem 0 nsec
> > > > > > >     8898 run_subtest:PASS:unexpected_load_failure 0 nsec
> > > > > > >     8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec
> > > > > > >     8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512
> > > > > > >
> > > > > > >     Error: #463/2 verifier_probe_read/probe read 8 bytes
> > > > > > >     8903 run_subtest:PASS:obj_open_mem 0 nsec
> > > > > > >     8904 run_subtest:PASS:unexpected_load_failure 0 nsec
> > > > > > >     8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec
> > > > > > >     8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512
> > > > > > >
> > > > > > > More details can be found at:  https://github.com/kernel-patches/bpf/pull/6872
> > > > > > >
> > > > > > > Should we consider this behavior of bpf_probe_read_kernel() as
> > > > > > > expected, or is it something that requires fixing?
> > > > > > >
> > > > > >
> > > > > > I might be missing something, but there is nothing wrong with
> > > > > > bpf_probe_read_kernel() behavior. In "read 4" case you are overwriting
> > > > > > only upper 4 bytes of u64, so lower 4 bytes are garbage. In "read 8"
> > > > > > you are reading (upper) 4 bytes of garbage from uninitialized
> > > > > > data_dst.
> > > > >
> > > > > The issue doesn't lie with the dst but rather with the src. Even after
> > > > > initializing the destination, the operation still fails. You can find
> > > >
> > > > Are you sure the operation "fails"? If it would fail, you'd get a
> > > > negative error code, but you are getting zero. Which actually makes
> > > > sense.
> > > >
> > > > I think you are just getting confused by big endianness of s390x, and
> > > > there is nothing wrong with bpf_probe_read_kernel().
> > > >
> > > > In both of your tests (I pasted your code below, it would be better if
> > > > you did it in your initial emails) you end up with 0x200 in *upper* 32
> > > > bits (on big endian) and lower bits are zeros. And __retval thing is
> > > > 32-bit (despite BPF program returning long), so this return value is
> > > > truncated to *lower* 32-bits, which are, expectedly, zeroes.
> > >
> > > Thank you for clarifying. The presence of the 32-bit __retval led to
> > > my misunderstanding :(
> > >
> > > >
> > > > So I think everything works as expected, but your tests (at least)
> > > > don't handle the big-endian arch well.
> > >
> > > The issue arises when the dst and src have different sizes, causing
> > > bpf_probe_read_kernel_common() to handle them poorly on big-endian
> > > machines. To address this, we need to calculate the offset for
> > > copying, as demonstrated by the following
> > >
> > >    bpf_probe_read_kernel_common(&kit->bits_copy + offset, size,
> > > unsafe_ptr__ign);
> > >
> > > One might wonder why this calculation is not incorporated directly
> > > into the implementation of bpf_probe_read_kernel_common() ?
> >
> > Let's stop talking about
> > bpf_probe_read_kernel/bpf_probe_read_kernel_common doing anything
> > wrong or not handling anything wrong. It's not, it's correct. Your
> > code is *using* it wrong, and that's what we'll have to fix. The
> > contract of bpf_probe_read_kernel is in terms of memory addresses of
> > source/destination *bytes* and the *common* size of both source and
> > destination. bpf_probe_read_kernel() cannot know that you are passing
> > a pointer to int or long or whatever, it's just a void * pointer. So
> > if you are writing bytes over long, you need to take care of adjusting
> > offsets to accommodate big-endian.
> >
> > It's a distraction to talk about bpf_probe_read_kernel() here (but
> > it's useful to understand its contract).
> >
> > >
> > > >
> > > > __description("probe read 4 bytes")
> > > > __success __retval(0x200)
> > > > long probe_read_4(void)
> > > > {
> > > >     int data = 0x200;
> > > >     long data_dst = 0;
> > > >     int err;
> > > >
> > > >     err = bpf_probe_read_kernel(&data_dst, 4, &data);
> > > >     if (err)
> > > >         return err;
> > > >
> > > >     return data_dst;
> > > > }
> > > >
> > > > SEC("syscall")
> > > > __description("probe read 8 bytes")
> > > > __success __retval(0x200)
> > > > long probe_read_8(void)
> > > > {
> > > >     int data = 0x200;
> > > >     long data_dst = 0;
> > > >     int err;
> > > >
> > > >     err = bpf_probe_read_kernel(&data_dst, 8, &data);
> > > >     if (err)
> > > >         return err;
> > > >
> > > >     return data_dst;
> > > >
> > > > }
> > > >
> > > > > more details in the following link:
> > > > > https://github.com/kernel-patches/bpf/pull/6882. It appears that
> > > > > bpf_probe_read_kernel() encounters difficulties when dealing with
> > > > > non-long-aligned source addresses.
> > > > >
> > > > > >
> > > > > > So getting back to iter implementation. Make sure you are
> > > > > > zero-initializing that u64 value you are reading into?
> > > > > >
> > > > >
> > > > > It has been zero-initialized:
> > > > >
> > > > > + kit->nr_bits = 0;
> > > > > + kit->bits_copy = 0;
> > > > >
> > > >
> > > > ok, then the problem is somewhere else, but it doesn't seem to be in
> > > > bpf_probe_read_kernel(). I'm forgetting what was the original test
> > > > failure for your patch set, but please double check again, taking into
> > > > account the big endianness of s390x.
> > > >
> > >
> > > If we aim to make it compatible with s390, we need to introduce some
> > > constraints regarding the bits iteration.
> > >
> > > 1. We must replace nr_bits with size:
> > >
> > >   bpf_iter_bits_new(struct bpf_iter_bits *it, const void
> > > *unsafe_ptr__ign, u32 size)
> > >
> > > 2. The size must adhere to alignment requirements:
> > >
> > >         if (size <= sizeof(u64)) {
> > >                 int offset = IS_ENABLED(CONFIG_S390) ? sizeof(u64) - size : 0;
> > >
> > >                 switch (size) {
> > >                 case 1:
> > >                 case 2:
> > >                 case 4:
> > >                 case 8:
> > >                         break;
> > >                 default:
> > >                         return -EINVAL;
> > >                 }
> > >
> > >                 err = bpf_probe_read_kernel_common(((char
> > > *)&kit->bits_copy) + offset, size, unsafe_ptr__ign);
> > >                 if (err)
> > >                         return -EFAULT;
> > >
> > >                 kit->size = size;
> > >                 kit->bit = -1;
> > >                 return 0;
> > >         }
> > >
> > >         /* Not long-aligned */
> > >         if (size & (sizeof(unsigned long) - 1))
> > >                 return -EINVAL;
> > >
> > >         ....
> > >
> > > Does this meet your expectations?
> > >
> >
> > I don't think you need to add any restrictions or change anything to
> > be byte-sized. You just need to calculate byte offset and do a bit
> > shift to place N bits from the mask into upper N bits of long on
> > big-endian. You might need to do some masking even for little endian
> > as well.
> >
> > Which is why I suggested investing in simple but thorough tests. Write
> > down a few bit mask patterns of variable length (not just
> > multiple-of-8 sizes) and think about the sequence of bits that should
> > be returned. And then codify that.
> >
> > Then check that both little- and big-endian arches work correctly.
> >
> > This is all a bit tricky because kernel's find_next_bit() makes the
> > assumption that bit masks are long-based, while this bits iterator
> > protocol is based on bit sizes, which are not necessarily multiples of
> > 8 bits. So after you copy memory as bytes, you might need to clear
> > (and shift, for big endian) some bits. Use simple but non-symmetrical
> > bit masks to test everything.
> >
>
> The reason for using 'size' instead of 'nr_bits' lies in the nature of
> the pointer 'unsafe_ptr__ign', which is of type void *. Due to this
> ambiguity in the type, the 'size' parameter serves to indicate the
> size of the data being accessed. For instance, if the type is u32,
> then the 'size' parameter corresponds to sizeof(u32)
>
>     u32 src = 0x100;
>     bpf_for_each(bits, bit, &src, sizeof(u32));
>
> This approach shields the user of the bits iterator from concerns
> about endianness, simplifying usage.
>
> Conversely, if we were to use 'nr_bits', the user would need to
> account for endianness themselves. In other words, the user would be

Well, not exactly. Only if they try to cleverly use int/long for
bitmask instead of sticking to an array of bytes. All this endianness
comes into effect only when you are dealing with something else than
pure bytes. And if they do, I think it's ok to expect them to deal
with endianness correctly. (But also for little endian it will just
work even if they don't think much about it).

So I'd keep it as nr_bits and define an interface based on `char *` or
`void *` to make it clear we are dealing with bits in arbitrary-sized
byte arrays.

But if someone has strong opinions otherwise, I'd be happy to hear
some other arguments, of course. I just think it's confusing to be
dealing with *bit*masks but in terms of byte sizes, tbh.

And also, consider that in practice most users will be dealing with
sizeof(u32) or sizeof(u64) (or larger, but multiple-of-4-or-8-bytes)
bit masks. But the implementation has to be correct for all valid
inputs and use cases.

> responsible for calculating the offset of 'src'.For instance, on a
> big-endian machine, if the 'src' is of type u64 and the user only want
> to iterate over 32 bits, the code would appear as follows:
>
>     u64 src = 0x100;
>     bpf_for_each(bits, bit, ((void *)&src) + (sizeof(u64) - (32 + 7) / 8), 32);
>
> It may indeed be less user-friendly. However, I can make the
> adjustment as you suggested, given that it aligns with the pattern
> observed in bpf_probe_read_kernel_common().
>
> --
> Regards
> Yafang

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-05-01  4:14 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-11 13:11 [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator Yafang Shao
2024-04-11 13:11 ` [PATCH bpf-next v6 1/2] bpf: Add " Yafang Shao
2024-04-11 13:11 ` [PATCH bpf-next v6 2/2] selftests/bpf: Add selftest for bits iter Yafang Shao
2024-04-11 13:50 ` [PATCH bpf-next v6 0/2] bpf: Add a generic bits iterator Yafang Shao
2024-04-25  0:34   ` Andrii Nakryiko
2024-04-25  5:36     ` Yafang Shao
2024-04-25  6:05       ` Yonghong Song
2024-04-25  8:03         ` Yafang Shao
2024-04-25 18:15       ` Andrii Nakryiko
2024-04-26  5:04         ` Yafang Shao
2024-04-26 16:50           ` Andrii Nakryiko
2024-04-28 13:47             ` Yafang Shao
2024-04-29 16:27               ` Andrii Nakryiko
2024-05-01  2:12                 ` Yafang Shao
2024-05-01  4:14                   ` Andrii Nakryiko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox