[PATCH bpf v2 0/7] Misc fixes for bpf

BPF List
 help / color / mirror / Atom feed

* [PATCH bpf v2 0/7] Misc fixes for bpf
@ 2024-10-21  1:39 Hou Tao
  2024-10-21  1:39 ` [PATCH bpf v2 1/7] bpf: Add the missing BPF_LINK_TYPE invocation for sockmap Hou Tao
                   ` (7 more replies)
  0 siblings, 8 replies; 28+ messages in thread
From: Hou Tao @ 2024-10-21  1:39 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Yafang Shao, houtao1, xukuohai

From: Hou Tao <houtao1@huawei.com>

Hi,

The patch set is just a bundle of fixes. Patch #1 fixes the out-of-bound
for sockmap link fdinfo. Patch #2 adds a static assertion to guarantee
such out-of-bound access will never happen again. Patch #3 fixes the
kmemleak report when parsing the mount options for bpffs. Patch #4~#7
fix issues related to bits iterator.

Please check the individual patches for more details. And comments are
always welcome.

v2:
 * patch #1: update tools/include/uapi/linux/bpf.h as well (Alexei)
 * patch #2: new patch. Add a static assertion for bpf_link_type_strs[] (Andrii)
 * patch #4: doesn't set nr_bits to zero in bpf_iter_bits_next (Andrii)
 * patch #5: use 512 as the maximal value of nr_words
 * patch #6: change the type of both bits and bits_copy to u64 (Andrii)

v1: https://lore.kernel.org/bpf/20241008091718.3797027-1-houtao@huaweicloud.com/

Hou Tao (7):
  bpf: Add the missing BPF_LINK_TYPE invocation for sockmap
  bpf: Add assertion for the size of bpf_link_type_strs[]
  bpf: Preserve param->string when parsing mount options
  bpf: Free dynamically allocated bits in bpf_iter_bits_destroy()
  bpf: Check the validity of nr_words in bpf_iter_bits_new()
  bpf: Use __u64 to save the bits in bits iterator
  selftests/bpf: Test multiplication overflow of nr_bits in bits_iter

 include/linux/bpf_types.h                     |  7 +--
 include/uapi/linux/bpf.h                      |  3 ++
 kernel/bpf/helpers.c                          | 45 +++++++++++++++----
 kernel/bpf/inode.c                            |  5 ++-
 kernel/bpf/syscall.c                          |  2 +
 tools/include/uapi/linux/bpf.h                |  3 ++
 .../selftests/bpf/progs/verifier_bits_iter.c  | 14 ++++++
 7 files changed, 63 insertions(+), 16 deletions(-)

-- 
2.29.2

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH bpf v2 1/7] bpf: Add the missing BPF_LINK_TYPE invocation for sockmap
  2024-10-21  1:39 [PATCH bpf v2 0/7] Misc fixes for bpf Hou Tao
@ 2024-10-21  1:39 ` Hou Tao
  2024-10-21  1:39 ` [PATCH bpf v2 2/7] bpf: Add assertion for the size of bpf_link_type_strs[] Hou Tao
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 28+ messages in thread
From: Hou Tao @ 2024-10-21  1:39 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Yafang Shao, houtao1, xukuohai

From: Hou Tao <houtao1@huawei.com>

There is an out-of-bounds read in bpf_link_show_fdinfo() for the sockmap
link fd. Fix it by adding the missing BPF_LINK_TYPE invocation for
sockmap link

Also add comments for bpf_link_type to prevent missing updates in the
future.

Fixes: 699c23f02c65 ("bpf: Add bpf_link support for sk_msg and sk_skb progs")
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 include/linux/bpf_types.h      | 1 +
 include/uapi/linux/bpf.h       | 3 +++
 tools/include/uapi/linux/bpf.h | 3 +++
 3 files changed, 7 insertions(+)

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 9f2a6b83b49e..fa78f49d4a9a 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -146,6 +146,7 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_XDP, xdp)
 BPF_LINK_TYPE(BPF_LINK_TYPE_NETFILTER, netfilter)
 BPF_LINK_TYPE(BPF_LINK_TYPE_TCX, tcx)
 BPF_LINK_TYPE(BPF_LINK_TYPE_NETKIT, netkit)
+BPF_LINK_TYPE(BPF_LINK_TYPE_SOCKMAP, sockmap)
 #endif
 #ifdef CONFIG_PERF_EVENTS
 BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e8241b320c6d..4a939c90dc2e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1121,6 +1121,9 @@ enum bpf_attach_type {
 
 #define MAX_BPF_ATTACH_TYPE __MAX_BPF_ATTACH_TYPE
 
+/* Add BPF_LINK_TYPE(type, name) in bpf_types.h to keep bpf_link_type_strs[]
+ * in sync with the definitions below.
+ */
 enum bpf_link_type {
 	BPF_LINK_TYPE_UNSPEC = 0,
 	BPF_LINK_TYPE_RAW_TRACEPOINT = 1,
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index e8241b320c6d..4a939c90dc2e 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1121,6 +1121,9 @@ enum bpf_attach_type {
 
 #define MAX_BPF_ATTACH_TYPE __MAX_BPF_ATTACH_TYPE
 
+/* Add BPF_LINK_TYPE(type, name) in bpf_types.h to keep bpf_link_type_strs[]
+ * in sync with the definitions below.
+ */
 enum bpf_link_type {
 	BPF_LINK_TYPE_UNSPEC = 0,
 	BPF_LINK_TYPE_RAW_TRACEPOINT = 1,
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH bpf v2 2/7] bpf: Add assertion for the size of bpf_link_type_strs[]
  2024-10-21  1:39 [PATCH bpf v2 0/7] Misc fixes for bpf Hou Tao
  2024-10-21  1:39 ` [PATCH bpf v2 1/7] bpf: Add the missing BPF_LINK_TYPE invocation for sockmap Hou Tao
@ 2024-10-21  1:39 ` Hou Tao
  2024-10-21  8:18   ` Jiri Olsa
  2024-10-21  1:40 ` [PATCH bpf v2 3/7] bpf: Preserve param->string when parsing mount options Hou Tao
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Hou Tao @ 2024-10-21  1:39 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Yafang Shao, houtao1, xukuohai

From: Hou Tao <houtao1@huawei.com>

If a corresponding link type doesn't invoke BPF_LINK_TYPE(), accessing
bpf_link_type_strs[link->type] may result in out-of-bound access.

To prevent such missed invocations in the future, the following static
assertion seems feasible:

  BUILD_BUG_ON(ARRAY_SIZE(bpf_link_type_strs) != __MAX_BPF_LINK_TYPE)

However, this doesn't work well. The reason is that the invocation of
BPF_LINK_TYPE() for one link type is optional due to its CONFIG_XXX
dependency and the elements in bpf_link_type_strs[] will be sparse. For
example, if CONFIG_NET is disabled, the size of bpf_link_type_strs will
be BPF_LINK_TYPE_UPROBE_MULTI + 1.

Therefore, in addition to the static assertion, remove all CONFIG_XXX
conditions for the invocation of BPF_LINK_TYPE(). If these CONFIG_XXX
conditions become necessary later, the fix may need to be revised (e.g.,
to check the validity of link_type in bpf_link_show_fdinfo()).

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 include/linux/bpf_types.h | 6 ------
 kernel/bpf/syscall.c      | 2 ++
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index fa78f49d4a9a..6b7eabe9a115 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -136,21 +136,15 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_ARENA, arena_map_ops)
 
 BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
 BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
-#ifdef CONFIG_CGROUP_BPF
 BPF_LINK_TYPE(BPF_LINK_TYPE_CGROUP, cgroup)
-#endif
 BPF_LINK_TYPE(BPF_LINK_TYPE_ITER, iter)
-#ifdef CONFIG_NET
 BPF_LINK_TYPE(BPF_LINK_TYPE_NETNS, netns)
 BPF_LINK_TYPE(BPF_LINK_TYPE_XDP, xdp)
 BPF_LINK_TYPE(BPF_LINK_TYPE_NETFILTER, netfilter)
 BPF_LINK_TYPE(BPF_LINK_TYPE_TCX, tcx)
 BPF_LINK_TYPE(BPF_LINK_TYPE_NETKIT, netkit)
 BPF_LINK_TYPE(BPF_LINK_TYPE_SOCKMAP, sockmap)
-#endif
-#ifdef CONFIG_PERF_EVENTS
 BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
-#endif
 BPF_LINK_TYPE(BPF_LINK_TYPE_KPROBE_MULTI, kprobe_multi)
 BPF_LINK_TYPE(BPF_LINK_TYPE_STRUCT_OPS, struct_ops)
 BPF_LINK_TYPE(BPF_LINK_TYPE_UPROBE_MULTI, uprobe_multi)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8cfa7183d2ef..9f335c379b05 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3071,6 +3071,8 @@ static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
 	const struct bpf_prog *prog = link->prog;
 	char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
 
+	BUILD_BUG_ON(ARRAY_SIZE(bpf_link_type_strs) != __MAX_BPF_LINK_TYPE);
+
 	seq_printf(m,
 		   "link_type:\t%s\n"
 		   "link_id:\t%u\n",
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH bpf v2 3/7] bpf: Preserve param->string when parsing mount options
  2024-10-21  1:39 [PATCH bpf v2 0/7] Misc fixes for bpf Hou Tao
  2024-10-21  1:39 ` [PATCH bpf v2 1/7] bpf: Add the missing BPF_LINK_TYPE invocation for sockmap Hou Tao
  2024-10-21  1:39 ` [PATCH bpf v2 2/7] bpf: Add assertion for the size of bpf_link_type_strs[] Hou Tao
@ 2024-10-21  1:40 ` Hou Tao
  2024-10-21  9:09   ` Jiri Olsa
  2024-10-21  1:40 ` [PATCH bpf v2 4/7] bpf: Free dynamically allocated bits in bpf_iter_bits_destroy() Hou Tao
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Hou Tao @ 2024-10-21  1:40 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Yafang Shao, houtao1, xukuohai

From: Hou Tao <houtao1@huawei.com>

In bpf_parse_param(), keep the value of param->string intact so it can
be freed later. Otherwise, the kmalloc area pointed to by param->string
will be leaked as shown below:

unreferenced object 0xffff888118c46d20 (size 8):
  comm "new_name", pid 12109, jiffies 4295580214
  hex dump (first 8 bytes):
    61 6e 79 00 38 c9 5c 7e                          any.8.\~
  backtrace (crc e1b7f876):
    [<00000000c6848ac7>] kmemleak_alloc+0x4b/0x80
    [<00000000de9f7d00>] __kmalloc_node_track_caller_noprof+0x36e/0x4a0
    [<000000003e29b886>] memdup_user+0x32/0xa0
    [<0000000007248326>] strndup_user+0x46/0x60
    [<0000000035b3dd29>] __x64_sys_fsconfig+0x368/0x3d0
    [<0000000018657927>] x64_sys_call+0xff/0x9f0
    [<00000000c0cabc95>] do_syscall_64+0x3b/0xc0
    [<000000002f331597>] entry_SYSCALL_64_after_hwframe+0x4b/0x53

Fixes: 6c1752e0b6ca ("bpf: Support symbolic BPF FS delegation mount options")
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/inode.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index d8fc5eba529d..9aaf5124648b 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -880,7 +880,7 @@ static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
 		const struct btf_type *enum_t;
 		const char *enum_pfx;
 		u64 *delegate_msk, msk = 0;
-		char *p;
+		char *p, *str;
 		int val;
 
 		/* ignore errors, fallback to hex */
@@ -911,7 +911,8 @@ static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
 			return -EINVAL;
 		}
 
-		while ((p = strsep(&param->string, ":"))) {
+		str = param->string;
+		while ((p = strsep(&str, ":"))) {
 			if (strcmp(p, "any") == 0) {
 				msk |= ~0ULL;
 			} else if (find_btf_enum_const(info.btf, enum_t, enum_pfx, p, &val)) {
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH bpf v2 4/7] bpf: Free dynamically allocated bits in bpf_iter_bits_destroy()
  2024-10-21  1:39 [PATCH bpf v2 0/7] Misc fixes for bpf Hou Tao
                   ` (2 preceding siblings ...)
  2024-10-21  1:40 ` [PATCH bpf v2 3/7] bpf: Preserve param->string when parsing mount options Hou Tao
@ 2024-10-21  1:40 ` Hou Tao
  2024-10-21  2:45   ` Hou Tao
  2024-10-21  1:40 ` [PATCH bpf v2 5/7] bpf: Check the validity of nr_words in bpf_iter_bits_new() Hou Tao
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Hou Tao @ 2024-10-21  1:40 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Yafang Shao, houtao1, xukuohai

From: Hou Tao <houtao1@huawei.com>

bpf_iter_bits_destroy() uses "kit->nr_bits <= 64" to check whether the
bits are dynamically allocated. However, the check is incorrect and may
cause a kmemleak as shown below:

unreferenced object 0xffff88812628c8c0 (size 32):
  comm "swapper/0", pid 1, jiffies 4294727320
  hex dump (first 32 bytes):
	b0 c1 55 f5 81 88 ff ff f0 f0 f0 f0 f0 f0 f0 f0  ..U.............
	f0 f0 f0 f0 f0 f0 f0 f0 00 00 00 00 00 00 00 00  ................
  backtrace (crc 781e32cc):
	[<00000000c452b4ab>] kmemleak_alloc+0x4b/0x80
	[<0000000004e09f80>] __kmalloc_node_noprof+0x480/0x5c0
	[<00000000597124d6>] __alloc.isra.0+0x89/0xb0
	[<000000004ebfffcd>] alloc_bulk+0x2af/0x720
	[<00000000d9c10145>] prefill_mem_cache+0x7f/0xb0
	[<00000000ff9738ff>] bpf_mem_alloc_init+0x3e2/0x610
	[<000000008b616eac>] bpf_global_ma_init+0x19/0x30
	[<00000000fc473efc>] do_one_initcall+0xd3/0x3c0
	[<00000000ec81498c>] kernel_init_freeable+0x66a/0x940
	[<00000000b119f72f>] kernel_init+0x20/0x160
	[<00000000f11ac9a7>] ret_from_fork+0x3c/0x70
	[<0000000004671da4>] ret_from_fork_asm+0x1a/0x30

That is because nr_bits will be set as zero in bpf_iter_bits_next()
after all bits have been iterated.

Fix the problem by not setting nr_bits to zero in bpf_iter_bits_next().
Instead, use "bits >= nr_bits" to indicate when iteration is completed
and still use "nr_bits > 64" to indicate when bits are dynamically
allocated.

Fixes: 4665415975b0 ("bpf: Add bits iterator")
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/helpers.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 1a43d06eab28..62349e206a29 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2888,7 +2888,7 @@ bpf_iter_bits_new(struct bpf_iter_bits *it, const u64 *unsafe_ptr__ign, u32 nr_w
 
 	kit->nr_bits = 0;
 	kit->bits_copy = 0;
-	kit->bit = -1;
+	kit->bit = 0;
 
 	if (!unsafe_ptr__ign || !nr_words)
 		return -EINVAL;
@@ -2934,15 +2934,13 @@ __bpf_kfunc int *bpf_iter_bits_next(struct bpf_iter_bits *it)
 	const unsigned long *bits;
 	int bit;
 
-	if (nr_bits == 0)
+	if (kit->bit >= nr_bits)
 		return NULL;
 
 	bits = nr_bits == 64 ? &kit->bits_copy : kit->bits;
 	bit = find_next_bit(bits, nr_bits, kit->bit + 1);
-	if (bit >= nr_bits) {
-		kit->nr_bits = 0;
+	if (bit >= nr_bits)
 		return NULL;
-	}
 
 	kit->bit = bit;
 	return &kit->bit;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH bpf v2 5/7] bpf: Check the validity of nr_words in bpf_iter_bits_new()
  2024-10-21  1:39 [PATCH bpf v2 0/7] Misc fixes for bpf Hou Tao
                   ` (3 preceding siblings ...)
  2024-10-21  1:40 ` [PATCH bpf v2 4/7] bpf: Free dynamically allocated bits in bpf_iter_bits_destroy() Hou Tao
@ 2024-10-21  1:40 ` Hou Tao
  2024-10-21  9:51   ` Jiri Olsa
                     ` (2 more replies)
  2024-10-21  1:40 ` [PATCH bpf v2 6/7] bpf: Use __u64 to save the bits in bits iterator Hou Tao
                   ` (2 subsequent siblings)
  7 siblings, 3 replies; 28+ messages in thread
From: Hou Tao @ 2024-10-21  1:40 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Yafang Shao, houtao1, xukuohai

From: Hou Tao <houtao1@huawei.com>

Check the validity of nr_words in bpf_iter_bits_new(). Without this
check, when multiplication overflow occurs for nr_bits (e.g., when
nr_words = 0x0400-0001, nr_bits becomes 64), stack corruption may occur
due to bpf_probe_read_kernel_common(..., nr_bytes = 0x2000-0008).

Fix it by limiting the max value of nr_words to 512.

Fixes: 4665415975b0 ("bpf: Add bits iterator")
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/helpers.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 62349e206a29..c147f75e1b48 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2851,6 +2851,8 @@ struct bpf_iter_bits {
 	__u64 __opaque[2];
 } __aligned(8);
 
+#define BITS_ITER_NR_WORDS_MAX 512
+
 struct bpf_iter_bits_kern {
 	union {
 		unsigned long *bits;
@@ -2892,6 +2894,8 @@ bpf_iter_bits_new(struct bpf_iter_bits *it, const u64 *unsafe_ptr__ign, u32 nr_w
 
 	if (!unsafe_ptr__ign || !nr_words)
 		return -EINVAL;
+	if (nr_words > BITS_ITER_NR_WORDS_MAX)
+		return -E2BIG;
 
 	/* Optimization for u64 mask */
 	if (nr_bits == 64) {
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH bpf v2 6/7] bpf: Use __u64 to save the bits in bits iterator
  2024-10-21  1:39 [PATCH bpf v2 0/7] Misc fixes for bpf Hou Tao
                   ` (4 preceding siblings ...)
  2024-10-21  1:40 ` [PATCH bpf v2 5/7] bpf: Check the validity of nr_words in bpf_iter_bits_new() Hou Tao
@ 2024-10-21  1:40 ` Hou Tao
  2024-10-23  3:10   ` Yafang Shao
  2024-10-21  1:40 ` [PATCH bpf v2 7/7] selftests/bpf: Test multiplication overflow of nr_bits in bits_iter Hou Tao
  2024-10-21 23:11 ` [PATCH bpf v2 0/7] Misc fixes for bpf Andrii Nakryiko
  7 siblings, 1 reply; 28+ messages in thread
From: Hou Tao @ 2024-10-21  1:40 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Yafang Shao, houtao1, xukuohai

From: Hou Tao <houtao1@huawei.com>

On 32-bit hosts (e.g., arm32), when a bpf program passes a u64 to
bpf_iter_bits_new(), bpf_iter_bits_new() will use bits_copy to store the
content of the u64. However, bits_copy is only 4 bytes, leading to stack
corruption.

The straightforward solution would be to replace u64 with unsigned long
in bpf_iter_bits_new(). However, this introduces confusion and problems
for 32-bit hosts because the size of ulong in bpf program is 8 bytes,
but it is treated as 4-bytes after passed to bpf_iter_bits_new().

Fix it by changing the type of both bits and bit_count from unsigned
long to u64. However, the change is not enough. The main reason is that
bpf_iter_bits_next() uses find_next_bit() to find the next bit and the
pointer passed to find_next_bit() is an unsigned long pointer instead
of a u64 pointer. For 32-bit little-endian host, it is fine but it is
not the case for 32-bit big-endian host. Because under 32-bit big-endian
host, the first iterated unsigned long will be the bits 32-63 of the u64
instead of the expected bits 0-31. Therefore, in addition to changing
the type, swap the two unsigned longs within the u64 for 32-bit
big-endian host.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/helpers.c | 33 ++++++++++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index c147f75e1b48..0ad85201fb84 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2855,13 +2855,36 @@ struct bpf_iter_bits {
 
 struct bpf_iter_bits_kern {
 	union {
-		unsigned long *bits;
-		unsigned long bits_copy;
+		__u64 *bits;
+		__u64 bits_copy;
 	};
 	u32 nr_bits;
 	int bit;
 } __aligned(8);
 
+/* On 64-bit hosts, unsigned long and u64 have the same size, so passing
+ * a u64 pointer and an unsigned long pointer to find_next_bit() will
+ * return the same result, as both point to the same 8-byte area.
+ *
+ * For 32-bit little-endian hosts, using a u64 pointer or unsigned long
+ * pointer also makes no difference. This is because the first iterated
+ * unsigned long is composed of bits 0-31 of the u64 and the second unsigned
+ * long is composed of bits 32-63 of the u64.
+ *
+ * However, for 32-bit big-endian hosts, this is not the case. The first
+ * iterated unsigned long will be bits 32-63 of the u64, so swap these two
+ * ulong values within the u64.
+ */
+static void swap_ulong_in_u64(u64 *bits, unsigned int nr)
+{
+#if !defined(CONFIG_64BIT) && defined(__BIG_ENDIAN)
+	unsigned int i;
+
+	for (i = 0; i < nr; i++)
+		bits[i] = (bits[i] >> 32) | ((u64)(u32)bits[i] << 32);
+#endif
+}
+
 /**
  * bpf_iter_bits_new() - Initialize a new bits iterator for a given memory area
  * @it: The new bpf_iter_bits to be created
@@ -2903,6 +2926,8 @@ bpf_iter_bits_new(struct bpf_iter_bits *it, const u64 *unsafe_ptr__ign, u32 nr_w
 		if (err)
 			return -EFAULT;
 
+		swap_ulong_in_u64(&kit->bits_copy, nr_words);
+
 		kit->nr_bits = nr_bits;
 		return 0;
 	}
@@ -2918,6 +2943,8 @@ bpf_iter_bits_new(struct bpf_iter_bits *it, const u64 *unsafe_ptr__ign, u32 nr_w
 		return err;
 	}
 
+	swap_ulong_in_u64(kit->bits, nr_words);
+
 	kit->nr_bits = nr_bits;
 	return 0;
 }
@@ -2935,7 +2962,7 @@ __bpf_kfunc int *bpf_iter_bits_next(struct bpf_iter_bits *it)
 {
 	struct bpf_iter_bits_kern *kit = (void *)it;
 	u32 nr_bits = kit->nr_bits;
-	const unsigned long *bits;
+	const void *bits;
 	int bit;
 
 	if (kit->bit >= nr_bits)
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH bpf v2 7/7] selftests/bpf: Test multiplication overflow of nr_bits in bits_iter
  2024-10-21  1:39 [PATCH bpf v2 0/7] Misc fixes for bpf Hou Tao
                   ` (5 preceding siblings ...)
  2024-10-21  1:40 ` [PATCH bpf v2 6/7] bpf: Use __u64 to save the bits in bits iterator Hou Tao
@ 2024-10-21  1:40 ` Hou Tao
  2024-10-21 23:11 ` [PATCH bpf v2 0/7] Misc fixes for bpf Andrii Nakryiko
  7 siblings, 0 replies; 28+ messages in thread
From: Hou Tao @ 2024-10-21  1:40 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Yafang Shao, houtao1, xukuohai

From: Hou Tao <houtao1@huawei.com>

Add a test to verify the multiplication overflow of nr_bits in bits_iter.
When nr_words is assigned as 67108865, nr_bits becomes 64, causing
bpf_probe_read_kernel_common() to corrupt the stack.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 .../selftests/bpf/progs/verifier_bits_iter.c       | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/verifier_bits_iter.c b/tools/testing/selftests/bpf/progs/verifier_bits_iter.c
index f4da4d508ddb..344b7eac15c8 100644
--- a/tools/testing/selftests/bpf/progs/verifier_bits_iter.c
+++ b/tools/testing/selftests/bpf/progs/verifier_bits_iter.c
@@ -151,3 +151,17 @@ int zero_words(void)
 		nr++;
 	return nr;
 }
+
+SEC("syscall")
+__description("big words")
+__success __retval(0)
+int big_words(void)
+{
+	u64 data[8] = {0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1};
+	int nr = 0;
+	int *bit;
+
+	bpf_for_each(bits, bit, &data[0], 67108865)
+		nr++;
+	return nr;
+}
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH bpf v2 4/7] bpf: Free dynamically allocated bits in bpf_iter_bits_destroy()
  2024-10-21  1:40 ` [PATCH bpf v2 4/7] bpf: Free dynamically allocated bits in bpf_iter_bits_destroy() Hou Tao
@ 2024-10-21  2:45   ` Hou Tao
  2024-10-21 23:07     ` Andrii Nakryiko
  0 siblings, 1 reply; 28+ messages in thread
From: Hou Tao @ 2024-10-21  2:45 UTC (permalink / raw)
  To: bpf, Hou Tao
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Yafang Shao, xukuohai, houtao1@huawei.com

Hi,

On 10/21/2024 9:40 AM, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> bpf_iter_bits_destroy() uses "kit->nr_bits <= 64" to check whether the
> bits are dynamically allocated. However, the check is incorrect and may
> cause a kmemleak as shown below:
>
> unreferenced object 0xffff88812628c8c0 (size 32):
>   comm "swapper/0", pid 1, jiffies 4294727320
>   hex dump (first 32 bytes):
> 	b0 c1 55 f5 81 88 ff ff f0 f0 f0 f0 f0 f0 f0 f0  ..U.............
> 	f0 f0 f0 f0 f0 f0 f0 f0 00 00 00 00 00 00 00 00  ................
>   backtrace (crc 781e32cc):
> 	[<00000000c452b4ab>] kmemleak_alloc+0x4b/0x80
> 	[<0000000004e09f80>] __kmalloc_node_noprof+0x480/0x5c0
> 	[<00000000597124d6>] __alloc.isra.0+0x89/0xb0
> 	[<000000004ebfffcd>] alloc_bulk+0x2af/0x720
> 	[<00000000d9c10145>] prefill_mem_cache+0x7f/0xb0
> 	[<00000000ff9738ff>] bpf_mem_alloc_init+0x3e2/0x610
> 	[<000000008b616eac>] bpf_global_ma_init+0x19/0x30
> 	[<00000000fc473efc>] do_one_initcall+0xd3/0x3c0
> 	[<00000000ec81498c>] kernel_init_freeable+0x66a/0x940
> 	[<00000000b119f72f>] kernel_init+0x20/0x160
> 	[<00000000f11ac9a7>] ret_from_fork+0x3c/0x70
> 	[<0000000004671da4>] ret_from_fork_asm+0x1a/0x30
>
> That is because nr_bits will be set as zero in bpf_iter_bits_next()
> after all bits have been iterated.
>
> Fix the problem by not setting nr_bits to zero in bpf_iter_bits_next().
> Instead, use "bits >= nr_bits" to indicate when iteration is completed
> and still use "nr_bits > 64" to indicate when bits are dynamically
> allocated.
>
> Fixes: 4665415975b0 ("bpf: Add bits iterator")
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  kernel/bpf/helpers.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 1a43d06eab28..62349e206a29 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2888,7 +2888,7 @@ bpf_iter_bits_new(struct bpf_iter_bits *it, const u64 *unsafe_ptr__ign, u32 nr_w
>  
>  	kit->nr_bits = 0;
>  	kit->bits_copy = 0;
> -	kit->bit = -1;
> +	kit->bit = 0;

Sent the patch out in a hurry and didn't run the related test.

The change above will break "fewer words" test in verifier_bits_iter,
because it will skip bit 0 in the bit. The correct fix should be as below:

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 1a43d06eab28..190b730e0f86 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2934,15 +2934,13 @@ __bpf_kfunc int *bpf_iter_bits_next(struct
bpf_iter_bits *it)
        const unsigned long *bits;
        int bit;

-       if (nr_bits == 0)
+       if (kit->bit >= 0 && kit->bit >= nr_bits)
                return NULL;

        bits = nr_bits == 64 ? &kit->bits_copy : kit->bits;
        bit = find_next_bit(bits, nr_bits, kit->bit + 1);
-       if (bit >= nr_bits) {
-               kit->nr_bits = 0;
+       if (bit >= nr_bits)
                return NULL;
-       }

        kit->bit = bit;
        return &kit->bit;

>  
>  	if (!unsafe_ptr__ign || !nr_words)
>  		return -EINVAL;
> @@ -2934,15 +2934,13 @@ __bpf_kfunc int *bpf_iter_bits_next(struct bpf_iter_bits *it)
>  	const unsigned long *bits;
>  	int bit;
>  
> -	if (nr_bits == 0)
> +	if (kit->bit >= nr_bits)
>  		return NULL;
>  
>  	bits = nr_bits == 64 ? &kit->bits_copy : kit->bits;
>  	bit = find_next_bit(bits, nr_bits, kit->bit + 1);
> -	if (bit >= nr_bits) {
> -		kit->nr_bits = 0;
> +	if (bit >= nr_bits)
>  		return NULL;
> -	}
>  
>  	kit->bit = bit;
>  	return &kit->bit;


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH bpf v2 2/7] bpf: Add assertion for the size of bpf_link_type_strs[]
  2024-10-21  1:39 ` [PATCH bpf v2 2/7] bpf: Add assertion for the size of bpf_link_type_strs[] Hou Tao
@ 2024-10-21  8:18   ` Jiri Olsa
  2024-10-21 23:02     ` Andrii Nakryiko
  0 siblings, 1 reply; 28+ messages in thread
From: Jiri Olsa @ 2024-10-21  8:18 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, John Fastabend,
	Yafang Shao, houtao1, xukuohai

On Mon, Oct 21, 2024 at 09:39:59AM +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> If a corresponding link type doesn't invoke BPF_LINK_TYPE(), accessing
> bpf_link_type_strs[link->type] may result in out-of-bound access.
> 
> To prevent such missed invocations in the future, the following static
> assertion seems feasible:
> 
>   BUILD_BUG_ON(ARRAY_SIZE(bpf_link_type_strs) != __MAX_BPF_LINK_TYPE)
> 
> However, this doesn't work well. The reason is that the invocation of
> BPF_LINK_TYPE() for one link type is optional due to its CONFIG_XXX
> dependency and the elements in bpf_link_type_strs[] will be sparse. For
> example, if CONFIG_NET is disabled, the size of bpf_link_type_strs will
> be BPF_LINK_TYPE_UPROBE_MULTI + 1.
> 
> Therefore, in addition to the static assertion, remove all CONFIG_XXX
> conditions for the invocation of BPF_LINK_TYPE(). If these CONFIG_XXX
> conditions become necessary later, the fix may need to be revised (e.g.,
> to check the validity of link_type in bpf_link_show_fdinfo()).
> 
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  include/linux/bpf_types.h | 6 ------
>  kernel/bpf/syscall.c      | 2 ++
>  2 files changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> index fa78f49d4a9a..6b7eabe9a115 100644
> --- a/include/linux/bpf_types.h
> +++ b/include/linux/bpf_types.h
> @@ -136,21 +136,15 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_ARENA, arena_map_ops)
>  
>  BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
>  BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
> -#ifdef CONFIG_CGROUP_BPF
>  BPF_LINK_TYPE(BPF_LINK_TYPE_CGROUP, cgroup)
> -#endif
>  BPF_LINK_TYPE(BPF_LINK_TYPE_ITER, iter)
> -#ifdef CONFIG_NET
>  BPF_LINK_TYPE(BPF_LINK_TYPE_NETNS, netns)
>  BPF_LINK_TYPE(BPF_LINK_TYPE_XDP, xdp)
>  BPF_LINK_TYPE(BPF_LINK_TYPE_NETFILTER, netfilter)
>  BPF_LINK_TYPE(BPF_LINK_TYPE_TCX, tcx)
>  BPF_LINK_TYPE(BPF_LINK_TYPE_NETKIT, netkit)
>  BPF_LINK_TYPE(BPF_LINK_TYPE_SOCKMAP, sockmap)
> -#endif
> -#ifdef CONFIG_PERF_EVENTS
>  BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
> -#endif
>  BPF_LINK_TYPE(BPF_LINK_TYPE_KPROBE_MULTI, kprobe_multi)
>  BPF_LINK_TYPE(BPF_LINK_TYPE_STRUCT_OPS, struct_ops)
>  BPF_LINK_TYPE(BPF_LINK_TYPE_UPROBE_MULTI, uprobe_multi)
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 8cfa7183d2ef..9f335c379b05 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -3071,6 +3071,8 @@ static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
>  	const struct bpf_prog *prog = link->prog;
>  	char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
>  
> +	BUILD_BUG_ON(ARRAY_SIZE(bpf_link_type_strs) != __MAX_BPF_LINK_TYPE);

I wonder it'd be simpler to just kill BPF_LINK_TYPE completely
and add link names directly to bpf_link_type_strs array..
it seems it's the only purpose of the BPF_LINK_TYPE macro

jirka

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH bpf v2 3/7] bpf: Preserve param->string when parsing mount options
  2024-10-21  1:40 ` [PATCH bpf v2 3/7] bpf: Preserve param->string when parsing mount options Hou Tao
@ 2024-10-21  9:09   ` Jiri Olsa
  0 siblings, 0 replies; 28+ messages in thread
From: Jiri Olsa @ 2024-10-21  9:09 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, John Fastabend,
	Yafang Shao, houtao1, xukuohai

On Mon, Oct 21, 2024 at 09:40:00AM +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> In bpf_parse_param(), keep the value of param->string intact so it can
> be freed later. Otherwise, the kmalloc area pointed to by param->string
> will be leaked as shown below:
> 
> unreferenced object 0xffff888118c46d20 (size 8):
>   comm "new_name", pid 12109, jiffies 4295580214
>   hex dump (first 8 bytes):
>     61 6e 79 00 38 c9 5c 7e                          any.8.\~
>   backtrace (crc e1b7f876):
>     [<00000000c6848ac7>] kmemleak_alloc+0x4b/0x80
>     [<00000000de9f7d00>] __kmalloc_node_track_caller_noprof+0x36e/0x4a0
>     [<000000003e29b886>] memdup_user+0x32/0xa0
>     [<0000000007248326>] strndup_user+0x46/0x60
>     [<0000000035b3dd29>] __x64_sys_fsconfig+0x368/0x3d0
>     [<0000000018657927>] x64_sys_call+0xff/0x9f0
>     [<00000000c0cabc95>] do_syscall_64+0x3b/0xc0
>     [<000000002f331597>] entry_SYSCALL_64_after_hwframe+0x4b/0x53
> 
> Fixes: 6c1752e0b6ca ("bpf: Support symbolic BPF FS delegation mount options")
> Signed-off-by: Hou Tao <houtao1@huawei.com>

nice, I saw that memleak report recently and couldn't make sense of it ;-)

Acked-by: Jiri Olsa <jolsa@kernel.org>

thanks,
jirka

> ---
>  kernel/bpf/inode.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
> index d8fc5eba529d..9aaf5124648b 100644
> --- a/kernel/bpf/inode.c
> +++ b/kernel/bpf/inode.c
> @@ -880,7 +880,7 @@ static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
>  		const struct btf_type *enum_t;
>  		const char *enum_pfx;
>  		u64 *delegate_msk, msk = 0;
> -		char *p;
> +		char *p, *str;
>  		int val;
>  
>  		/* ignore errors, fallback to hex */
> @@ -911,7 +911,8 @@ static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
>  			return -EINVAL;
>  		}
>  
> -		while ((p = strsep(&param->string, ":"))) {
> +		str = param->string;
> +		while ((p = strsep(&str, ":"))) {
>  			if (strcmp(p, "any") == 0) {
>  				msk |= ~0ULL;
>  			} else if (find_btf_enum_const(info.btf, enum_t, enum_pfx, p, &val)) {
> -- 
> 2.29.2
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH bpf v2 5/7] bpf: Check the validity of nr_words in bpf_iter_bits_new()
  2024-10-21  1:40 ` [PATCH bpf v2 5/7] bpf: Check the validity of nr_words in bpf_iter_bits_new() Hou Tao
@ 2024-10-21  9:51   ` Jiri Olsa
  2024-10-21 23:09   ` Andrii Nakryiko
  2024-10-23  3:17   ` Yafang Shao
  2 siblings, 0 replies; 28+ messages in thread
From: Jiri Olsa @ 2024-10-21  9:51 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, John Fastabend,
	Yafang Shao, houtao1, xukuohai

On Mon, Oct 21, 2024 at 09:40:02AM +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> Check the validity of nr_words in bpf_iter_bits_new(). Without this
> check, when multiplication overflow occurs for nr_bits (e.g., when
> nr_words = 0x0400-0001, nr_bits becomes 64), stack corruption may occur
> due to bpf_probe_read_kernel_common(..., nr_bytes = 0x2000-0008).
> 
> Fix it by limiting the max value of nr_words to 512.

lgtm, nice catch .. it's actually stated in the comment,
but we did not force it

Acked-by: Jiri Olsa <jolsa@kernel.org>

jirka

> 
> Fixes: 4665415975b0 ("bpf: Add bits iterator")
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  kernel/bpf/helpers.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 62349e206a29..c147f75e1b48 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2851,6 +2851,8 @@ struct bpf_iter_bits {
>  	__u64 __opaque[2];
>  } __aligned(8);
>  
> +#define BITS_ITER_NR_WORDS_MAX 512
> +
>  struct bpf_iter_bits_kern {
>  	union {
>  		unsigned long *bits;
> @@ -2892,6 +2894,8 @@ bpf_iter_bits_new(struct bpf_iter_bits *it, const u64 *unsafe_ptr__ign, u32 nr_w
>  
>  	if (!unsafe_ptr__ign || !nr_words)
>  		return -EINVAL;
> +	if (nr_words > BITS_ITER_NR_WORDS_MAX)
> +		return -E2BIG;
>  
>  	/* Optimization for u64 mask */
>  	if (nr_bits == 64) {
> -- 
> 2.29.2
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH bpf v2 2/7] bpf: Add assertion for the size of bpf_link_type_strs[]
  2024-10-21  8:18   ` Jiri Olsa
@ 2024-10-21 23:02     ` Andrii Nakryiko
  2024-10-22  7:35       ` Hou Tao
  0 siblings, 1 reply; 28+ messages in thread
From: Andrii Nakryiko @ 2024-10-21 23:02 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Hou Tao, bpf, Martin KaFai Lau, Alexei Starovoitov,
	Andrii Nakryiko, Eduard Zingerman, Song Liu, Hao Luo,
	Yonghong Song, Daniel Borkmann, KP Singh, Stanislav Fomichev,
	John Fastabend, Yafang Shao, houtao1, xukuohai

On Mon, Oct 21, 2024 at 1:18 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Mon, Oct 21, 2024 at 09:39:59AM +0800, Hou Tao wrote:
> > From: Hou Tao <houtao1@huawei.com>
> >
> > If a corresponding link type doesn't invoke BPF_LINK_TYPE(), accessing
> > bpf_link_type_strs[link->type] may result in out-of-bound access.
> >
> > To prevent such missed invocations in the future, the following static
> > assertion seems feasible:
> >
> >   BUILD_BUG_ON(ARRAY_SIZE(bpf_link_type_strs) != __MAX_BPF_LINK_TYPE)
> >
> > However, this doesn't work well. The reason is that the invocation of
> > BPF_LINK_TYPE() for one link type is optional due to its CONFIG_XXX
> > dependency and the elements in bpf_link_type_strs[] will be sparse. For
> > example, if CONFIG_NET is disabled, the size of bpf_link_type_strs will
> > be BPF_LINK_TYPE_UPROBE_MULTI + 1.
> >
> > Therefore, in addition to the static assertion, remove all CONFIG_XXX
> > conditions for the invocation of BPF_LINK_TYPE(). If these CONFIG_XXX
> > conditions become necessary later, the fix may need to be revised (e.g.,
> > to check the validity of link_type in bpf_link_show_fdinfo()).
> >
> > Signed-off-by: Hou Tao <houtao1@huawei.com>
> > ---
> >  include/linux/bpf_types.h | 6 ------
> >  kernel/bpf/syscall.c      | 2 ++
> >  2 files changed, 2 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> > index fa78f49d4a9a..6b7eabe9a115 100644
> > --- a/include/linux/bpf_types.h
> > +++ b/include/linux/bpf_types.h
> > @@ -136,21 +136,15 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_ARENA, arena_map_ops)
> >
> >  BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
> >  BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
> > -#ifdef CONFIG_CGROUP_BPF
> >  BPF_LINK_TYPE(BPF_LINK_TYPE_CGROUP, cgroup)
> > -#endif
> >  BPF_LINK_TYPE(BPF_LINK_TYPE_ITER, iter)
> > -#ifdef CONFIG_NET
> >  BPF_LINK_TYPE(BPF_LINK_TYPE_NETNS, netns)
> >  BPF_LINK_TYPE(BPF_LINK_TYPE_XDP, xdp)
> >  BPF_LINK_TYPE(BPF_LINK_TYPE_NETFILTER, netfilter)
> >  BPF_LINK_TYPE(BPF_LINK_TYPE_TCX, tcx)
> >  BPF_LINK_TYPE(BPF_LINK_TYPE_NETKIT, netkit)
> >  BPF_LINK_TYPE(BPF_LINK_TYPE_SOCKMAP, sockmap)
> > -#endif
> > -#ifdef CONFIG_PERF_EVENTS
> >  BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
> > -#endif

I'm not sure what's the implication here, but I'd avoid doing that.
But see below.

> >  BPF_LINK_TYPE(BPF_LINK_TYPE_KPROBE_MULTI, kprobe_multi)
> >  BPF_LINK_TYPE(BPF_LINK_TYPE_STRUCT_OPS, struct_ops)
> >  BPF_LINK_TYPE(BPF_LINK_TYPE_UPROBE_MULTI, uprobe_multi)
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 8cfa7183d2ef..9f335c379b05 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -3071,6 +3071,8 @@ static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
> >       const struct bpf_prog *prog = link->prog;
> >       char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
> >
> > +     BUILD_BUG_ON(ARRAY_SIZE(bpf_link_type_strs) != __MAX_BPF_LINK_TYPE);

If this is useless, why are you adding it?

Let's instead do a NULL check inside bpf_link_show_fdinfo() to handle
sparsity. And to avoid out-of-bounds, just add

[__MAX_BPF_LINK_TYPE] = NULL,

into the definition of bpf_link_type_strs

pw-bot: cr

>
> I wonder it'd be simpler to just kill BPF_LINK_TYPE completely
> and add link names directly to bpf_link_type_strs array..
> it seems it's the only purpose of the BPF_LINK_TYPE macro
>

This seems like a bit too short-sighted approach, let's not go there just yet.

> jirka

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH bpf v2 4/7] bpf: Free dynamically allocated bits in bpf_iter_bits_destroy()
  2024-10-21  2:45   ` Hou Tao
@ 2024-10-21 23:07     ` Andrii Nakryiko
  2024-10-22  7:25       ` Hou Tao
  0 siblings, 1 reply; 28+ messages in thread
From: Andrii Nakryiko @ 2024-10-21 23:07 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Yafang Shao, xukuohai, houtao1@huawei.com

On Sun, Oct 20, 2024 at 7:45 PM Hou Tao <houtao@huaweicloud.com> wrote:
>
> Hi,
>
> On 10/21/2024 9:40 AM, Hou Tao wrote:
> > From: Hou Tao <houtao1@huawei.com>
> >
> > bpf_iter_bits_destroy() uses "kit->nr_bits <= 64" to check whether the
> > bits are dynamically allocated. However, the check is incorrect and may
> > cause a kmemleak as shown below:
> >
> > unreferenced object 0xffff88812628c8c0 (size 32):
> >   comm "swapper/0", pid 1, jiffies 4294727320
> >   hex dump (first 32 bytes):
> >       b0 c1 55 f5 81 88 ff ff f0 f0 f0 f0 f0 f0 f0 f0  ..U.............
> >       f0 f0 f0 f0 f0 f0 f0 f0 00 00 00 00 00 00 00 00  ................
> >   backtrace (crc 781e32cc):
> >       [<00000000c452b4ab>] kmemleak_alloc+0x4b/0x80
> >       [<0000000004e09f80>] __kmalloc_node_noprof+0x480/0x5c0
> >       [<00000000597124d6>] __alloc.isra.0+0x89/0xb0
> >       [<000000004ebfffcd>] alloc_bulk+0x2af/0x720
> >       [<00000000d9c10145>] prefill_mem_cache+0x7f/0xb0
> >       [<00000000ff9738ff>] bpf_mem_alloc_init+0x3e2/0x610
> >       [<000000008b616eac>] bpf_global_ma_init+0x19/0x30
> >       [<00000000fc473efc>] do_one_initcall+0xd3/0x3c0
> >       [<00000000ec81498c>] kernel_init_freeable+0x66a/0x940
> >       [<00000000b119f72f>] kernel_init+0x20/0x160
> >       [<00000000f11ac9a7>] ret_from_fork+0x3c/0x70
> >       [<0000000004671da4>] ret_from_fork_asm+0x1a/0x30
> >
> > That is because nr_bits will be set as zero in bpf_iter_bits_next()
> > after all bits have been iterated.
> >
> > Fix the problem by not setting nr_bits to zero in bpf_iter_bits_next().
> > Instead, use "bits >= nr_bits" to indicate when iteration is completed
> > and still use "nr_bits > 64" to indicate when bits are dynamically
> > allocated.
> >
> > Fixes: 4665415975b0 ("bpf: Add bits iterator")
> > Signed-off-by: Hou Tao <houtao1@huawei.com>
> > ---
> >  kernel/bpf/helpers.c | 8 +++-----
> >  1 file changed, 3 insertions(+), 5 deletions(-)
> >
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 1a43d06eab28..62349e206a29 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -2888,7 +2888,7 @@ bpf_iter_bits_new(struct bpf_iter_bits *it, const u64 *unsafe_ptr__ign, u32 nr_w
> >
> >       kit->nr_bits = 0;
> >       kit->bits_copy = 0;
> > -     kit->bit = -1;
> > +     kit->bit = 0;
>
> Sent the patch out in a hurry and didn't run the related test.
>
> The change above will break "fewer words" test in verifier_bits_iter,
> because it will skip bit 0 in the bit. The correct fix should be as below:
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 1a43d06eab28..190b730e0f86 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2934,15 +2934,13 @@ __bpf_kfunc int *bpf_iter_bits_next(struct
> bpf_iter_bits *it)
>         const unsigned long *bits;
>         int bit;
>
> -       if (nr_bits == 0)
> +       if (kit->bit >= 0 && kit->bit >= nr_bits)

this looks quite weird. Maybe instead of this seemingly unnecessary
`kit->bit >= 0` check, either add (int)nr_bits cast or just switch
nr_bits from u32 to int?


BTW,

u32 nr_bytes = nr_words * sizeof(u64);

seems like a problem, no? nr_words is u32, so this can overflow,
please check and fix as well, while you are at it?

>                 return NULL;
>
>         bits = nr_bits == 64 ? &kit->bits_copy : kit->bits;
>         bit = find_next_bit(bits, nr_bits, kit->bit + 1);
> -       if (bit >= nr_bits) {
> -               kit->nr_bits = 0;
> +       if (bit >= nr_bits)
>                 return NULL;
> -       }
>
>         kit->bit = bit;
>         return &kit->bit;
>
> >
> >       if (!unsafe_ptr__ign || !nr_words)
> >               return -EINVAL;
> > @@ -2934,15 +2934,13 @@ __bpf_kfunc int *bpf_iter_bits_next(struct bpf_iter_bits *it)
> >       const unsigned long *bits;
> >       int bit;
> >
> > -     if (nr_bits == 0)
> > +     if (kit->bit >= nr_bits)
> >               return NULL;
> >
> >       bits = nr_bits == 64 ? &kit->bits_copy : kit->bits;
> >       bit = find_next_bit(bits, nr_bits, kit->bit + 1);
> > -     if (bit >= nr_bits) {
> > -             kit->nr_bits = 0;
> > +     if (bit >= nr_bits)
> >               return NULL;
> > -     }
> >
> >       kit->bit = bit;
> >       return &kit->bit;
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH bpf v2 5/7] bpf: Check the validity of nr_words in bpf_iter_bits_new()
  2024-10-21  1:40 ` [PATCH bpf v2 5/7] bpf: Check the validity of nr_words in bpf_iter_bits_new() Hou Tao
  2024-10-21  9:51   ` Jiri Olsa
@ 2024-10-21 23:09   ` Andrii Nakryiko
  2024-10-23  3:17   ` Yafang Shao
  2 siblings, 0 replies; 28+ messages in thread
From: Andrii Nakryiko @ 2024-10-21 23:09 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Yafang Shao, houtao1, xukuohai

On Sun, Oct 20, 2024 at 6:28 PM Hou Tao <houtao@huaweicloud.com> wrote:
>
> From: Hou Tao <houtao1@huawei.com>
>
> Check the validity of nr_words in bpf_iter_bits_new(). Without this
> check, when multiplication overflow occurs for nr_bits (e.g., when
> nr_words = 0x0400-0001, nr_bits becomes 64), stack corruption may occur
> due to bpf_probe_read_kernel_common(..., nr_bytes = 0x2000-0008).
>
> Fix it by limiting the max value of nr_words to 512.
>
> Fixes: 4665415975b0 ("bpf: Add bits iterator")
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  kernel/bpf/helpers.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 62349e206a29..c147f75e1b48 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2851,6 +2851,8 @@ struct bpf_iter_bits {
>         __u64 __opaque[2];
>  } __aligned(8);
>
> +#define BITS_ITER_NR_WORDS_MAX 512
> +
>  struct bpf_iter_bits_kern {
>         union {
>                 unsigned long *bits;
> @@ -2892,6 +2894,8 @@ bpf_iter_bits_new(struct bpf_iter_bits *it, const u64 *unsafe_ptr__ign, u32 nr_w
>
>         if (!unsafe_ptr__ign || !nr_words)
>                 return -EINVAL;
> +       if (nr_words > BITS_ITER_NR_WORDS_MAX)
> +               return -E2BIG;

ah, didn't see this before replying on the previous patch. But still,
maybe, let's move nr_bytes and nr_bits assignment to after this check?

>
>         /* Optimization for u64 mask */
>         if (nr_bits == 64) {
> --
> 2.29.2
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH bpf v2 0/7] Misc fixes for bpf
  2024-10-21  1:39 [PATCH bpf v2 0/7] Misc fixes for bpf Hou Tao
                   ` (6 preceding siblings ...)
  2024-10-21  1:40 ` [PATCH bpf v2 7/7] selftests/bpf: Test multiplication overflow of nr_bits in bits_iter Hou Tao
@ 2024-10-21 23:11 ` Andrii Nakryiko
  2024-10-22  7:37   ` Hou Tao
  7 siblings, 1 reply; 28+ messages in thread
From: Andrii Nakryiko @ 2024-10-21 23:11 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Yafang Shao, houtao1, xukuohai

On Sun, Oct 20, 2024 at 6:28 PM Hou Tao <houtao@huaweicloud.com> wrote:
>
> From: Hou Tao <houtao1@huawei.com>
>
> Hi,
>
> The patch set is just a bundle of fixes. Patch #1 fixes the out-of-bound
> for sockmap link fdinfo. Patch #2 adds a static assertion to guarantee
> such out-of-bound access will never happen again. Patch #3 fixes the
> kmemleak report when parsing the mount options for bpffs. Patch #4~#7
> fix issues related to bits iterator.
>
> Please check the individual patches for more details. And comments are
> always welcome.
>
> v2:
>  * patch #1: update tools/include/uapi/linux/bpf.h as well (Alexei)
>  * patch #2: new patch. Add a static assertion for bpf_link_type_strs[] (Andrii)
>  * patch #4: doesn't set nr_bits to zero in bpf_iter_bits_next (Andrii)
>  * patch #5: use 512 as the maximal value of nr_words
>  * patch #6: change the type of both bits and bits_copy to u64 (Andrii)
>
> v1: https://lore.kernel.org/bpf/20241008091718.3797027-1-houtao@huaweicloud.com/
>
You have three separate groups of fixes, please send them separately
so they can be landed separately and tested separately:

> Hou Tao (7):
>   bpf: Add the missing BPF_LINK_TYPE invocation for sockmap
>   bpf: Add assertion for the size of bpf_link_type_strs[]

first, link type fixes

>   bpf: Preserve param->string when parsing mount options

parsing fix

>   bpf: Free dynamically allocated bits in bpf_iter_bits_destroy()
>   bpf: Check the validity of nr_words in bpf_iter_bits_new()
>   bpf: Use __u64 to save the bits in bits iterator
>   selftests/bpf: Test multiplication overflow of nr_bits in bits_iter

bits iter fixes

>
>  include/linux/bpf_types.h                     |  7 +--
>  include/uapi/linux/bpf.h                      |  3 ++
>  kernel/bpf/helpers.c                          | 45 +++++++++++++++----
>  kernel/bpf/inode.c                            |  5 ++-
>  kernel/bpf/syscall.c                          |  2 +
>  tools/include/uapi/linux/bpf.h                |  3 ++
>  .../selftests/bpf/progs/verifier_bits_iter.c  | 14 ++++++
>  7 files changed, 63 insertions(+), 16 deletions(-)
>
> --
> 2.29.2
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH bpf v2 4/7] bpf: Free dynamically allocated bits in bpf_iter_bits_destroy()
  2024-10-21 23:07     ` Andrii Nakryiko
@ 2024-10-22  7:25       ` Hou Tao
  0 siblings, 0 replies; 28+ messages in thread
From: Hou Tao @ 2024-10-22  7:25 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Yafang Shao, xukuohai, houtao1@huawei.com

Hi,

On 10/22/2024 7:07 AM, Andrii Nakryiko wrote:
> On Sun, Oct 20, 2024 at 7:45 PM Hou Tao <houtao@huaweicloud.com> wrote:
>> Hi,
>>
>> On 10/21/2024 9:40 AM, Hou Tao wrote:
>>> From: Hou Tao <houtao1@huawei.com>
>>>
>>> bpf_iter_bits_destroy() uses "kit->nr_bits <= 64" to check whether the
>>> bits are dynamically allocated. However, the check is incorrect and may
>>> cause a kmemleak as shown below:
>>>
>>> unreferenced object 0xffff88812628c8c0 (size 32):
>>>   comm "swapper/0", pid 1, jiffies 4294727320
>>>   hex dump (first 32 bytes):
>>>       b0 c1 55 f5 81 88 ff ff f0 f0 f0 f0 f0 f0 f0 f0  ..U.............
>>>       f0 f0 f0 f0 f0 f0 f0 f0 00 00 00 00 00 00 00 00  ................
>>>   backtrace (crc 781e32cc):
>>>       [<00000000c452b4ab>] kmemleak_alloc+0x4b/0x80
>>>       [<0000000004e09f80>] __kmalloc_node_noprof+0x480/0x5c0
>>>       [<00000000597124d6>] __alloc.isra.0+0x89/0xb0
>>>       [<000000004ebfffcd>] alloc_bulk+0x2af/0x720
>>>       [<00000000d9c10145>] prefill_mem_cache+0x7f/0xb0
>>>       [<00000000ff9738ff>] bpf_mem_alloc_init+0x3e2/0x610
>>>       [<000000008b616eac>] bpf_global_ma_init+0x19/0x30
>>>       [<00000000fc473efc>] do_one_initcall+0xd3/0x3c0
>>>       [<00000000ec81498c>] kernel_init_freeable+0x66a/0x940
>>>       [<00000000b119f72f>] kernel_init+0x20/0x160
>>>       [<00000000f11ac9a7>] ret_from_fork+0x3c/0x70
>>>       [<0000000004671da4>] ret_from_fork_asm+0x1a/0x30
>>>
>>> That is because nr_bits will be set as zero in bpf_iter_bits_next()
>>> after all bits have been iterated.
>>>
>>> Fix the problem by not setting nr_bits to zero in bpf_iter_bits_next().
>>> Instead, use "bits >= nr_bits" to indicate when iteration is completed
>>> and still use "nr_bits > 64" to indicate when bits are dynamically
>>> allocated.
>>>
>>> Fixes: 4665415975b0 ("bpf: Add bits iterator")
>>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>>> ---
>>>  kernel/bpf/helpers.c | 8 +++-----
>>>  1 file changed, 3 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>>> index 1a43d06eab28..62349e206a29 100644
>>> --- a/kernel/bpf/helpers.c
>>> +++ b/kernel/bpf/helpers.c
>>> @@ -2888,7 +2888,7 @@ bpf_iter_bits_new(struct bpf_iter_bits *it, const u64 *unsafe_ptr__ign, u32 nr_w
>>>
>>>       kit->nr_bits = 0;
>>>       kit->bits_copy = 0;
>>> -     kit->bit = -1;
>>> +     kit->bit = 0;
>> Sent the patch out in a hurry and didn't run the related test.
>>
>> The change above will break "fewer words" test in verifier_bits_iter,
>> because it will skip bit 0 in the bit. The correct fix should be as below:
>>
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index 1a43d06eab28..190b730e0f86 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -2934,15 +2934,13 @@ __bpf_kfunc int *bpf_iter_bits_next(struct
>> bpf_iter_bits *it)
>>         const unsigned long *bits;
>>         int bit;
>>
>> -       if (nr_bits == 0)
>> +       if (kit->bit >= 0 && kit->bit >= nr_bits)
> this looks quite weird. Maybe instead of this seemingly unnecessary
> `kit->bit >= 0` check, either add (int)nr_bits cast or just switch
> nr_bits from u32 to int?

OK. Will change nr_bits to int in the next revision.


>
>
> BTW,
>
> u32 nr_bytes = nr_words * sizeof(u64);
>
> seems like a problem, no? nr_words is u32, so this can overflow,
> please check and fix as well, while you are at it?

Will move it after the checking of nr_words in the following patch.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH bpf v2 2/7] bpf: Add assertion for the size of bpf_link_type_strs[]
  2024-10-21 23:02     ` Andrii Nakryiko
@ 2024-10-22  7:35       ` Hou Tao
  2024-10-22 17:40         ` Andrii Nakryiko
  0 siblings, 1 reply; 28+ messages in thread
From: Hou Tao @ 2024-10-22  7:35 UTC (permalink / raw)
  To: Andrii Nakryiko, Jiri Olsa
  Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, John Fastabend,
	Yafang Shao, houtao1, xukuohai

Hi,

On 10/22/2024 7:02 AM, Andrii Nakryiko wrote:
> On Mon, Oct 21, 2024 at 1:18 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>> On Mon, Oct 21, 2024 at 09:39:59AM +0800, Hou Tao wrote:
>>> From: Hou Tao <houtao1@huawei.com>
>>>
>>> If a corresponding link type doesn't invoke BPF_LINK_TYPE(), accessing
>>> bpf_link_type_strs[link->type] may result in out-of-bound access.
>>>
>>> To prevent such missed invocations in the future, the following static
>>> assertion seems feasible:
>>>
>>>   BUILD_BUG_ON(ARRAY_SIZE(bpf_link_type_strs) != __MAX_BPF_LINK_TYPE)
>>>
>>> However, this doesn't work well. The reason is that the invocation of
>>> BPF_LINK_TYPE() for one link type is optional due to its CONFIG_XXX
>>> dependency and the elements in bpf_link_type_strs[] will be sparse. For
>>> example, if CONFIG_NET is disabled, the size of bpf_link_type_strs will
>>> be BPF_LINK_TYPE_UPROBE_MULTI + 1.
>>>
>>> Therefore, in addition to the static assertion, remove all CONFIG_XXX
>>> conditions for the invocation of BPF_LINK_TYPE(). If these CONFIG_XXX
>>> conditions become necessary later, the fix may need to be revised (e.g.,
>>> to check the validity of link_type in bpf_link_show_fdinfo()).
>>>
>>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>>> ---
>>>  include/linux/bpf_types.h | 6 ------
>>>  kernel/bpf/syscall.c      | 2 ++
>>>  2 files changed, 2 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
>>> index fa78f49d4a9a..6b7eabe9a115 100644
>>> --- a/include/linux/bpf_types.h
>>> +++ b/include/linux/bpf_types.h
>>> @@ -136,21 +136,15 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_ARENA, arena_map_ops)
>>>
>>>  BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
>>>  BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
>>> -#ifdef CONFIG_CGROUP_BPF
>>>  BPF_LINK_TYPE(BPF_LINK_TYPE_CGROUP, cgroup)
>>> -#endif
>>>  BPF_LINK_TYPE(BPF_LINK_TYPE_ITER, iter)
>>> -#ifdef CONFIG_NET
>>>  BPF_LINK_TYPE(BPF_LINK_TYPE_NETNS, netns)
>>>  BPF_LINK_TYPE(BPF_LINK_TYPE_XDP, xdp)
>>>  BPF_LINK_TYPE(BPF_LINK_TYPE_NETFILTER, netfilter)
>>>  BPF_LINK_TYPE(BPF_LINK_TYPE_TCX, tcx)
>>>  BPF_LINK_TYPE(BPF_LINK_TYPE_NETKIT, netkit)
>>>  BPF_LINK_TYPE(BPF_LINK_TYPE_SOCKMAP, sockmap)
>>> -#endif
>>> -#ifdef CONFIG_PERF_EVENTS
>>>  BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
>>> -#endif
> I'm not sure what's the implication here, but I'd avoid doing that.
> But see below.

OK.
>>>  BPF_LINK_TYPE(BPF_LINK_TYPE_KPROBE_MULTI, kprobe_multi)
>>>  BPF_LINK_TYPE(BPF_LINK_TYPE_STRUCT_OPS, struct_ops)
>>>  BPF_LINK_TYPE(BPF_LINK_TYPE_UPROBE_MULTI, uprobe_multi)
>>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>>> index 8cfa7183d2ef..9f335c379b05 100644
>>> --- a/kernel/bpf/syscall.c
>>> +++ b/kernel/bpf/syscall.c
>>> @@ -3071,6 +3071,8 @@ static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
>>>       const struct bpf_prog *prog = link->prog;
>>>       char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
>>>
>>> +     BUILD_BUG_ON(ARRAY_SIZE(bpf_link_type_strs) != __MAX_BPF_LINK_TYPE);
> If this is useless, why are you adding it?

It will work after removing these CONFIG_XXX dependencies for
BPF_LINK_TYPE() invocations.
>
> Let's instead do a NULL check inside bpf_link_show_fdinfo() to handle
> sparsity. And to avoid out-of-bounds, just add
>
> [__MAX_BPF_LINK_TYPE] = NULL,
>
> into the definition of bpf_link_type_strs

Instead of outputting a null string for a link_type which didn't invoke
BPF_LINK_TYPE, is outputting the numerical value of link->type more
reasonable as shown below ?

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 2873302faf39..9a02cd914ed8 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3073,14 +3073,15 @@ static void bpf_link_show_fdinfo(struct seq_file
*m, struct file *filp)
        const struct bpf_link *link = filp->private_data;
        const struct bpf_prog *prog = link->prog;
        char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
+       enum bpf_link_type type;

+       if (type < ARRAY_SIZE(bpf_link_type_strs) &&
bpf_link_type_strs[type])
+               seq_printf(m, "link_type:\t%s\n", bpf_link_type_strs[type]);
+       else
+               seq_printf(m, "link_type:\t[%d]\n", type);
+
+       seq_printf(m, "link_id:\t%u\n", link->id);

-       seq_printf(m,
-                  "link_type:\t%s\n"
-                  "link_id:\t%u\n",
-                  bpf_link_type_strs[link->type],
-                  link->id);

>
> pw-bot: cr
>
>> I wonder it'd be simpler to just kill BPF_LINK_TYPE completely
>> and add link names directly to bpf_link_type_strs array..
>> it seems it's the only purpose of the BPF_LINK_TYPE macro
>>
> This seems like a bit too short-sighted approach, let's not go there just yet.
>
>> jirka


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH bpf v2 0/7] Misc fixes for bpf
  2024-10-21 23:11 ` [PATCH bpf v2 0/7] Misc fixes for bpf Andrii Nakryiko
@ 2024-10-22  7:37   ` Hou Tao
  0 siblings, 0 replies; 28+ messages in thread
From: Hou Tao @ 2024-10-22  7:37 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Yafang Shao, houtao1, xukuohai

Hi Andrii,

On 10/22/2024 7:11 AM, Andrii Nakryiko wrote:
> On Sun, Oct 20, 2024 at 6:28 PM Hou Tao <houtao@huaweicloud.com> wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> Hi,
>>
>> The patch set is just a bundle of fixes. Patch #1 fixes the out-of-bound
>> for sockmap link fdinfo. Patch #2 adds a static assertion to guarantee
>> such out-of-bound access will never happen again. Patch #3 fixes the
>> kmemleak report when parsing the mount options for bpffs. Patch #4~#7
>> fix issues related to bits iterator.
>>
>> Please check the individual patches for more details. And comments are
>> always welcome.
>>
>> v2:
>>  * patch #1: update tools/include/uapi/linux/bpf.h as well (Alexei)
>>  * patch #2: new patch. Add a static assertion for bpf_link_type_strs[] (Andrii)
>>  * patch #4: doesn't set nr_bits to zero in bpf_iter_bits_next (Andrii)
>>  * patch #5: use 512 as the maximal value of nr_words
>>  * patch #6: change the type of both bits and bits_copy to u64 (Andrii)
>>
>> v1: https://lore.kernel.org/bpf/20241008091718.3797027-1-houtao@huaweicloud.com/
>>
> You have three separate groups of fixes, please send them separately
> so they can be landed separately and tested separately:
>
>> Hou Tao (7):
>>   bpf: Add the missing BPF_LINK_TYPE invocation for sockmap
>>   bpf: Add assertion for the size of bpf_link_type_strs[]
> first, link type fixes
>
>>   bpf: Preserve param->string when parsing mount options
> parsing fix
>
>>   bpf: Free dynamically allocated bits in bpf_iter_bits_destroy()
>>   bpf: Check the validity of nr_words in bpf_iter_bits_new()
>>   bpf: Use __u64 to save the bits in bits iterator
>>   selftests/bpf: Test multiplication overflow of nr_bits in bits_iter
> bits iter fixes

Thanks for the suggestion. Will do in v3.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH bpf v2 2/7] bpf: Add assertion for the size of bpf_link_type_strs[]
  2024-10-22  7:35       ` Hou Tao
@ 2024-10-22 17:40         ` Andrii Nakryiko
  2024-10-22 20:26           ` Alexei Starovoitov
  0 siblings, 1 reply; 28+ messages in thread
From: Andrii Nakryiko @ 2024-10-22 17:40 UTC (permalink / raw)
  To: Hou Tao
  Cc: Jiri Olsa, bpf, Martin KaFai Lau, Alexei Starovoitov,
	Andrii Nakryiko, Eduard Zingerman, Song Liu, Hao Luo,
	Yonghong Song, Daniel Borkmann, KP Singh, Stanislav Fomichev,
	John Fastabend, Yafang Shao, houtao1, xukuohai

On Tue, Oct 22, 2024 at 12:36 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> Hi,
>
> On 10/22/2024 7:02 AM, Andrii Nakryiko wrote:
> > On Mon, Oct 21, 2024 at 1:18 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >> On Mon, Oct 21, 2024 at 09:39:59AM +0800, Hou Tao wrote:
> >>> From: Hou Tao <houtao1@huawei.com>
> >>>
> >>> If a corresponding link type doesn't invoke BPF_LINK_TYPE(), accessing
> >>> bpf_link_type_strs[link->type] may result in out-of-bound access.
> >>>
> >>> To prevent such missed invocations in the future, the following static
> >>> assertion seems feasible:
> >>>
> >>>   BUILD_BUG_ON(ARRAY_SIZE(bpf_link_type_strs) != __MAX_BPF_LINK_TYPE)
> >>>
> >>> However, this doesn't work well. The reason is that the invocation of
> >>> BPF_LINK_TYPE() for one link type is optional due to its CONFIG_XXX
> >>> dependency and the elements in bpf_link_type_strs[] will be sparse. For
> >>> example, if CONFIG_NET is disabled, the size of bpf_link_type_strs will
> >>> be BPF_LINK_TYPE_UPROBE_MULTI + 1.
> >>>
> >>> Therefore, in addition to the static assertion, remove all CONFIG_XXX
> >>> conditions for the invocation of BPF_LINK_TYPE(). If these CONFIG_XXX
> >>> conditions become necessary later, the fix may need to be revised (e.g.,
> >>> to check the validity of link_type in bpf_link_show_fdinfo()).
> >>>
> >>> Signed-off-by: Hou Tao <houtao1@huawei.com>
> >>> ---
> >>>  include/linux/bpf_types.h | 6 ------
> >>>  kernel/bpf/syscall.c      | 2 ++
> >>>  2 files changed, 2 insertions(+), 6 deletions(-)
> >>>
> >>> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> >>> index fa78f49d4a9a..6b7eabe9a115 100644
> >>> --- a/include/linux/bpf_types.h
> >>> +++ b/include/linux/bpf_types.h
> >>> @@ -136,21 +136,15 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_ARENA, arena_map_ops)
> >>>
> >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
> >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
> >>> -#ifdef CONFIG_CGROUP_BPF
> >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_CGROUP, cgroup)
> >>> -#endif
> >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_ITER, iter)
> >>> -#ifdef CONFIG_NET
> >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_NETNS, netns)
> >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_XDP, xdp)
> >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_NETFILTER, netfilter)
> >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_TCX, tcx)
> >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_NETKIT, netkit)
> >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_SOCKMAP, sockmap)
> >>> -#endif
> >>> -#ifdef CONFIG_PERF_EVENTS
> >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
> >>> -#endif
> > I'm not sure what's the implication here, but I'd avoid doing that.
> > But see below.
>
> OK.
> >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_KPROBE_MULTI, kprobe_multi)
> >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_STRUCT_OPS, struct_ops)
> >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_UPROBE_MULTI, uprobe_multi)
> >>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> >>> index 8cfa7183d2ef..9f335c379b05 100644
> >>> --- a/kernel/bpf/syscall.c
> >>> +++ b/kernel/bpf/syscall.c
> >>> @@ -3071,6 +3071,8 @@ static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
> >>>       const struct bpf_prog *prog = link->prog;
> >>>       char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
> >>>
> >>> +     BUILD_BUG_ON(ARRAY_SIZE(bpf_link_type_strs) != __MAX_BPF_LINK_TYPE);
> > If this is useless, why are you adding it?
>
> It will work after removing these CONFIG_XXX dependencies for
> BPF_LINK_TYPE() invocations.
> >
> > Let's instead do a NULL check inside bpf_link_show_fdinfo() to handle
> > sparsity. And to avoid out-of-bounds, just add
> >
> > [__MAX_BPF_LINK_TYPE] = NULL,
> >
> > into the definition of bpf_link_type_strs
>
> Instead of outputting a null string for a link_type which didn't invoke
> BPF_LINK_TYPE, is outputting the numerical value of link->type more
> reasonable as shown below ?

In correctly configured kernel this should never happen. So we can
have WARN() there for the NULL case and just return an error or
something.

>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 2873302faf39..9a02cd914ed8 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -3073,14 +3073,15 @@ static void bpf_link_show_fdinfo(struct seq_file
> *m, struct file *filp)
>         const struct bpf_link *link = filp->private_data;
>         const struct bpf_prog *prog = link->prog;
>         char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
> +       enum bpf_link_type type;
>
> +       if (type < ARRAY_SIZE(bpf_link_type_strs) &&
> bpf_link_type_strs[type])
> +               seq_printf(m, "link_type:\t%s\n", bpf_link_type_strs[type]);
> +       else
> +               seq_printf(m, "link_type:\t[%d]\n", type);
> +
> +       seq_printf(m, "link_id:\t%u\n", link->id);
>
> -       seq_printf(m,
> -                  "link_type:\t%s\n"
> -                  "link_id:\t%u\n",
> -                  bpf_link_type_strs[link->type],
> -                  link->id);
>
> >
> > pw-bot: cr
> >
> >> I wonder it'd be simpler to just kill BPF_LINK_TYPE completely
> >> and add link names directly to bpf_link_type_strs array..
> >> it seems it's the only purpose of the BPF_LINK_TYPE macro
> >>
> > This seems like a bit too short-sighted approach, let's not go there just yet.
> >
> >> jirka
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH bpf v2 2/7] bpf: Add assertion for the size of bpf_link_type_strs[]
  2024-10-22 17:40         ` Andrii Nakryiko
@ 2024-10-22 20:26           ` Alexei Starovoitov
  2024-10-22 20:41             ` Andrii Nakryiko
  0 siblings, 1 reply; 28+ messages in thread
From: Alexei Starovoitov @ 2024-10-22 20:26 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Hou Tao, Jiri Olsa, bpf, Martin KaFai Lau, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, John Fastabend,
	Yafang Shao, Hou Tao, Xu Kuohai

On Tue, Oct 22, 2024 at 10:40 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Oct 22, 2024 at 12:36 AM Hou Tao <houtao@huaweicloud.com> wrote:
> >
> > Hi,
> >
> > On 10/22/2024 7:02 AM, Andrii Nakryiko wrote:
> > > On Mon, Oct 21, 2024 at 1:18 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > >> On Mon, Oct 21, 2024 at 09:39:59AM +0800, Hou Tao wrote:
> > >>> From: Hou Tao <houtao1@huawei.com>
> > >>>
> > >>> If a corresponding link type doesn't invoke BPF_LINK_TYPE(), accessing
> > >>> bpf_link_type_strs[link->type] may result in out-of-bound access.
> > >>>
> > >>> To prevent such missed invocations in the future, the following static
> > >>> assertion seems feasible:
> > >>>
> > >>>   BUILD_BUG_ON(ARRAY_SIZE(bpf_link_type_strs) != __MAX_BPF_LINK_TYPE)
> > >>>
> > >>> However, this doesn't work well. The reason is that the invocation of
> > >>> BPF_LINK_TYPE() for one link type is optional due to its CONFIG_XXX
> > >>> dependency and the elements in bpf_link_type_strs[] will be sparse. For
> > >>> example, if CONFIG_NET is disabled, the size of bpf_link_type_strs will
> > >>> be BPF_LINK_TYPE_UPROBE_MULTI + 1.
> > >>>
> > >>> Therefore, in addition to the static assertion, remove all CONFIG_XXX
> > >>> conditions for the invocation of BPF_LINK_TYPE(). If these CONFIG_XXX
> > >>> conditions become necessary later, the fix may need to be revised (e.g.,
> > >>> to check the validity of link_type in bpf_link_show_fdinfo()).
> > >>>
> > >>> Signed-off-by: Hou Tao <houtao1@huawei.com>
> > >>> ---
> > >>>  include/linux/bpf_types.h | 6 ------
> > >>>  kernel/bpf/syscall.c      | 2 ++
> > >>>  2 files changed, 2 insertions(+), 6 deletions(-)
> > >>>
> > >>> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> > >>> index fa78f49d4a9a..6b7eabe9a115 100644
> > >>> --- a/include/linux/bpf_types.h
> > >>> +++ b/include/linux/bpf_types.h
> > >>> @@ -136,21 +136,15 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_ARENA, arena_map_ops)
> > >>>
> > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
> > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
> > >>> -#ifdef CONFIG_CGROUP_BPF
> > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_CGROUP, cgroup)
> > >>> -#endif
> > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_ITER, iter)
> > >>> -#ifdef CONFIG_NET
> > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_NETNS, netns)
> > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_XDP, xdp)
> > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_NETFILTER, netfilter)
> > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_TCX, tcx)
> > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_NETKIT, netkit)
> > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_SOCKMAP, sockmap)
> > >>> -#endif
> > >>> -#ifdef CONFIG_PERF_EVENTS
> > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
> > >>> -#endif
> > > I'm not sure what's the implication here, but I'd avoid doing that.
> > > But see below.
> >
> > OK.
> > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_KPROBE_MULTI, kprobe_multi)
> > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_STRUCT_OPS, struct_ops)
> > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_UPROBE_MULTI, uprobe_multi)
> > >>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > >>> index 8cfa7183d2ef..9f335c379b05 100644
> > >>> --- a/kernel/bpf/syscall.c
> > >>> +++ b/kernel/bpf/syscall.c
> > >>> @@ -3071,6 +3071,8 @@ static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
> > >>>       const struct bpf_prog *prog = link->prog;
> > >>>       char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
> > >>>
> > >>> +     BUILD_BUG_ON(ARRAY_SIZE(bpf_link_type_strs) != __MAX_BPF_LINK_TYPE);
> > > If this is useless, why are you adding it?
> >
> > It will work after removing these CONFIG_XXX dependencies for
> > BPF_LINK_TYPE() invocations.
> > >
> > > Let's instead do a NULL check inside bpf_link_show_fdinfo() to handle
> > > sparsity. And to avoid out-of-bounds, just add
> > >
> > > [__MAX_BPF_LINK_TYPE] = NULL,
> > >
> > > into the definition of bpf_link_type_strs
> >
> > Instead of outputting a null string for a link_type which didn't invoke
> > BPF_LINK_TYPE, is outputting the numerical value of link->type more
> > reasonable as shown below ?
>
> In correctly configured kernel this should never happen. So we can
> have WARN() there for the NULL case and just return an error or
> something.

I don't understand why this patch is needed.
Is it solving a theoretical problem ?

Something like the kernel managed to create a link
with link->type == BPF_LINK_TYPE_CGROUP,
but CONFIG_CGROUP_BPF was not defined somehow ?

There is no out-of-bounds or access to empty
bpf_link_type_strs[link->type] as far as I can tell.

What am I missing?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH bpf v2 2/7] bpf: Add assertion for the size of bpf_link_type_strs[]
  2024-10-22 20:26           ` Alexei Starovoitov
@ 2024-10-22 20:41             ` Andrii Nakryiko
  0 siblings, 0 replies; 28+ messages in thread
From: Andrii Nakryiko @ 2024-10-22 20:41 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Hou Tao, Jiri Olsa, bpf, Martin KaFai Lau, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, John Fastabend,
	Yafang Shao, Hou Tao, Xu Kuohai

On Tue, Oct 22, 2024 at 1:27 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Oct 22, 2024 at 10:40 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Tue, Oct 22, 2024 at 12:36 AM Hou Tao <houtao@huaweicloud.com> wrote:
> > >
> > > Hi,
> > >
> > > On 10/22/2024 7:02 AM, Andrii Nakryiko wrote:
> > > > On Mon, Oct 21, 2024 at 1:18 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > >> On Mon, Oct 21, 2024 at 09:39:59AM +0800, Hou Tao wrote:
> > > >>> From: Hou Tao <houtao1@huawei.com>
> > > >>>
> > > >>> If a corresponding link type doesn't invoke BPF_LINK_TYPE(), accessing
> > > >>> bpf_link_type_strs[link->type] may result in out-of-bound access.
> > > >>>
> > > >>> To prevent such missed invocations in the future, the following static
> > > >>> assertion seems feasible:
> > > >>>
> > > >>>   BUILD_BUG_ON(ARRAY_SIZE(bpf_link_type_strs) != __MAX_BPF_LINK_TYPE)
> > > >>>
> > > >>> However, this doesn't work well. The reason is that the invocation of
> > > >>> BPF_LINK_TYPE() for one link type is optional due to its CONFIG_XXX
> > > >>> dependency and the elements in bpf_link_type_strs[] will be sparse. For
> > > >>> example, if CONFIG_NET is disabled, the size of bpf_link_type_strs will
> > > >>> be BPF_LINK_TYPE_UPROBE_MULTI + 1.
> > > >>>
> > > >>> Therefore, in addition to the static assertion, remove all CONFIG_XXX
> > > >>> conditions for the invocation of BPF_LINK_TYPE(). If these CONFIG_XXX
> > > >>> conditions become necessary later, the fix may need to be revised (e.g.,
> > > >>> to check the validity of link_type in bpf_link_show_fdinfo()).
> > > >>>
> > > >>> Signed-off-by: Hou Tao <houtao1@huawei.com>
> > > >>> ---
> > > >>>  include/linux/bpf_types.h | 6 ------
> > > >>>  kernel/bpf/syscall.c      | 2 ++
> > > >>>  2 files changed, 2 insertions(+), 6 deletions(-)
> > > >>>
> > > >>> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> > > >>> index fa78f49d4a9a..6b7eabe9a115 100644
> > > >>> --- a/include/linux/bpf_types.h
> > > >>> +++ b/include/linux/bpf_types.h
> > > >>> @@ -136,21 +136,15 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_ARENA, arena_map_ops)
> > > >>>
> > > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
> > > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
> > > >>> -#ifdef CONFIG_CGROUP_BPF
> > > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_CGROUP, cgroup)
> > > >>> -#endif
> > > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_ITER, iter)
> > > >>> -#ifdef CONFIG_NET
> > > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_NETNS, netns)
> > > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_XDP, xdp)
> > > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_NETFILTER, netfilter)
> > > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_TCX, tcx)
> > > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_NETKIT, netkit)
> > > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_SOCKMAP, sockmap)
> > > >>> -#endif
> > > >>> -#ifdef CONFIG_PERF_EVENTS
> > > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
> > > >>> -#endif
> > > > I'm not sure what's the implication here, but I'd avoid doing that.
> > > > But see below.
> > >

I'll just elaborate a bit why I wouldn't remove #ifdef guards. This
BPF_LINK_TYPE() macro magic can be used to define some extra data
structures that are specific to link type. E.g., some sort of
bpf_<type>_link_lops references or something along those lines. Having
BPF_LINK_TYPE() definition when the kernel actually doesn't implement
that link will be PITA in that case, generating references to
non-existent data structures.

> > > OK.
> > > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_KPROBE_MULTI, kprobe_multi)
> > > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_STRUCT_OPS, struct_ops)
> > > >>>  BPF_LINK_TYPE(BPF_LINK_TYPE_UPROBE_MULTI, uprobe_multi)
> > > >>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > >>> index 8cfa7183d2ef..9f335c379b05 100644
> > > >>> --- a/kernel/bpf/syscall.c
> > > >>> +++ b/kernel/bpf/syscall.c
> > > >>> @@ -3071,6 +3071,8 @@ static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
> > > >>>       const struct bpf_prog *prog = link->prog;
> > > >>>       char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
> > > >>>
> > > >>> +     BUILD_BUG_ON(ARRAY_SIZE(bpf_link_type_strs) != __MAX_BPF_LINK_TYPE);
> > > > If this is useless, why are you adding it?
> > >
> > > It will work after removing these CONFIG_XXX dependencies for
> > > BPF_LINK_TYPE() invocations.
> > > >
> > > > Let's instead do a NULL check inside bpf_link_show_fdinfo() to handle
> > > > sparsity. And to avoid out-of-bounds, just add
> > > >
> > > > [__MAX_BPF_LINK_TYPE] = NULL,
> > > >
> > > > into the definition of bpf_link_type_strs
> > >
> > > Instead of outputting a null string for a link_type which didn't invoke
> > > BPF_LINK_TYPE, is outputting the numerical value of link->type more
> > > reasonable as shown below ?
> >
> > In correctly configured kernel this should never happen. So we can
> > have WARN() there for the NULL case and just return an error or

Actually, it seems like this is a void-returning function, so yeah,
instead of returning an error we can just emit an integer value. But
we should definitely have a WARN_ONCE().

> > something.
>
> I don't understand why this patch is needed.
> Is it solving a theoretical problem ?
>
> Something like the kernel managed to create a link
> with link->type == BPF_LINK_TYPE_CGROUP,
> but CONFIG_CGROUP_BPF was not defined somehow ?
>

It's just too easy to forget to add
BPF_LINK_TYPE(BPF_LINK_TYPE_<newlinktype>, ...) into
include/linux/bpf_types.h when adding a new type of BPF link. So Hou
is following up with changes that will make it easier to spot these
omissions in the future.

> There is no out-of-bounds or access to empty
> bpf_link_type_strs[link->type] as far as I can tell.
>
> What am I missing?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH bpf v2 6/7] bpf: Use __u64 to save the bits in bits iterator
  2024-10-21  1:40 ` [PATCH bpf v2 6/7] bpf: Use __u64 to save the bits in bits iterator Hou Tao
@ 2024-10-23  3:10   ` Yafang Shao
  2024-10-23  8:09     ` Hou Tao
  0 siblings, 1 reply; 28+ messages in thread
From: Yafang Shao @ 2024-10-23  3:10 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, houtao1, xukuohai

On Mon, Oct 21, 2024 at 9:28 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> From: Hou Tao <houtao1@huawei.com>
>
> On 32-bit hosts (e.g., arm32), when a bpf program passes a u64 to
> bpf_iter_bits_new(), bpf_iter_bits_new() will use bits_copy to store the
> content of the u64. However, bits_copy is only 4 bytes, leading to stack
> corruption.
>
> The straightforward solution would be to replace u64 with unsigned long
> in bpf_iter_bits_new(). However, this introduces confusion and problems
> for 32-bit hosts because the size of ulong in bpf program is 8 bytes,
> but it is treated as 4-bytes after passed to bpf_iter_bits_new().
>
> Fix it by changing the type of both bits and bit_count from unsigned
> long to u64.

Thank you for the fix. This change is necessary.

>  However, the change is not enough. The main reason is that
> bpf_iter_bits_next() uses find_next_bit() to find the next bit and the
> pointer passed to find_next_bit() is an unsigned long pointer instead
> of a u64 pointer. For 32-bit little-endian host, it is fine but it is
> not the case for 32-bit big-endian host. Because under 32-bit big-endian
> host, the first iterated unsigned long will be the bits 32-63 of the u64
> instead of the expected bits 0-31. Therefore, in addition to changing
> the type, swap the two unsigned longs within the u64 for 32-bit
> big-endian host.

The API uses a u64 data type, and the nr_words parameter represents
the number of 8-byte units. On a 32-bit system, if you want to call
this API, you would define an array like `u32 data[2]` and invoke the
function as `bpf_for_each(bits, bit, &data[0], 1)`. However, since the
API expects a u64, you'll need to merge the two u32 values into a
single u64 value.

Given this, it might be more appropriate to ask users to handle the
u32 to u64 merge on their side when preparing the data, rather than
performing the swap within the kernel itself.

--
Regards

Yafang

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH bpf v2 5/7] bpf: Check the validity of nr_words in bpf_iter_bits_new()
  2024-10-21  1:40 ` [PATCH bpf v2 5/7] bpf: Check the validity of nr_words in bpf_iter_bits_new() Hou Tao
  2024-10-21  9:51   ` Jiri Olsa
  2024-10-21 23:09   ` Andrii Nakryiko
@ 2024-10-23  3:17   ` Yafang Shao
  2024-10-23  8:29     ` Hou Tao
  2 siblings, 1 reply; 28+ messages in thread
From: Yafang Shao @ 2024-10-23  3:17 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, houtao1, xukuohai

On Mon, Oct 21, 2024 at 9:28 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> From: Hou Tao <houtao1@huawei.com>
>
> Check the validity of nr_words in bpf_iter_bits_new(). Without this
> check, when multiplication overflow occurs for nr_bits (e.g., when
> nr_words = 0x0400-0001, nr_bits becomes 64), stack corruption may occur
> due to bpf_probe_read_kernel_common(..., nr_bytes = 0x2000-0008).
>
> Fix it by limiting the max value of nr_words to 512.
>
> Fixes: 4665415975b0 ("bpf: Add bits iterator")
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  kernel/bpf/helpers.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 62349e206a29..c147f75e1b48 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2851,6 +2851,8 @@ struct bpf_iter_bits {
>         __u64 __opaque[2];
>  } __aligned(8);
>
> +#define BITS_ITER_NR_WORDS_MAX 512
> +
>  struct bpf_iter_bits_kern {
>         union {
>                 unsigned long *bits;
> @@ -2892,6 +2894,8 @@ bpf_iter_bits_new(struct bpf_iter_bits *it, const u64 *unsafe_ptr__ign, u32 nr_w
>
>         if (!unsafe_ptr__ign || !nr_words)
>                 return -EINVAL;
> +       if (nr_words > BITS_ITER_NR_WORDS_MAX)
> +               return -E2BIG;

It is documented that nr_words cannot exceed 512, not due to overflow
concerns, but because of memory allocation limits. It might be better
to use 512 directly instead of BITS_ITER_NR_WORDS_MAX. Alternatively,
if we decide to keep using the macro, the documentation should be
updated accordingly.

>
>         /* Optimization for u64 mask */
>         if (nr_bits == 64) {
> --
> 2.29.2
>


--
Regards
Yafang

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH bpf v2 6/7] bpf: Use __u64 to save the bits in bits iterator
  2024-10-23  3:10   ` Yafang Shao
@ 2024-10-23  8:09     ` Hou Tao
  0 siblings, 0 replies; 28+ messages in thread
From: Hou Tao @ 2024-10-23  8:09 UTC (permalink / raw)
  To: Yafang Shao
  Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, houtao1, xukuohai

Hi,

On 10/23/2024 11:10 AM, Yafang Shao wrote:
> On Mon, Oct 21, 2024 at 9:28 AM Hou Tao <houtao@huaweicloud.com> wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> On 32-bit hosts (e.g., arm32), when a bpf program passes a u64 to
>> bpf_iter_bits_new(), bpf_iter_bits_new() will use bits_copy to store the
>> content of the u64. However, bits_copy is only 4 bytes, leading to stack
>> corruption.
>>
>> The straightforward solution would be to replace u64 with unsigned long
>> in bpf_iter_bits_new(). However, this introduces confusion and problems
>> for 32-bit hosts because the size of ulong in bpf program is 8 bytes,
>> but it is treated as 4-bytes after passed to bpf_iter_bits_new().
>>
>> Fix it by changing the type of both bits and bit_count from unsigned
>> long to u64.
> Thank you for the fix. This change is necessary.
>
>>  However, the change is not enough. The main reason is that
>> bpf_iter_bits_next() uses find_next_bit() to find the next bit and the
>> pointer passed to find_next_bit() is an unsigned long pointer instead
>> of a u64 pointer. For 32-bit little-endian host, it is fine but it is
>> not the case for 32-bit big-endian host. Because under 32-bit big-endian
>> host, the first iterated unsigned long will be the bits 32-63 of the u64
>> instead of the expected bits 0-31. Therefore, in addition to changing
>> the type, swap the two unsigned longs within the u64 for 32-bit
>> big-endian host.
> The API uses a u64 data type, and the nr_words parameter represents
> the number of 8-byte units. On a 32-bit system, if you want to call
> this API, you would define an array like `u32 data[2]` and invoke the
> function as `bpf_for_each(bits, bit, &data[0], 1)`. However, since the
> API expects a u64, you'll need to merge the two u32 values into a
> single u64 value.

It is a bit weird to pass a u32 pointer to bpf_for_each, because it
expects a u64 pointer. I think the bpf program should pass a u64 pointer
instead.
>
> Given this, it might be more appropriate to ask users to handle the
> u32 to u64 merge on their side when preparing the data, rather than
> performing the swap within the kernel itself.

However, the swap implemented in the patch has nothing to do whether the
user passes pointer to u32 array or not. It is necessary due to the
mismatched between the pointer type used by find_next_bit and the
pointer used by bpf_iter_bits_new(). The latter uses u64 pointer, but
find_next_bit() uses unsigned long pointer and iterates a long each
time. So just like the comment in the patch said:

    under 32-bit big-endian host, the first iterated unsigned long will
be the bits 32-63 of the u64 instead of the expected bits 0-31.

The swap in the patch will swap the two long values in the u64 and make
the first iterated unsigned long will be the bits 0-31 of the u64 value.
>
> --
> Regards
>
> Yafang


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH bpf v2 5/7] bpf: Check the validity of nr_words in bpf_iter_bits_new()
  2024-10-23  3:17   ` Yafang Shao
@ 2024-10-23  8:29     ` Hou Tao
  2024-10-23  9:25       ` Yafang Shao
  0 siblings, 1 reply; 28+ messages in thread
From: Hou Tao @ 2024-10-23  8:29 UTC (permalink / raw)
  To: Yafang Shao
  Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, houtao1, xukuohai



On 10/23/2024 11:17 AM, Yafang Shao wrote:
> On Mon, Oct 21, 2024 at 9:28 AM Hou Tao <houtao@huaweicloud.com> wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> Check the validity of nr_words in bpf_iter_bits_new(). Without this
>> check, when multiplication overflow occurs for nr_bits (e.g., when
>> nr_words = 0x0400-0001, nr_bits becomes 64), stack corruption may occur
>> due to bpf_probe_read_kernel_common(..., nr_bytes = 0x2000-0008).
>>
>> Fix it by limiting the max value of nr_words to 512.
>>
>> Fixes: 4665415975b0 ("bpf: Add bits iterator")
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> ---
>>  kernel/bpf/helpers.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index 62349e206a29..c147f75e1b48 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -2851,6 +2851,8 @@ struct bpf_iter_bits {
>>         __u64 __opaque[2];
>>  } __aligned(8);
>>
>> +#define BITS_ITER_NR_WORDS_MAX 512
>> +
>>  struct bpf_iter_bits_kern {
>>         union {
>>                 unsigned long *bits;
>> @@ -2892,6 +2894,8 @@ bpf_iter_bits_new(struct bpf_iter_bits *it, const u64 *unsafe_ptr__ign, u32 nr_w
>>
>>         if (!unsafe_ptr__ign || !nr_words)
>>                 return -EINVAL;
>> +       if (nr_words > BITS_ITER_NR_WORDS_MAX)
>> +               return -E2BIG;
> It is documented that nr_words cannot exceed 512, not due to overflow
> concerns, but because of memory allocation limits. It might be better
> to use 512 directly instead of BITS_ITER_NR_WORDS_MAX. Alternatively,
> if we decide to keep using the macro, the documentation should be
> updated accordingly.

Thanks for the explanation. Actually according to the limitation of bpf
memory allocator, the limitation should be (4096 - 8) / 8 = 511 due to
the overhead of llist_head in the returned pointer.

I prefer to keep the macro. How about updating the comment as follows:

 * Due to the limitation of memalloc, it can't be greater than 511.
Therefore,
 * 511 is used as the maximum value for @nr_words.
>
>>         /* Optimization for u64 mask */
>>         if (nr_bits == 64) {
>> --
>> 2.29.2
>>
>
> --
> Regards
> Yafang


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH bpf v2 5/7] bpf: Check the validity of nr_words in bpf_iter_bits_new()
  2024-10-23  8:29     ` Hou Tao
@ 2024-10-23  9:25       ` Yafang Shao
  2024-10-23  9:34         ` Yafang Shao
  0 siblings, 1 reply; 28+ messages in thread
From: Yafang Shao @ 2024-10-23  9:25 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, houtao1, xukuohai

On Wed, Oct 23, 2024 at 4:29 PM Hou Tao <houtao@huaweicloud.com> wrote:
>
>
>
> On 10/23/2024 11:17 AM, Yafang Shao wrote:
> > On Mon, Oct 21, 2024 at 9:28 AM Hou Tao <houtao@huaweicloud.com> wrote:
> >> From: Hou Tao <houtao1@huawei.com>
> >>
> >> Check the validity of nr_words in bpf_iter_bits_new(). Without this
> >> check, when multiplication overflow occurs for nr_bits (e.g., when
> >> nr_words = 0x0400-0001, nr_bits becomes 64), stack corruption may occur
> >> due to bpf_probe_read_kernel_common(..., nr_bytes = 0x2000-0008).
> >>
> >> Fix it by limiting the max value of nr_words to 512.
> >>
> >> Fixes: 4665415975b0 ("bpf: Add bits iterator")
> >> Signed-off-by: Hou Tao <houtao1@huawei.com>
> >> ---
> >>  kernel/bpf/helpers.c | 4 ++++
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> >> index 62349e206a29..c147f75e1b48 100644
> >> --- a/kernel/bpf/helpers.c
> >> +++ b/kernel/bpf/helpers.c
> >> @@ -2851,6 +2851,8 @@ struct bpf_iter_bits {
> >>         __u64 __opaque[2];
> >>  } __aligned(8);
> >>
> >> +#define BITS_ITER_NR_WORDS_MAX 512
> >> +
> >>  struct bpf_iter_bits_kern {
> >>         union {
> >>                 unsigned long *bits;
> >> @@ -2892,6 +2894,8 @@ bpf_iter_bits_new(struct bpf_iter_bits *it, const u64 *unsafe_ptr__ign, u32 nr_w
> >>
> >>         if (!unsafe_ptr__ign || !nr_words)
> >>                 return -EINVAL;
> >> +       if (nr_words > BITS_ITER_NR_WORDS_MAX)
> >> +               return -E2BIG;
> > It is documented that nr_words cannot exceed 512, not due to overflow
> > concerns, but because of memory allocation limits. It might be better
> > to use 512 directly instead of BITS_ITER_NR_WORDS_MAX. Alternatively,
> > if we decide to keep using the macro, the documentation should be
> > updated accordingly.
>
> Thanks for the explanation. Actually according to the limitation of bpf
> memory allocator, the limitation should be (4096 - 8) / 8 = 511 due to
> the overhead of llist_head in the returned pointer.

If that's the case, we should make the following code change, right ?

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 1a1b4458114c..64e73579c7d6 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -65,7 +65,7 @@ static u8 size_index[24] __ro_after_init = {

 static int bpf_mem_cache_idx(size_t size)
 {
-       if (!size || size > 4096)
+       if (!size || size > (4096 - 8))
                return -1;

        if (size <= 192)



--
Regards
Yafang

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH bpf v2 5/7] bpf: Check the validity of nr_words in bpf_iter_bits_new()
  2024-10-23  9:25       ` Yafang Shao
@ 2024-10-23  9:34         ` Yafang Shao
  0 siblings, 0 replies; 28+ messages in thread
From: Yafang Shao @ 2024-10-23  9:34 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, houtao1, xukuohai

On Wed, Oct 23, 2024 at 5:25 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Wed, Oct 23, 2024 at 4:29 PM Hou Tao <houtao@huaweicloud.com> wrote:
> >
> >
> >
> > On 10/23/2024 11:17 AM, Yafang Shao wrote:
> > > On Mon, Oct 21, 2024 at 9:28 AM Hou Tao <houtao@huaweicloud.com> wrote:
> > >> From: Hou Tao <houtao1@huawei.com>
> > >>
> > >> Check the validity of nr_words in bpf_iter_bits_new(). Without this
> > >> check, when multiplication overflow occurs for nr_bits (e.g., when
> > >> nr_words = 0x0400-0001, nr_bits becomes 64), stack corruption may occur
> > >> due to bpf_probe_read_kernel_common(..., nr_bytes = 0x2000-0008).
> > >>
> > >> Fix it by limiting the max value of nr_words to 512.
> > >>
> > >> Fixes: 4665415975b0 ("bpf: Add bits iterator")
> > >> Signed-off-by: Hou Tao <houtao1@huawei.com>
> > >> ---
> > >>  kernel/bpf/helpers.c | 4 ++++
> > >>  1 file changed, 4 insertions(+)
> > >>
> > >> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > >> index 62349e206a29..c147f75e1b48 100644
> > >> --- a/kernel/bpf/helpers.c
> > >> +++ b/kernel/bpf/helpers.c
> > >> @@ -2851,6 +2851,8 @@ struct bpf_iter_bits {
> > >>         __u64 __opaque[2];
> > >>  } __aligned(8);
> > >>
> > >> +#define BITS_ITER_NR_WORDS_MAX 512
> > >> +
> > >>  struct bpf_iter_bits_kern {
> > >>         union {
> > >>                 unsigned long *bits;
> > >> @@ -2892,6 +2894,8 @@ bpf_iter_bits_new(struct bpf_iter_bits *it, const u64 *unsafe_ptr__ign, u32 nr_w
> > >>
> > >>         if (!unsafe_ptr__ign || !nr_words)
> > >>                 return -EINVAL;
> > >> +       if (nr_words > BITS_ITER_NR_WORDS_MAX)
> > >> +               return -E2BIG;
> > > It is documented that nr_words cannot exceed 512, not due to overflow
> > > concerns, but because of memory allocation limits. It might be better
> > > to use 512 directly instead of BITS_ITER_NR_WORDS_MAX. Alternatively,
> > > if we decide to keep using the macro, the documentation should be
> > > updated accordingly.
> >
> > Thanks for the explanation. Actually according to the limitation of bpf
> > memory allocator, the limitation should be (4096 - 8) / 8 = 511 due to
> > the overhead of llist_head in the returned pointer.
>
> If that's the case, we should make the following code change, right ?
>
> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> index 1a1b4458114c..64e73579c7d6 100644
> --- a/kernel/bpf/memalloc.c
> +++ b/kernel/bpf/memalloc.c
> @@ -65,7 +65,7 @@ static u8 size_index[24] __ro_after_init = {
>
>  static int bpf_mem_cache_idx(size_t size)
>  {
> -       if (!size || size > 4096)
> +       if (!size || size > (4096 - 8))
>                 return -1;
>
>         if (size <= 192)
>
>
>
> --
> Regards
> Yafang

Please disregard my previous comment—I missed the !ma->percpu case.
You're right, the value cannot exceed 511.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2024-10-23  9:34 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-21  1:39 [PATCH bpf v2 0/7] Misc fixes for bpf Hou Tao
2024-10-21  1:39 ` [PATCH bpf v2 1/7] bpf: Add the missing BPF_LINK_TYPE invocation for sockmap Hou Tao
2024-10-21  1:39 ` [PATCH bpf v2 2/7] bpf: Add assertion for the size of bpf_link_type_strs[] Hou Tao
2024-10-21  8:18   ` Jiri Olsa
2024-10-21 23:02     ` Andrii Nakryiko
2024-10-22  7:35       ` Hou Tao
2024-10-22 17:40         ` Andrii Nakryiko
2024-10-22 20:26           ` Alexei Starovoitov
2024-10-22 20:41             ` Andrii Nakryiko
2024-10-21  1:40 ` [PATCH bpf v2 3/7] bpf: Preserve param->string when parsing mount options Hou Tao
2024-10-21  9:09   ` Jiri Olsa
2024-10-21  1:40 ` [PATCH bpf v2 4/7] bpf: Free dynamically allocated bits in bpf_iter_bits_destroy() Hou Tao
2024-10-21  2:45   ` Hou Tao
2024-10-21 23:07     ` Andrii Nakryiko
2024-10-22  7:25       ` Hou Tao
2024-10-21  1:40 ` [PATCH bpf v2 5/7] bpf: Check the validity of nr_words in bpf_iter_bits_new() Hou Tao
2024-10-21  9:51   ` Jiri Olsa
2024-10-21 23:09   ` Andrii Nakryiko
2024-10-23  3:17   ` Yafang Shao
2024-10-23  8:29     ` Hou Tao
2024-10-23  9:25       ` Yafang Shao
2024-10-23  9:34         ` Yafang Shao
2024-10-21  1:40 ` [PATCH bpf v2 6/7] bpf: Use __u64 to save the bits in bits iterator Hou Tao
2024-10-23  3:10   ` Yafang Shao
2024-10-23  8:09     ` Hou Tao
2024-10-21  1:40 ` [PATCH bpf v2 7/7] selftests/bpf: Test multiplication overflow of nr_bits in bits_iter Hou Tao
2024-10-21 23:11 ` [PATCH bpf v2 0/7] Misc fixes for bpf Andrii Nakryiko
2024-10-22  7:37   ` Hou Tao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox