* [PATCH AUTOSEL 6.6 02/58] bpf: Check percpu map value size first
2024-10-04 18:23 [PATCH AUTOSEL 6.6 01/58] selftests/bpf: Fix ARG_PTR_TO_LONG {half-,}uninitialized test Sasha Levin
@ 2024-10-04 18:23 ` Sasha Levin
2024-10-04 18:23 ` [PATCH AUTOSEL 6.6 03/58] bpftool: Fix undefined behavior in qsort(NULL, 0, ...) Sasha Levin
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2024-10-04 18:23 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Tao Chen, Jinke Han, Andrii Nakryiko, Jiri Olsa, Sasha Levin, ast,
daniel, bpf
From: Tao Chen <chen.dylane@gmail.com>
[ Upstream commit 1d244784be6b01162b732a5a7d637dfc024c3203 ]
Percpu map is often used, but the map value size limit often ignored,
like issue: https://github.com/iovisor/bcc/issues/2519. Actually,
percpu map value size is bound by PCPU_MIN_UNIT_SIZE, so we
can check the value size whether it exceeds PCPU_MIN_UNIT_SIZE first,
like percpu map of local_storage. Maybe the error message seems clearer
compared with "cannot allocate memory".
Signed-off-by: Jinke Han <jinkehan@didiglobal.com>
Signed-off-by: Tao Chen <chen.dylane@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240910144111.1464912-2-chen.dylane@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/bpf/arraymap.c | 3 +++
kernel/bpf/hashtab.c | 3 +++
2 files changed, 6 insertions(+)
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index c9843dde69081..1811efcfbd6e3 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -73,6 +73,9 @@ int array_map_alloc_check(union bpf_attr *attr)
/* avoid overflow on round_up(map->value_size) */
if (attr->value_size > INT_MAX)
return -E2BIG;
+ /* percpu map value size is bound by PCPU_MIN_UNIT_SIZE */
+ if (percpu && round_up(attr->value_size, 8) > PCPU_MIN_UNIT_SIZE)
+ return -E2BIG;
return 0;
}
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 85cd17ca38290..7c64ad4f3732b 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -458,6 +458,9 @@ static int htab_map_alloc_check(union bpf_attr *attr)
* kmalloc-able later in htab_map_update_elem()
*/
return -E2BIG;
+ /* percpu map value size is bound by PCPU_MIN_UNIT_SIZE */
+ if (percpu && round_up(attr->value_size, 8) > PCPU_MIN_UNIT_SIZE)
+ return -E2BIG;
return 0;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH AUTOSEL 6.6 03/58] bpftool: Fix undefined behavior in qsort(NULL, 0, ...)
2024-10-04 18:23 [PATCH AUTOSEL 6.6 01/58] selftests/bpf: Fix ARG_PTR_TO_LONG {half-,}uninitialized test Sasha Levin
2024-10-04 18:23 ` [PATCH AUTOSEL 6.6 02/58] bpf: Check percpu map value size first Sasha Levin
@ 2024-10-04 18:23 ` Sasha Levin
2024-10-04 18:23 ` [PATCH AUTOSEL 6.6 04/58] bpftool: Fix undefined behavior caused by shifting into the sign bit Sasha Levin
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2024-10-04 18:23 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Kuan-Wei Chiu, Andrii Nakryiko, Quentin Monnet, Sasha Levin, ast,
daniel, bpf
From: Kuan-Wei Chiu <visitorckw@gmail.com>
[ Upstream commit f04e2ad394e2755d0bb2d858ecb5598718bf00d5 ]
When netfilter has no entry to display, qsort is called with
qsort(NULL, 0, ...). This results in undefined behavior, as UBSan
reports:
net.c:827:2: runtime error: null pointer passed as argument 1, which is declared to never be null
Although the C standard does not explicitly state whether calling qsort
with a NULL pointer when the size is 0 constitutes undefined behavior,
Section 7.1.4 of the C standard (Use of library functions) mentions:
"Each of the following statements applies unless explicitly stated
otherwise in the detailed descriptions that follow: If an argument to a
function has an invalid value (such as a value outside the domain of
the function, or a pointer outside the address space of the program, or
a null pointer, or a pointer to non-modifiable storage when the
corresponding parameter is not const-qualified) or a type (after
promotion) not expected by a function with variable number of
arguments, the behavior is undefined."
To avoid this, add an early return when nf_link_info is NULL to prevent
calling qsort with a NULL pointer.
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20240910150207.3179306-1-visitorckw@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
tools/bpf/bpftool/net.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
index 66a8ce8ae0127..bd4e66d514f14 100644
--- a/tools/bpf/bpftool/net.c
+++ b/tools/bpf/bpftool/net.c
@@ -819,6 +819,9 @@ static void show_link_netfilter(void)
nf_link_count++;
}
+ if (!nf_link_info)
+ return;
+
qsort(nf_link_info, nf_link_count, sizeof(*nf_link_info), netfilter_link_compar);
for (id = 0; id < nf_link_count; id++) {
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH AUTOSEL 6.6 04/58] bpftool: Fix undefined behavior caused by shifting into the sign bit
2024-10-04 18:23 [PATCH AUTOSEL 6.6 01/58] selftests/bpf: Fix ARG_PTR_TO_LONG {half-,}uninitialized test Sasha Levin
2024-10-04 18:23 ` [PATCH AUTOSEL 6.6 02/58] bpf: Check percpu map value size first Sasha Levin
2024-10-04 18:23 ` [PATCH AUTOSEL 6.6 03/58] bpftool: Fix undefined behavior in qsort(NULL, 0, ...) Sasha Levin
@ 2024-10-04 18:23 ` Sasha Levin
2024-10-04 18:23 ` [PATCH AUTOSEL 6.6 08/58] bpf, x64: Fix a jit convergence issue Sasha Levin
2024-10-04 18:23 ` [PATCH AUTOSEL 6.6 17/58] bpf: Prevent tail call between progs attached to different hooks Sasha Levin
4 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2024-10-04 18:23 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Kuan-Wei Chiu, Andrii Nakryiko, Quentin Monnet, Sasha Levin, ast,
daniel, bpf
From: Kuan-Wei Chiu <visitorckw@gmail.com>
[ Upstream commit 4cdc0e4ce5e893bc92255f5f734d983012f2bc2e ]
Replace shifts of '1' with '1U' in bitwise operations within
__show_dev_tc_bpf() to prevent undefined behavior caused by shifting
into the sign bit of a signed integer. By using '1U', the operations
are explicitly performed on unsigned integers, avoiding potential
integer overflow or sign-related issues.
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20240908140009.3149781-1-visitorckw@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
tools/bpf/bpftool/net.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
index bd4e66d514f14..28e9417a5c2e3 100644
--- a/tools/bpf/bpftool/net.c
+++ b/tools/bpf/bpftool/net.c
@@ -480,9 +480,9 @@ static void __show_dev_tc_bpf(const struct ip_devname_ifindex *dev,
if (prog_flags[i] || json_output) {
NET_START_ARRAY("prog_flags", "%s ");
for (j = 0; prog_flags[i] && j < 32; j++) {
- if (!(prog_flags[i] & (1 << j)))
+ if (!(prog_flags[i] & (1U << j)))
continue;
- NET_DUMP_UINT_ONLY(1 << j);
+ NET_DUMP_UINT_ONLY(1U << j);
}
NET_END_ARRAY("");
}
@@ -491,9 +491,9 @@ static void __show_dev_tc_bpf(const struct ip_devname_ifindex *dev,
if (link_flags[i] || json_output) {
NET_START_ARRAY("link_flags", "%s ");
for (j = 0; link_flags[i] && j < 32; j++) {
- if (!(link_flags[i] & (1 << j)))
+ if (!(link_flags[i] & (1U << j)))
continue;
- NET_DUMP_UINT_ONLY(1 << j);
+ NET_DUMP_UINT_ONLY(1U << j);
}
NET_END_ARRAY("");
}
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH AUTOSEL 6.6 08/58] bpf, x64: Fix a jit convergence issue
2024-10-04 18:23 [PATCH AUTOSEL 6.6 01/58] selftests/bpf: Fix ARG_PTR_TO_LONG {half-,}uninitialized test Sasha Levin
` (2 preceding siblings ...)
2024-10-04 18:23 ` [PATCH AUTOSEL 6.6 04/58] bpftool: Fix undefined behavior caused by shifting into the sign bit Sasha Levin
@ 2024-10-04 18:23 ` Sasha Levin
2024-10-04 18:23 ` [PATCH AUTOSEL 6.6 17/58] bpf: Prevent tail call between progs attached to different hooks Sasha Levin
4 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2024-10-04 18:23 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Yonghong Song, Daniel Hodges, Alexei Starovoitov, Sasha Levin,
daniel, andrii, davem, dsahern, tglx, mingo, bp, dave.hansen, x86,
bpf, netdev
From: Yonghong Song <yonghong.song@linux.dev>
[ Upstream commit c8831bdbfbab672c006a18006d36932a494b2fd6 ]
Daniel Hodges reported a jit error when playing with a sched-ext program.
The error message is:
unexpected jmp_cond padding: -4 bytes
But further investigation shows the error is actual due to failed
convergence. The following are some analysis:
...
pass4, final_proglen=4391:
...
20e: 48 85 ff test rdi,rdi
211: 74 7d je 0x290
213: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0]
...
289: 48 85 ff test rdi,rdi
28c: 74 17 je 0x2a5
28e: e9 7f ff ff ff jmp 0x212
293: bf 03 00 00 00 mov edi,0x3
Note that insn at 0x211 is 2-byte cond jump insn for offset 0x7d (-125)
and insn at 0x28e is 5-byte jmp insn with offset -129.
pass5, final_proglen=4392:
...
20e: 48 85 ff test rdi,rdi
211: 0f 84 80 00 00 00 je 0x297
217: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0]
...
28d: 48 85 ff test rdi,rdi
290: 74 1a je 0x2ac
292: eb 84 jmp 0x218
294: bf 03 00 00 00 mov edi,0x3
Note that insn at 0x211 is 6-byte cond jump insn now since its offset
becomes 0x80 based on previous round (0x293 - 0x213 = 0x80). At the same
time, insn at 0x292 is a 2-byte insn since its offset is -124.
pass6 will repeat the same code as in pass4. pass7 will repeat the same
code as in pass5, and so on. This will prevent eventual convergence.
Passes 1-14 are with padding = 0. At pass15, padding is 1 and related
insn looks like:
211: 0f 84 80 00 00 00 je 0x297
217: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0]
...
24d: 48 85 d2 test rdx,rdx
The similar code in pass14:
211: 74 7d je 0x290
213: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0]
...
249: 48 85 d2 test rdx,rdx
24c: 74 21 je 0x26f
24e: 48 01 f7 add rdi,rsi
...
Before generating the following insn,
250: 74 21 je 0x273
"padding = 1" enables some checking to ensure nops is either 0 or 4
where
#define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
nops = INSN_SZ_DIFF - 2
In this specific case,
addrs[i] = 0x24e // from pass14
addrs[i-1] = 0x24d // from pass15
prog - temp = 3 // from 'test rdx,rdx' in pass15
so
nops = -4
and this triggers the failure.
To fix the issue, we need to break cycles of je <-> jmp. For example,
in the above case, we have
211: 74 7d je 0x290
the offset is 0x7d. If 2-byte je insn is generated only if
the offset is less than 0x7d (<= 0x7c), the cycle can be
break and we can achieve the convergence.
I did some study on other cases like je <-> je, jmp <-> je and
jmp <-> jmp which may cause cycles. Those cases are not from actual
reproducible cases since it is pretty hard to construct a test case
for them. the results show that the offset <= 0x7b (0x7b = 123) should
be enough to cover all cases. This patch added a new helper to generate 8-bit
cond/uncond jmp insns only if the offset range is [-128, 123].
Reported-by: Daniel Hodges <hodgesd@meta.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240904221251.37109-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/x86/net/bpf_jit_comp.c | 54 +++++++++++++++++++++++++++++++++++--
1 file changed, 52 insertions(+), 2 deletions(-)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 878a4c6dd7565..a50c99e9b5c01 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -58,6 +58,56 @@ static bool is_imm8(int value)
return value <= 127 && value >= -128;
}
+/*
+ * Let us limit the positive offset to be <= 123.
+ * This is to ensure eventual jit convergence For the following patterns:
+ * ...
+ * pass4, final_proglen=4391:
+ * ...
+ * 20e: 48 85 ff test rdi,rdi
+ * 211: 74 7d je 0x290
+ * 213: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0]
+ * ...
+ * 289: 48 85 ff test rdi,rdi
+ * 28c: 74 17 je 0x2a5
+ * 28e: e9 7f ff ff ff jmp 0x212
+ * 293: bf 03 00 00 00 mov edi,0x3
+ * Note that insn at 0x211 is 2-byte cond jump insn for offset 0x7d (-125)
+ * and insn at 0x28e is 5-byte jmp insn with offset -129.
+ *
+ * pass5, final_proglen=4392:
+ * ...
+ * 20e: 48 85 ff test rdi,rdi
+ * 211: 0f 84 80 00 00 00 je 0x297
+ * 217: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0]
+ * ...
+ * 28d: 48 85 ff test rdi,rdi
+ * 290: 74 1a je 0x2ac
+ * 292: eb 84 jmp 0x218
+ * 294: bf 03 00 00 00 mov edi,0x3
+ * Note that insn at 0x211 is 6-byte cond jump insn now since its offset
+ * becomes 0x80 based on previous round (0x293 - 0x213 = 0x80).
+ * At the same time, insn at 0x292 is a 2-byte insn since its offset is
+ * -124.
+ *
+ * pass6 will repeat the same code as in pass4 and this will prevent
+ * eventual convergence.
+ *
+ * To fix this issue, we need to break je (2->6 bytes) <-> jmp (5->2 bytes)
+ * cycle in the above. In the above example je offset <= 0x7c should work.
+ *
+ * For other cases, je <-> je needs offset <= 0x7b to avoid no convergence
+ * issue. For jmp <-> je and jmp <-> jmp cases, jmp offset <= 0x7c should
+ * avoid no convergence issue.
+ *
+ * Overall, let us limit the positive offset for 8bit cond/uncond jmp insn
+ * to maximum 123 (0x7b). This way, the jit pass can eventually converge.
+ */
+static bool is_imm8_jmp_offset(int value)
+{
+ return value <= 123 && value >= -128;
+}
+
static bool is_simm32(s64 value)
{
return value == (s64)(s32)value;
@@ -1774,7 +1824,7 @@ st: if (is_imm8(insn->off))
return -EFAULT;
}
jmp_offset = addrs[i + insn->off] - addrs[i];
- if (is_imm8(jmp_offset)) {
+ if (is_imm8_jmp_offset(jmp_offset)) {
if (jmp_padding) {
/* To keep the jmp_offset valid, the extra bytes are
* padded before the jump insn, so we subtract the
@@ -1856,7 +1906,7 @@ st: if (is_imm8(insn->off))
break;
}
emit_jmp:
- if (is_imm8(jmp_offset)) {
+ if (is_imm8_jmp_offset(jmp_offset)) {
if (jmp_padding) {
/* To avoid breaking jmp_offset, the extra bytes
* are padded before the actual jmp insn, so
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH AUTOSEL 6.6 17/58] bpf: Prevent tail call between progs attached to different hooks
2024-10-04 18:23 [PATCH AUTOSEL 6.6 01/58] selftests/bpf: Fix ARG_PTR_TO_LONG {half-,}uninitialized test Sasha Levin
` (3 preceding siblings ...)
2024-10-04 18:23 ` [PATCH AUTOSEL 6.6 08/58] bpf, x64: Fix a jit convergence issue Sasha Levin
@ 2024-10-04 18:23 ` Sasha Levin
4 siblings, 0 replies; 6+ messages in thread
From: Sasha Levin @ 2024-10-04 18:23 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Xu Kuohai, Alexei Starovoitov, Andrii Nakryiko, Sasha Levin,
daniel, davem, kuba, hawk, john.fastabend, bpf, netdev
From: Xu Kuohai <xukuohai@huawei.com>
[ Upstream commit 28ead3eaabc16ecc907cfb71876da028080f6356 ]
bpf progs can be attached to kernel functions, and the attached functions
can take different parameters or return different return values. If
prog attached to one kernel function tail calls prog attached to another
kernel function, the ctx access or return value verification could be
bypassed.
For example, if prog1 is attached to func1 which takes only 1 parameter
and prog2 is attached to func2 which takes two parameters. Since verifier
assumes the bpf ctx passed to prog2 is constructed based on func2's
prototype, verifier allows prog2 to access the second parameter from
the bpf ctx passed to it. The problem is that verifier does not prevent
prog1 from passing its bpf ctx to prog2 via tail call. In this case,
the bpf ctx passed to prog2 is constructed from func1 instead of func2,
that is, the assumption for ctx access verification is bypassed.
Another example, if BPF LSM prog1 is attached to hook file_alloc_security,
and BPF LSM prog2 is attached to hook bpf_lsm_audit_rule_known. Verifier
knows the return value rules for these two hooks, e.g. it is legal for
bpf_lsm_audit_rule_known to return positive number 1, and it is illegal
for file_alloc_security to return positive number. So verifier allows
prog2 to return positive number 1, but does not allow prog1 to return
positive number. The problem is that verifier does not prevent prog1
from calling prog2 via tail call. In this case, prog2's return value 1
will be used as the return value for prog1's hook file_alloc_security.
That is, the return value rule is bypassed.
This patch adds restriction for tail call to prevent such bypasses.
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20240719110059.797546-4-xukuohai@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/linux/bpf.h | 1 +
kernel/bpf/core.c | 21 ++++++++++++++++++---
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e4cd28c38b825..0f0e0265cbdf5 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -288,6 +288,7 @@ struct bpf_map {
* same prog type, JITed flag and xdp_has_frags flag.
*/
struct {
+ const struct btf_type *attach_func_proto;
spinlock_t lock;
enum bpf_prog_type type;
bool jited;
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 4124805ad7ba5..58ee17f429a33 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2259,6 +2259,7 @@ bool bpf_prog_map_compatible(struct bpf_map *map,
{
enum bpf_prog_type prog_type = resolve_prog_type(fp);
bool ret;
+ struct bpf_prog_aux *aux = fp->aux;
if (fp->kprobe_override)
return false;
@@ -2268,7 +2269,7 @@ bool bpf_prog_map_compatible(struct bpf_map *map,
* in the case of devmap and cpumap). Until device checks
* are implemented, prohibit adding dev-bound programs to program maps.
*/
- if (bpf_prog_is_dev_bound(fp->aux))
+ if (bpf_prog_is_dev_bound(aux))
return false;
spin_lock(&map->owner.lock);
@@ -2278,12 +2279,26 @@ bool bpf_prog_map_compatible(struct bpf_map *map,
*/
map->owner.type = prog_type;
map->owner.jited = fp->jited;
- map->owner.xdp_has_frags = fp->aux->xdp_has_frags;
+ map->owner.xdp_has_frags = aux->xdp_has_frags;
+ map->owner.attach_func_proto = aux->attach_func_proto;
ret = true;
} else {
ret = map->owner.type == prog_type &&
map->owner.jited == fp->jited &&
- map->owner.xdp_has_frags == fp->aux->xdp_has_frags;
+ map->owner.xdp_has_frags == aux->xdp_has_frags;
+ if (ret &&
+ map->owner.attach_func_proto != aux->attach_func_proto) {
+ switch (prog_type) {
+ case BPF_PROG_TYPE_TRACING:
+ case BPF_PROG_TYPE_LSM:
+ case BPF_PROG_TYPE_EXT:
+ case BPF_PROG_TYPE_STRUCT_OPS:
+ ret = false;
+ break;
+ default:
+ break;
+ }
+ }
}
spin_unlock(&map->owner.lock);
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread