* [PATCH AUTOSEL 6.10 01/70] bpf: Call the missed btf_record_free() when map creation fails
@ 2024-10-04 18:19 Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 02/70] selftests/bpf: Fix ARG_PTR_TO_LONG {half-,}uninitialized test Sasha Levin
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: Sasha Levin @ 2024-10-04 18:19 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Hou Tao, Jiri Olsa, Alexei Starovoitov, Sasha Levin, daniel,
andrii, bpf
From: Hou Tao <houtao1@huawei.com>
[ Upstream commit 87e9675a0dfd0bf4a36550e4a0e673038ec67aee ]
When security_bpf_map_create() in map_create() fails, map_create() will
call btf_put() and ->map_free() callback to free the map. It doesn't
free the btf_record of map value, so add the missed btf_record_free()
when map creation fails.
However btf_record_free() needs to be called after ->map_free() just
like bpf_map_free_deferred() did, because ->map_free() may use the
btf_record to free the special fields in preallocated map value. So
factor out bpf_map_free() helper to free the map, btf_record, and btf
orderly and use the helper in both map_create() and
bpf_map_free_deferred().
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20240912012845.3458483-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/bpf/syscall.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f45ed6adc092a..92590ddb2042d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -733,15 +733,11 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
}
}
-/* called from workqueue */
-static void bpf_map_free_deferred(struct work_struct *work)
+static void bpf_map_free(struct bpf_map *map)
{
- struct bpf_map *map = container_of(work, struct bpf_map, work);
struct btf_record *rec = map->record;
struct btf *btf = map->btf;
- security_bpf_map_free(map);
- bpf_map_release_memcg(map);
/* implementation dependent freeing */
map->ops->map_free(map);
/* Delay freeing of btf_record for maps, as map_free
@@ -760,6 +756,16 @@ static void bpf_map_free_deferred(struct work_struct *work)
btf_put(btf);
}
+/* called from workqueue */
+static void bpf_map_free_deferred(struct work_struct *work)
+{
+ struct bpf_map *map = container_of(work, struct bpf_map, work);
+
+ security_bpf_map_free(map);
+ bpf_map_release_memcg(map);
+ bpf_map_free(map);
+}
+
static void bpf_map_put_uref(struct bpf_map *map)
{
if (atomic64_dec_and_test(&map->usercnt)) {
@@ -1411,8 +1417,7 @@ static int map_create(union bpf_attr *attr)
free_map_sec:
security_bpf_map_free(map);
free_map:
- btf_put(map->btf);
- map->ops->map_free(map);
+ bpf_map_free(map);
put_token:
bpf_token_put(token);
return err;
--
2.43.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH AUTOSEL 6.10 02/70] selftests/bpf: Fix ARG_PTR_TO_LONG {half-,}uninitialized test
2024-10-04 18:19 [PATCH AUTOSEL 6.10 01/70] bpf: Call the missed btf_record_free() when map creation fails Sasha Levin
@ 2024-10-04 18:20 ` Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 03/70] bpf: Fix a sdiv overflow issue Sasha Levin
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2024-10-04 18:20 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Daniel Borkmann, Andrii Nakryiko, Alexei Starovoitov, Sasha Levin,
eddyz87, shuah, andreimatei1, shung-hsi.yu, bpf, linux-kselftest
From: Daniel Borkmann <daniel@iogearbox.net>
[ Upstream commit b8e188f023e07a733b47d5865311ade51878fe40 ]
The assumption of 'in privileged mode reads from uninitialized stack locations
are permitted' is not quite correct since the verifier was probing for read
access rather than write access. Both tests need to be annotated as __success
for privileged and unprivileged.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240913191754.13290-6-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
tools/testing/selftests/bpf/progs/verifier_int_ptr.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/tools/testing/selftests/bpf/progs/verifier_int_ptr.c b/tools/testing/selftests/bpf/progs/verifier_int_ptr.c
index 9fc3fae5cd833..87206803c0255 100644
--- a/tools/testing/selftests/bpf/progs/verifier_int_ptr.c
+++ b/tools/testing/selftests/bpf/progs/verifier_int_ptr.c
@@ -8,7 +8,6 @@
SEC("socket")
__description("ARG_PTR_TO_LONG uninitialized")
__success
-__failure_unpriv __msg_unpriv("invalid indirect read from stack R4 off -16+0 size 8")
__naked void arg_ptr_to_long_uninitialized(void)
{
asm volatile (" \
@@ -36,9 +35,7 @@ __naked void arg_ptr_to_long_uninitialized(void)
SEC("socket")
__description("ARG_PTR_TO_LONG half-uninitialized")
-/* in privileged mode reads from uninitialized stack locations are permitted */
-__success __failure_unpriv
-__msg_unpriv("invalid indirect read from stack R4 off -16+4 size 8")
+__success
__retval(0)
__naked void ptr_to_long_half_uninitialized(void)
{
--
2.43.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH AUTOSEL 6.10 03/70] bpf: Fix a sdiv overflow issue
2024-10-04 18:19 [PATCH AUTOSEL 6.10 01/70] bpf: Call the missed btf_record_free() when map creation fails Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 02/70] selftests/bpf: Fix ARG_PTR_TO_LONG {half-,}uninitialized test Sasha Levin
@ 2024-10-04 18:20 ` Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 04/70] bpf: Check percpu map value size first Sasha Levin
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2024-10-04 18:20 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Yonghong Song, Zac Ecob, Andrii Nakryiko, Alexei Starovoitov,
Sasha Levin, daniel, bpf
From: Yonghong Song <yonghong.song@linux.dev>
[ Upstream commit 7dd34d7b7dcf9309fc6224caf4dd5b35bedddcb7 ]
Zac Ecob reported a problem where a bpf program may cause kernel crash due
to the following error:
Oops: divide error: 0000 [#1] PREEMPT SMP KASAN PTI
The failure is due to the below signed divide:
LLONG_MIN/-1 where LLONG_MIN equals to -9,223,372,036,854,775,808.
LLONG_MIN/-1 is supposed to give a positive number 9,223,372,036,854,775,808,
but it is impossible since for 64-bit system, the maximum positive
number is 9,223,372,036,854,775,807. On x86_64, LLONG_MIN/-1 will
cause a kernel exception. On arm64, the result for LLONG_MIN/-1 is
LLONG_MIN.
Further investigation found all the following sdiv/smod cases may trigger
an exception when bpf program is running on x86_64 platform:
- LLONG_MIN/-1 for 64bit operation
- INT_MIN/-1 for 32bit operation
- LLONG_MIN%-1 for 64bit operation
- INT_MIN%-1 for 32bit operation
where -1 can be an immediate or in a register.
On arm64, there are no exceptions:
- LLONG_MIN/-1 = LLONG_MIN
- INT_MIN/-1 = INT_MIN
- LLONG_MIN%-1 = 0
- INT_MIN%-1 = 0
where -1 can be an immediate or in a register.
Insn patching is needed to handle the above cases and the patched codes
produced results aligned with above arm64 result. The below are pseudo
codes to handle sdiv/smod exceptions including both divisor -1 and divisor 0
and the divisor is stored in a register.
sdiv:
tmp = rX
tmp += 1 /* [-1, 0] -> [0, 1]
if tmp >(unsigned) 1 goto L2
if tmp == 0 goto L1
rY = 0
L1:
rY = -rY;
goto L3
L2:
rY /= rX
L3:
smod:
tmp = rX
tmp += 1 /* [-1, 0] -> [0, 1]
if tmp >(unsigned) 1 goto L1
if tmp == 1 (is64 ? goto L2 : goto L3)
rY = 0;
goto L2
L1:
rY %= rX
L2:
goto L4 // only when !is64
L3:
wY = wY // only when !is64
L4:
[1] https://lore.kernel.org/bpf/tPJLTEh7S_DxFEqAI2Ji5MBSoZVg7_G-Py2iaZpAaWtM961fFTWtsnlzwvTbzBzaUzwQAoNATXKUlt0LZOFgnDcIyKCswAnAGdUF3LBrhGQ=@protonmail.com/
Reported-by: Zac Ecob <zacecob@protonmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240913150326.1187788-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/bpf/verifier.c | 93 +++++++++++++++++++++++++++++++++++++++++--
1 file changed, 89 insertions(+), 4 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 73f55f4b945ee..10e953fd21fd9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -19909,13 +19909,46 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
/* Convert BPF_CLASS(insn->code) == BPF_ALU64 to 32-bit ALU */
insn->code = BPF_ALU | BPF_OP(insn->code) | BPF_SRC(insn->code);
- /* Make divide-by-zero exceptions impossible. */
+ /* Make sdiv/smod divide-by-minus-one exceptions impossible. */
+ if ((insn->code == (BPF_ALU64 | BPF_MOD | BPF_K) ||
+ insn->code == (BPF_ALU64 | BPF_DIV | BPF_K) ||
+ insn->code == (BPF_ALU | BPF_MOD | BPF_K) ||
+ insn->code == (BPF_ALU | BPF_DIV | BPF_K)) &&
+ insn->off == 1 && insn->imm == -1) {
+ bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
+ bool isdiv = BPF_OP(insn->code) == BPF_DIV;
+ struct bpf_insn *patchlet;
+ struct bpf_insn chk_and_sdiv[] = {
+ BPF_RAW_INSN((is64 ? BPF_ALU64 : BPF_ALU) |
+ BPF_NEG | BPF_K, insn->dst_reg,
+ 0, 0, 0),
+ };
+ struct bpf_insn chk_and_smod[] = {
+ BPF_MOV32_IMM(insn->dst_reg, 0),
+ };
+
+ patchlet = isdiv ? chk_and_sdiv : chk_and_smod;
+ cnt = isdiv ? ARRAY_SIZE(chk_and_sdiv) : ARRAY_SIZE(chk_and_smod);
+
+ new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt);
+ if (!new_prog)
+ return -ENOMEM;
+
+ delta += cnt - 1;
+ env->prog = prog = new_prog;
+ insn = new_prog->insnsi + i + delta;
+ goto next_insn;
+ }
+
+ /* Make divide-by-zero and divide-by-minus-one exceptions impossible. */
if (insn->code == (BPF_ALU64 | BPF_MOD | BPF_X) ||
insn->code == (BPF_ALU64 | BPF_DIV | BPF_X) ||
insn->code == (BPF_ALU | BPF_MOD | BPF_X) ||
insn->code == (BPF_ALU | BPF_DIV | BPF_X)) {
bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
bool isdiv = BPF_OP(insn->code) == BPF_DIV;
+ bool is_sdiv = isdiv && insn->off == 1;
+ bool is_smod = !isdiv && insn->off == 1;
struct bpf_insn *patchlet;
struct bpf_insn chk_and_div[] = {
/* [R,W]x div 0 -> 0 */
@@ -19935,10 +19968,62 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
BPF_JMP_IMM(BPF_JA, 0, 0, 1),
BPF_MOV32_REG(insn->dst_reg, insn->dst_reg),
};
+ struct bpf_insn chk_and_sdiv[] = {
+ /* [R,W]x sdiv 0 -> 0
+ * LLONG_MIN sdiv -1 -> LLONG_MIN
+ * INT_MIN sdiv -1 -> INT_MIN
+ */
+ BPF_MOV64_REG(BPF_REG_AX, insn->src_reg),
+ BPF_RAW_INSN((is64 ? BPF_ALU64 : BPF_ALU) |
+ BPF_ADD | BPF_K, BPF_REG_AX,
+ 0, 0, 1),
+ BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
+ BPF_JGT | BPF_K, BPF_REG_AX,
+ 0, 4, 1),
+ BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
+ BPF_JEQ | BPF_K, BPF_REG_AX,
+ 0, 1, 0),
+ BPF_RAW_INSN((is64 ? BPF_ALU64 : BPF_ALU) |
+ BPF_MOV | BPF_K, insn->dst_reg,
+ 0, 0, 0),
+ /* BPF_NEG(LLONG_MIN) == -LLONG_MIN == LLONG_MIN */
+ BPF_RAW_INSN((is64 ? BPF_ALU64 : BPF_ALU) |
+ BPF_NEG | BPF_K, insn->dst_reg,
+ 0, 0, 0),
+ BPF_JMP_IMM(BPF_JA, 0, 0, 1),
+ *insn,
+ };
+ struct bpf_insn chk_and_smod[] = {
+ /* [R,W]x mod 0 -> [R,W]x */
+ /* [R,W]x mod -1 -> 0 */
+ BPF_MOV64_REG(BPF_REG_AX, insn->src_reg),
+ BPF_RAW_INSN((is64 ? BPF_ALU64 : BPF_ALU) |
+ BPF_ADD | BPF_K, BPF_REG_AX,
+ 0, 0, 1),
+ BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
+ BPF_JGT | BPF_K, BPF_REG_AX,
+ 0, 3, 1),
+ BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
+ BPF_JEQ | BPF_K, BPF_REG_AX,
+ 0, 3 + (is64 ? 0 : 1), 1),
+ BPF_MOV32_IMM(insn->dst_reg, 0),
+ BPF_JMP_IMM(BPF_JA, 0, 0, 1),
+ *insn,
+ BPF_JMP_IMM(BPF_JA, 0, 0, 1),
+ BPF_MOV32_REG(insn->dst_reg, insn->dst_reg),
+ };
- patchlet = isdiv ? chk_and_div : chk_and_mod;
- cnt = isdiv ? ARRAY_SIZE(chk_and_div) :
- ARRAY_SIZE(chk_and_mod) - (is64 ? 2 : 0);
+ if (is_sdiv) {
+ patchlet = chk_and_sdiv;
+ cnt = ARRAY_SIZE(chk_and_sdiv);
+ } else if (is_smod) {
+ patchlet = chk_and_smod;
+ cnt = ARRAY_SIZE(chk_and_smod) - (is64 ? 2 : 0);
+ } else {
+ patchlet = isdiv ? chk_and_div : chk_and_mod;
+ cnt = isdiv ? ARRAY_SIZE(chk_and_div) :
+ ARRAY_SIZE(chk_and_mod) - (is64 ? 2 : 0);
+ }
new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt);
if (!new_prog)
--
2.43.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH AUTOSEL 6.10 04/70] bpf: Check percpu map value size first
2024-10-04 18:19 [PATCH AUTOSEL 6.10 01/70] bpf: Call the missed btf_record_free() when map creation fails Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 02/70] selftests/bpf: Fix ARG_PTR_TO_LONG {half-,}uninitialized test Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 03/70] bpf: Fix a sdiv overflow issue Sasha Levin
@ 2024-10-04 18:20 ` Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 05/70] bpftool: Fix undefined behavior in qsort(NULL, 0, ...) Sasha Levin
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2024-10-04 18:20 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Tao Chen, Jinke Han, Andrii Nakryiko, Jiri Olsa, Sasha Levin, ast,
daniel, bpf
From: Tao Chen <chen.dylane@gmail.com>
[ Upstream commit 1d244784be6b01162b732a5a7d637dfc024c3203 ]
Percpu map is often used, but the map value size limit often ignored,
like issue: https://github.com/iovisor/bcc/issues/2519. Actually,
percpu map value size is bound by PCPU_MIN_UNIT_SIZE, so we
can check the value size whether it exceeds PCPU_MIN_UNIT_SIZE first,
like percpu map of local_storage. Maybe the error message seems clearer
compared with "cannot allocate memory".
Signed-off-by: Jinke Han <jinkehan@didiglobal.com>
Signed-off-by: Tao Chen <chen.dylane@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240910144111.1464912-2-chen.dylane@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/bpf/arraymap.c | 3 +++
kernel/bpf/hashtab.c | 3 +++
2 files changed, 6 insertions(+)
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index feabc01938525..a5c6f8aa49015 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -73,6 +73,9 @@ int array_map_alloc_check(union bpf_attr *attr)
/* avoid overflow on round_up(map->value_size) */
if (attr->value_size > INT_MAX)
return -E2BIG;
+ /* percpu map value size is bound by PCPU_MIN_UNIT_SIZE */
+ if (percpu && round_up(attr->value_size, 8) > PCPU_MIN_UNIT_SIZE)
+ return -E2BIG;
return 0;
}
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 06115f8728e89..bcb74ac880cb5 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -462,6 +462,9 @@ static int htab_map_alloc_check(union bpf_attr *attr)
* kmalloc-able later in htab_map_update_elem()
*/
return -E2BIG;
+ /* percpu map value size is bound by PCPU_MIN_UNIT_SIZE */
+ if (percpu && round_up(attr->value_size, 8) > PCPU_MIN_UNIT_SIZE)
+ return -E2BIG;
return 0;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH AUTOSEL 6.10 05/70] bpftool: Fix undefined behavior in qsort(NULL, 0, ...)
2024-10-04 18:19 [PATCH AUTOSEL 6.10 01/70] bpf: Call the missed btf_record_free() when map creation fails Sasha Levin
` (2 preceding siblings ...)
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 04/70] bpf: Check percpu map value size first Sasha Levin
@ 2024-10-04 18:20 ` Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 06/70] bpftool: Fix undefined behavior caused by shifting into the sign bit Sasha Levin
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2024-10-04 18:20 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Kuan-Wei Chiu, Andrii Nakryiko, Quentin Monnet, Sasha Levin, ast,
daniel, bpf
From: Kuan-Wei Chiu <visitorckw@gmail.com>
[ Upstream commit f04e2ad394e2755d0bb2d858ecb5598718bf00d5 ]
When netfilter has no entry to display, qsort is called with
qsort(NULL, 0, ...). This results in undefined behavior, as UBSan
reports:
net.c:827:2: runtime error: null pointer passed as argument 1, which is declared to never be null
Although the C standard does not explicitly state whether calling qsort
with a NULL pointer when the size is 0 constitutes undefined behavior,
Section 7.1.4 of the C standard (Use of library functions) mentions:
"Each of the following statements applies unless explicitly stated
otherwise in the detailed descriptions that follow: If an argument to a
function has an invalid value (such as a value outside the domain of
the function, or a pointer outside the address space of the program, or
a null pointer, or a pointer to non-modifiable storage when the
corresponding parameter is not const-qualified) or a type (after
promotion) not expected by a function with variable number of
arguments, the behavior is undefined."
To avoid this, add an early return when nf_link_info is NULL to prevent
calling qsort with a NULL pointer.
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20240910150207.3179306-1-visitorckw@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
tools/bpf/bpftool/net.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
index 968714b4c3d45..0ad684e810f34 100644
--- a/tools/bpf/bpftool/net.c
+++ b/tools/bpf/bpftool/net.c
@@ -824,6 +824,9 @@ static void show_link_netfilter(void)
nf_link_count++;
}
+ if (!nf_link_info)
+ return;
+
qsort(nf_link_info, nf_link_count, sizeof(*nf_link_info), netfilter_link_compar);
for (id = 0; id < nf_link_count; id++) {
--
2.43.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH AUTOSEL 6.10 06/70] bpftool: Fix undefined behavior caused by shifting into the sign bit
2024-10-04 18:19 [PATCH AUTOSEL 6.10 01/70] bpf: Call the missed btf_record_free() when map creation fails Sasha Levin
` (3 preceding siblings ...)
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 05/70] bpftool: Fix undefined behavior in qsort(NULL, 0, ...) Sasha Levin
@ 2024-10-04 18:20 ` Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 10/70] bpf, x64: Fix a jit convergence issue Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 19/70] bpf: Prevent tail call between progs attached to different hooks Sasha Levin
6 siblings, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2024-10-04 18:20 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Kuan-Wei Chiu, Andrii Nakryiko, Quentin Monnet, Sasha Levin, ast,
daniel, bpf
From: Kuan-Wei Chiu <visitorckw@gmail.com>
[ Upstream commit 4cdc0e4ce5e893bc92255f5f734d983012f2bc2e ]
Replace shifts of '1' with '1U' in bitwise operations within
__show_dev_tc_bpf() to prevent undefined behavior caused by shifting
into the sign bit of a signed integer. By using '1U', the operations
are explicitly performed on unsigned integers, avoiding potential
integer overflow or sign-related issues.
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20240908140009.3149781-1-visitorckw@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
tools/bpf/bpftool/net.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
index 0ad684e810f34..0f2106218e1f0 100644
--- a/tools/bpf/bpftool/net.c
+++ b/tools/bpf/bpftool/net.c
@@ -482,9 +482,9 @@ static void __show_dev_tc_bpf(const struct ip_devname_ifindex *dev,
if (prog_flags[i] || json_output) {
NET_START_ARRAY("prog_flags", "%s ");
for (j = 0; prog_flags[i] && j < 32; j++) {
- if (!(prog_flags[i] & (1 << j)))
+ if (!(prog_flags[i] & (1U << j)))
continue;
- NET_DUMP_UINT_ONLY(1 << j);
+ NET_DUMP_UINT_ONLY(1U << j);
}
NET_END_ARRAY("");
}
@@ -493,9 +493,9 @@ static void __show_dev_tc_bpf(const struct ip_devname_ifindex *dev,
if (link_flags[i] || json_output) {
NET_START_ARRAY("link_flags", "%s ");
for (j = 0; link_flags[i] && j < 32; j++) {
- if (!(link_flags[i] & (1 << j)))
+ if (!(link_flags[i] & (1U << j)))
continue;
- NET_DUMP_UINT_ONLY(1 << j);
+ NET_DUMP_UINT_ONLY(1U << j);
}
NET_END_ARRAY("");
}
--
2.43.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH AUTOSEL 6.10 10/70] bpf, x64: Fix a jit convergence issue
2024-10-04 18:19 [PATCH AUTOSEL 6.10 01/70] bpf: Call the missed btf_record_free() when map creation fails Sasha Levin
` (4 preceding siblings ...)
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 06/70] bpftool: Fix undefined behavior caused by shifting into the sign bit Sasha Levin
@ 2024-10-04 18:20 ` Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 19/70] bpf: Prevent tail call between progs attached to different hooks Sasha Levin
6 siblings, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2024-10-04 18:20 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Yonghong Song, Daniel Hodges, Alexei Starovoitov, Sasha Levin,
davem, dsahern, daniel, andrii, tglx, mingo, bp, dave.hansen, x86,
netdev, bpf
From: Yonghong Song <yonghong.song@linux.dev>
[ Upstream commit c8831bdbfbab672c006a18006d36932a494b2fd6 ]
Daniel Hodges reported a jit error when playing with a sched-ext program.
The error message is:
unexpected jmp_cond padding: -4 bytes
But further investigation shows the error is actual due to failed
convergence. The following are some analysis:
...
pass4, final_proglen=4391:
...
20e: 48 85 ff test rdi,rdi
211: 74 7d je 0x290
213: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0]
...
289: 48 85 ff test rdi,rdi
28c: 74 17 je 0x2a5
28e: e9 7f ff ff ff jmp 0x212
293: bf 03 00 00 00 mov edi,0x3
Note that insn at 0x211 is 2-byte cond jump insn for offset 0x7d (-125)
and insn at 0x28e is 5-byte jmp insn with offset -129.
pass5, final_proglen=4392:
...
20e: 48 85 ff test rdi,rdi
211: 0f 84 80 00 00 00 je 0x297
217: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0]
...
28d: 48 85 ff test rdi,rdi
290: 74 1a je 0x2ac
292: eb 84 jmp 0x218
294: bf 03 00 00 00 mov edi,0x3
Note that insn at 0x211 is 6-byte cond jump insn now since its offset
becomes 0x80 based on previous round (0x293 - 0x213 = 0x80). At the same
time, insn at 0x292 is a 2-byte insn since its offset is -124.
pass6 will repeat the same code as in pass4. pass7 will repeat the same
code as in pass5, and so on. This will prevent eventual convergence.
Passes 1-14 are with padding = 0. At pass15, padding is 1 and related
insn looks like:
211: 0f 84 80 00 00 00 je 0x297
217: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0]
...
24d: 48 85 d2 test rdx,rdx
The similar code in pass14:
211: 74 7d je 0x290
213: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0]
...
249: 48 85 d2 test rdx,rdx
24c: 74 21 je 0x26f
24e: 48 01 f7 add rdi,rsi
...
Before generating the following insn,
250: 74 21 je 0x273
"padding = 1" enables some checking to ensure nops is either 0 or 4
where
#define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
nops = INSN_SZ_DIFF - 2
In this specific case,
addrs[i] = 0x24e // from pass14
addrs[i-1] = 0x24d // from pass15
prog - temp = 3 // from 'test rdx,rdx' in pass15
so
nops = -4
and this triggers the failure.
To fix the issue, we need to break cycles of je <-> jmp. For example,
in the above case, we have
211: 74 7d je 0x290
the offset is 0x7d. If 2-byte je insn is generated only if
the offset is less than 0x7d (<= 0x7c), the cycle can be
break and we can achieve the convergence.
I did some study on other cases like je <-> je, jmp <-> je and
jmp <-> jmp which may cause cycles. Those cases are not from actual
reproducible cases since it is pretty hard to construct a test case
for them. the results show that the offset <= 0x7b (0x7b = 123) should
be enough to cover all cases. This patch added a new helper to generate 8-bit
cond/uncond jmp insns only if the offset range is [-128, 123].
Reported-by: Daniel Hodges <hodgesd@meta.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240904221251.37109-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/x86/net/bpf_jit_comp.c | 54 +++++++++++++++++++++++++++++++++++--
1 file changed, 52 insertions(+), 2 deletions(-)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 5159c7a229229..dd60500497ead 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -64,6 +64,56 @@ static bool is_imm8(int value)
return value <= 127 && value >= -128;
}
+/*
+ * Let us limit the positive offset to be <= 123.
+ * This is to ensure eventual jit convergence For the following patterns:
+ * ...
+ * pass4, final_proglen=4391:
+ * ...
+ * 20e: 48 85 ff test rdi,rdi
+ * 211: 74 7d je 0x290
+ * 213: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0]
+ * ...
+ * 289: 48 85 ff test rdi,rdi
+ * 28c: 74 17 je 0x2a5
+ * 28e: e9 7f ff ff ff jmp 0x212
+ * 293: bf 03 00 00 00 mov edi,0x3
+ * Note that insn at 0x211 is 2-byte cond jump insn for offset 0x7d (-125)
+ * and insn at 0x28e is 5-byte jmp insn with offset -129.
+ *
+ * pass5, final_proglen=4392:
+ * ...
+ * 20e: 48 85 ff test rdi,rdi
+ * 211: 0f 84 80 00 00 00 je 0x297
+ * 217: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0]
+ * ...
+ * 28d: 48 85 ff test rdi,rdi
+ * 290: 74 1a je 0x2ac
+ * 292: eb 84 jmp 0x218
+ * 294: bf 03 00 00 00 mov edi,0x3
+ * Note that insn at 0x211 is 6-byte cond jump insn now since its offset
+ * becomes 0x80 based on previous round (0x293 - 0x213 = 0x80).
+ * At the same time, insn at 0x292 is a 2-byte insn since its offset is
+ * -124.
+ *
+ * pass6 will repeat the same code as in pass4 and this will prevent
+ * eventual convergence.
+ *
+ * To fix this issue, we need to break je (2->6 bytes) <-> jmp (5->2 bytes)
+ * cycle in the above. In the above example je offset <= 0x7c should work.
+ *
+ * For other cases, je <-> je needs offset <= 0x7b to avoid no convergence
+ * issue. For jmp <-> je and jmp <-> jmp cases, jmp offset <= 0x7c should
+ * avoid no convergence issue.
+ *
+ * Overall, let us limit the positive offset for 8bit cond/uncond jmp insn
+ * to maximum 123 (0x7b). This way, the jit pass can eventually converge.
+ */
+static bool is_imm8_jmp_offset(int value)
+{
+ return value <= 123 && value >= -128;
+}
+
static bool is_simm32(s64 value)
{
return value == (s64)(s32)value;
@@ -2191,7 +2241,7 @@ st: if (is_imm8(insn->off))
return -EFAULT;
}
jmp_offset = addrs[i + insn->off] - addrs[i];
- if (is_imm8(jmp_offset)) {
+ if (is_imm8_jmp_offset(jmp_offset)) {
if (jmp_padding) {
/* To keep the jmp_offset valid, the extra bytes are
* padded before the jump insn, so we subtract the
@@ -2273,7 +2323,7 @@ st: if (is_imm8(insn->off))
break;
}
emit_jmp:
- if (is_imm8(jmp_offset)) {
+ if (is_imm8_jmp_offset(jmp_offset)) {
if (jmp_padding) {
/* To avoid breaking jmp_offset, the extra bytes
* are padded before the actual jmp insn, so
--
2.43.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH AUTOSEL 6.10 19/70] bpf: Prevent tail call between progs attached to different hooks
2024-10-04 18:19 [PATCH AUTOSEL 6.10 01/70] bpf: Call the missed btf_record_free() when map creation fails Sasha Levin
` (5 preceding siblings ...)
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 10/70] bpf, x64: Fix a jit convergence issue Sasha Levin
@ 2024-10-04 18:20 ` Sasha Levin
6 siblings, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2024-10-04 18:20 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Xu Kuohai, Alexei Starovoitov, Andrii Nakryiko, Sasha Levin,
daniel, davem, kuba, hawk, john.fastabend, bpf, netdev
From: Xu Kuohai <xukuohai@huawei.com>
[ Upstream commit 28ead3eaabc16ecc907cfb71876da028080f6356 ]
bpf progs can be attached to kernel functions, and the attached functions
can take different parameters or return different return values. If
prog attached to one kernel function tail calls prog attached to another
kernel function, the ctx access or return value verification could be
bypassed.
For example, if prog1 is attached to func1 which takes only 1 parameter
and prog2 is attached to func2 which takes two parameters. Since verifier
assumes the bpf ctx passed to prog2 is constructed based on func2's
prototype, verifier allows prog2 to access the second parameter from
the bpf ctx passed to it. The problem is that verifier does not prevent
prog1 from passing its bpf ctx to prog2 via tail call. In this case,
the bpf ctx passed to prog2 is constructed from func1 instead of func2,
that is, the assumption for ctx access verification is bypassed.
Another example, if BPF LSM prog1 is attached to hook file_alloc_security,
and BPF LSM prog2 is attached to hook bpf_lsm_audit_rule_known. Verifier
knows the return value rules for these two hooks, e.g. it is legal for
bpf_lsm_audit_rule_known to return positive number 1, and it is illegal
for file_alloc_security to return positive number. So verifier allows
prog2 to return positive number 1, but does not allow prog1 to return
positive number. The problem is that verifier does not prevent prog1
from calling prog2 via tail call. In this case, prog2's return value 1
will be used as the return value for prog1's hook file_alloc_security.
That is, the return value rule is bypassed.
This patch adds restriction for tail call to prevent such bypasses.
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20240719110059.797546-4-xukuohai@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/linux/bpf.h | 1 +
kernel/bpf/core.c | 21 ++++++++++++++++++---
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5e694a308081a..5f77a04736a0e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -294,6 +294,7 @@ struct bpf_map {
* same prog type, JITed flag and xdp_has_frags flag.
*/
struct {
+ const struct btf_type *attach_func_proto;
spinlock_t lock;
enum bpf_prog_type type;
bool jited;
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 695a0fb2cd4df..5ccb43adf13db 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2303,6 +2303,7 @@ bool bpf_prog_map_compatible(struct bpf_map *map,
{
enum bpf_prog_type prog_type = resolve_prog_type(fp);
bool ret;
+ struct bpf_prog_aux *aux = fp->aux;
if (fp->kprobe_override)
return false;
@@ -2312,7 +2313,7 @@ bool bpf_prog_map_compatible(struct bpf_map *map,
* in the case of devmap and cpumap). Until device checks
* are implemented, prohibit adding dev-bound programs to program maps.
*/
- if (bpf_prog_is_dev_bound(fp->aux))
+ if (bpf_prog_is_dev_bound(aux))
return false;
spin_lock(&map->owner.lock);
@@ -2322,12 +2323,26 @@ bool bpf_prog_map_compatible(struct bpf_map *map,
*/
map->owner.type = prog_type;
map->owner.jited = fp->jited;
- map->owner.xdp_has_frags = fp->aux->xdp_has_frags;
+ map->owner.xdp_has_frags = aux->xdp_has_frags;
+ map->owner.attach_func_proto = aux->attach_func_proto;
ret = true;
} else {
ret = map->owner.type == prog_type &&
map->owner.jited == fp->jited &&
- map->owner.xdp_has_frags == fp->aux->xdp_has_frags;
+ map->owner.xdp_has_frags == aux->xdp_has_frags;
+ if (ret &&
+ map->owner.attach_func_proto != aux->attach_func_proto) {
+ switch (prog_type) {
+ case BPF_PROG_TYPE_TRACING:
+ case BPF_PROG_TYPE_LSM:
+ case BPF_PROG_TYPE_EXT:
+ case BPF_PROG_TYPE_STRUCT_OPS:
+ ret = false;
+ break;
+ default:
+ break;
+ }
+ }
}
spin_unlock(&map->owner.lock);
--
2.43.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-10-04 18:22 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-04 18:19 [PATCH AUTOSEL 6.10 01/70] bpf: Call the missed btf_record_free() when map creation fails Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 02/70] selftests/bpf: Fix ARG_PTR_TO_LONG {half-,}uninitialized test Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 03/70] bpf: Fix a sdiv overflow issue Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 04/70] bpf: Check percpu map value size first Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 05/70] bpftool: Fix undefined behavior in qsort(NULL, 0, ...) Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 06/70] bpftool: Fix undefined behavior caused by shifting into the sign bit Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 10/70] bpf, x64: Fix a jit convergence issue Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 19/70] bpf: Prevent tail call between progs attached to different hooks Sasha Levin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox