* [PATCH bpf-next v2 1/2] bpf: Fix a sdiv overflow issue
@ 2024-09-12 3:59 Yonghong Song
2024-09-12 3:59 ` [PATCH bpf-next v2 2/2] selftests/bpf: Add tests for sdiv/smod overflow cases Yonghong Song
2024-09-12 18:17 ` [PATCH bpf-next v2 1/2] bpf: Fix a sdiv overflow issue Andrii Nakryiko
0 siblings, 2 replies; 6+ messages in thread
From: Yonghong Song @ 2024-09-12 3:59 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
Martin KaFai Lau, Zac Ecob
Zac Ecob reported a problem [1] where a bpf program may cause a kernel crash
with the following error:
Oops: divide error: 0000 [#1] PREEMPT SMP KASAN PTI
The failure is due to the signed divide LLONG_MIN/-1, where LLONG_MIN
equals -9,223,372,036,854,775,808. LLONG_MIN/-1 should produce the
positive number 9,223,372,036,854,775,808, but that value cannot be
represented: the maximum positive 64-bit signed number is
9,223,372,036,854,775,807. On x86_64, LLONG_MIN/-1 therefore causes a
kernel exception. On arm64, the result of LLONG_MIN/-1 is LLONG_MIN.
Further investigation found that all of the following sdiv/smod cases may
trigger an exception when a bpf program runs on the x86_64 platform:
- LLONG_MIN/-1 for 64bit operation
- INT_MIN/-1 for 32bit operation
- LLONG_MIN%-1 for 64bit operation
- INT_MIN%-1 for 32bit operation
where -1 can be an immediate or in a register.
On arm64, there are no exceptions:
- LLONG_MIN/-1 = LLONG_MIN
- INT_MIN/-1 = INT_MIN
- LLONG_MIN%-1 = 0
- INT_MIN%-1 = 0
where -1 can be an immediate or in a register.
Insn patching is needed to handle the above cases; the patched code
produces results aligned with the arm64 results above.
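The target semantics can be modeled in plain C as a quick sketch (helper
names here are hypothetical, not kernel code), matching the arm64 results
listed above; the divisor-0 behavior mirrors the existing divide-by-zero
fixup, and only the -1 cases are new in this patch:

```c
#include <assert.h>
#include <limits.h>

/* Sketch of the semantics the patched instructions enforce.
 * Negation is done through unsigned arithmetic because BPF_NEG wraps:
 * -LLONG_MIN == LLONG_MIN, while signed negation of LLONG_MIN is UB in C.
 */
static long long safe_sdiv64(long long x, long long y)
{
	if (y == 0)
		return 0;	/* x sdiv 0 -> 0 (existing fixup) */
	if (y == -1)		/* x sdiv -1 -> -x, wrapping for LLONG_MIN */
		return (long long)(0ULL - (unsigned long long)x);
	return x / y;
}

static long long safe_smod64(long long x, long long y)
{
	if (y == 0)
		return x;	/* x mod 0 -> x (existing fixup) */
	if (y == -1)
		return 0;	/* x mod -1 -> 0 */
	return x % y;
}
```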
[1] https://lore.kernel.org/bpf/tPJLTEh7S_DxFEqAI2Ji5MBSoZVg7_G-Py2iaZpAaWtM961fFTWtsnlzwvTbzBzaUzwQAoNATXKUlt0LZOFgnDcIyKCswAnAGdUF3LBrhGQ=@protonmail.com/
Reported-by: Zac Ecob <zacecob@protonmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
kernel/bpf/verifier.c | 84 ++++++++++++++++++++++++++++++++++++++++---
1 file changed, 80 insertions(+), 4 deletions(-)
Changelogs:
v1 -> v2:
- Handle more crash cases like 32-bit operations and mod operations.
- Add more tests to test new cases.
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f35b80c16cda..ad7f51302c70 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -20499,13 +20499,46 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
/* Convert BPF_CLASS(insn->code) == BPF_ALU64 to 32-bit ALU */
insn->code = BPF_ALU | BPF_OP(insn->code) | BPF_SRC(insn->code);
- /* Make divide-by-zero exceptions impossible. */
+ /* Make sdiv/smod divide-by-minus-one exceptions impossible. */
+ if ((insn->code == (BPF_ALU64 | BPF_MOD | BPF_K) ||
+ insn->code == (BPF_ALU64 | BPF_DIV | BPF_K) ||
+ insn->code == (BPF_ALU | BPF_MOD | BPF_K) ||
+ insn->code == (BPF_ALU | BPF_DIV | BPF_K)) &&
+ insn->off == 1 && insn->imm == -1) {
+ bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
+ bool isdiv = BPF_OP(insn->code) == BPF_DIV;
+ struct bpf_insn *patchlet;
+ struct bpf_insn chk_and_div[] = {
+ BPF_RAW_INSN((is64 ? BPF_ALU64 : BPF_ALU) |
+ BPF_OP(BPF_NEG) | BPF_K, insn->dst_reg,
+ 0, 0, 0),
+ };
+ struct bpf_insn chk_and_mod[] = {
+ BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
+ };
+
+ patchlet = isdiv ? chk_and_div : chk_and_mod;
+ cnt = isdiv ? ARRAY_SIZE(chk_and_div) : ARRAY_SIZE(chk_and_mod);
+
+ new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt);
+ if (!new_prog)
+ return -ENOMEM;
+
+ delta += cnt - 1;
+ env->prog = prog = new_prog;
+ insn = new_prog->insnsi + i + delta;
+ goto next_insn;
+ }
+
+ /* Make divide-by-zero and divide-by-minus-one exceptions impossible. */
if (insn->code == (BPF_ALU64 | BPF_MOD | BPF_X) ||
insn->code == (BPF_ALU64 | BPF_DIV | BPF_X) ||
insn->code == (BPF_ALU | BPF_MOD | BPF_X) ||
insn->code == (BPF_ALU | BPF_DIV | BPF_X)) {
bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
bool isdiv = BPF_OP(insn->code) == BPF_DIV;
+ bool is_sdiv = isdiv && insn->off == 1;
+ bool is_smod = !isdiv && insn->off == 1;
struct bpf_insn *patchlet;
struct bpf_insn chk_and_div[] = {
/* [R,W]x div 0 -> 0 */
@@ -20525,10 +20558,53 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
BPF_JMP_IMM(BPF_JA, 0, 0, 1),
BPF_MOV32_REG(insn->dst_reg, insn->dst_reg),
};
+ struct bpf_insn chk_and_sdiv[] = {
+ /* [R,W]x sdiv 0 -> 0 */
+ BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
+ BPF_JNE | BPF_K, insn->src_reg,
+ 0, 2, 0),
+ BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
+ BPF_JMP_IMM(BPF_JA, 0, 0, 4),
+ /* LLONG_MIN sdiv -1 -> LLONG_MIN
+ * INT_MIN sdiv -1 -> INT_MIN
+ */
+ BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
+ BPF_JNE | BPF_K, insn->src_reg,
+ 0, 2, -1),
+ /* BPF_NEG(LLONG_MIN) == -LLONG_MIN == LLONG_MIN */
+ BPF_RAW_INSN((is64 ? BPF_ALU64 : BPF_ALU) |
+ BPF_OP(BPF_NEG) | BPF_K, insn->dst_reg,
+ 0, 0, 0),
+ BPF_JMP_IMM(BPF_JA, 0, 0, 1),
+ *insn,
+ };
+ struct bpf_insn chk_and_smod[] = {
+ /* [R,W]x mod 0 -> [R,W]x */
+ BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
+ BPF_JNE | BPF_K, insn->src_reg,
+ 0, 2, 0),
+ BPF_MOV32_REG(insn->dst_reg, insn->dst_reg),
+ BPF_JMP_IMM(BPF_JA, 0, 0, 4),
+ /* [R,W]x mod -1 -> 0 */
+ BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
+ BPF_JNE | BPF_K, insn->src_reg,
+ 0, 2, -1),
+ BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
+ BPF_JMP_IMM(BPF_JA, 0, 0, 1),
+ *insn,
+ };
- patchlet = isdiv ? chk_and_div : chk_and_mod;
- cnt = isdiv ? ARRAY_SIZE(chk_and_div) :
- ARRAY_SIZE(chk_and_mod) - (is64 ? 2 : 0);
+ if (is_sdiv) {
+ patchlet = chk_and_sdiv;
+ cnt = ARRAY_SIZE(chk_and_sdiv);
+ } else if (is_smod) {
+ patchlet = chk_and_smod;
+ cnt = ARRAY_SIZE(chk_and_smod);
+ } else {
+ patchlet = isdiv ? chk_and_div : chk_and_mod;
+ cnt = isdiv ? ARRAY_SIZE(chk_and_div) :
+ ARRAY_SIZE(chk_and_mod) - (is64 ? 2 : 0);
+ }
new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt);
if (!new_prog)
--
2.43.5
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH bpf-next v2 2/2] selftests/bpf: Add tests for sdiv/smod overflow cases
2024-09-12 3:59 [PATCH bpf-next v2 1/2] bpf: Fix a sdiv overflow issue Yonghong Song
@ 2024-09-12 3:59 ` Yonghong Song
2024-09-12 18:17 ` [PATCH bpf-next v2 1/2] bpf: Fix a sdiv overflow issue Andrii Nakryiko
1 sibling, 0 replies; 6+ messages in thread
From: Yonghong Song @ 2024-09-12 3:59 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
Martin KaFai Lau
Subtests are added to exercise the patched code which handles
- LLONG_MIN/-1
- INT_MIN/-1
- LLONG_MIN%-1
- INT_MIN%-1
where -1 could be an immediate or in a register.
Without the previous patch, all these cases will crash the kernel on the
x86_64 platform.
Additional tests are added with small values (e.g. -5/-1, 5%-1, etc.)
to exercise the additional logic in the patched insns.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
.../selftests/bpf/progs/verifier_sdiv.c | 431 ++++++++++++++++++
1 file changed, 431 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/verifier_sdiv.c b/tools/testing/selftests/bpf/progs/verifier_sdiv.c
index 2a2271cf0294..eebe34fecc4d 100644
--- a/tools/testing/selftests/bpf/progs/verifier_sdiv.c
+++ b/tools/testing/selftests/bpf/progs/verifier_sdiv.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
+#include <limits.h>
#include <bpf/bpf_helpers.h>
#include "bpf_misc.h"
@@ -770,6 +771,436 @@ __naked void smod64_zero_divisor(void)
" ::: __clobber_all);
}
+SEC("socket")
+__description("SDIV64, overflow r/r, LLONG_MIN/-1")
+__success __retval(1)
+__arch_x86_64
+__xlated("0: r2 = 0x8000000000000000")
+__xlated("2: r3 = -1")
+__xlated("3: r4 = r2")
+__xlated("4: if r3 != 0x0 goto pc+2")
+__xlated("5: w2 ^= w2")
+__xlated("6: goto pc+4")
+__xlated("7: if r3 != 0xffffffff goto pc+2")
+__xlated("8: r2 = -r2")
+__xlated("9: goto pc+1")
+__xlated("10: r2 s/= r3")
+__xlated("11: r0 = 0")
+__xlated("12: if r2 != r4 goto pc+1")
+__xlated("13: r0 = 1")
+__xlated("14: exit")
+__naked void sdiv64_overflow_rr(void)
+{
+ asm volatile (" \
+ r2 = %[llong_min] ll; \
+ r3 = -1; \
+ r4 = r2; \
+ r2 s/= r3; \
+ r0 = 0; \
+ if r2 != r4 goto +1; \
+ r0 = 1; \
+ exit; \
+" :
+ : __imm_const(llong_min, LLONG_MIN)
+ : __clobber_all);
+}
+
+SEC("socket")
+__description("SDIV64, r/r, small_val/-1")
+__success __retval(-5)
+__arch_x86_64
+__xlated("0: r2 = 5")
+__xlated("1: r3 = -1")
+__xlated("2: if r3 != 0x0 goto pc+2")
+__xlated("3: w2 ^= w2")
+__xlated("4: goto pc+4")
+__xlated("5: if r3 != 0xffffffff goto pc+2")
+__xlated("6: r2 = -r2")
+__xlated("7: goto pc+1")
+__xlated("8: r2 s/= r3")
+__xlated("9: r0 = r2")
+__xlated("10: exit")
+__naked void sdiv64_rr_divisor_neg_1(void)
+{
+ asm volatile (" \
+ r2 = 5; \
+ r3 = -1; \
+ r2 s/= r3; \
+ r0 = r2; \
+ exit; \
+" :
+ :
+ : __clobber_all);
+}
+
+SEC("socket")
+__description("SDIV64, overflow r/i, LLONG_MIN/-1")
+__success __retval(1)
+__arch_x86_64
+__xlated("0: r2 = 0x8000000000000000")
+__xlated("2: r4 = r2")
+__xlated("3: r2 = -r2")
+__xlated("4: r0 = 0")
+__xlated("5: if r2 != r4 goto pc+1")
+__xlated("6: r0 = 1")
+__xlated("7: exit")
+__naked void sdiv64_overflow_ri(void)
+{
+ asm volatile (" \
+ r2 = %[llong_min] ll; \
+ r4 = r2; \
+ r2 s/= -1; \
+ r0 = 0; \
+ if r2 != r4 goto +1; \
+ r0 = 1; \
+ exit; \
+" :
+ : __imm_const(llong_min, LLONG_MIN)
+ : __clobber_all);
+}
+
+SEC("socket")
+__description("SDIV64, r/i, small_val/-1")
+__success __retval(-5)
+__arch_x86_64
+__xlated("0: r2 = 5")
+__xlated("1: r4 = r2")
+__xlated("2: r2 = -r2")
+__xlated("3: r0 = r2")
+__xlated("4: exit")
+__naked void sdiv64_ri_divisor_neg_1(void)
+{
+ asm volatile (" \
+ r2 = 5; \
+ r4 = r2; \
+ r2 s/= -1; \
+ r0 = r2; \
+ exit; \
+" :
+ :
+ : __clobber_all);
+}
+
+SEC("socket")
+__description("SDIV32, overflow r/r, INT_MIN/-1")
+__success __retval(1)
+__arch_x86_64
+__xlated("0: w2 = -2147483648")
+__xlated("1: w3 = -1")
+__xlated("2: w4 = w2")
+__xlated("3: if w3 != 0x0 goto pc+2")
+__xlated("4: w2 ^= w2")
+__xlated("5: goto pc+4")
+__xlated("6: if w3 != 0xffffffff goto pc+2")
+__xlated("7: w2 = -w2")
+__xlated("8: goto pc+1")
+__xlated("9: w2 s/= w3")
+__xlated("10: r0 = 0")
+__xlated("11: if w2 != w4 goto pc+1")
+__xlated("12: r0 = 1")
+__xlated("13: exit")
+__naked void sdiv32_overflow_rr(void)
+{
+ asm volatile (" \
+ w2 = %[int_min]; \
+ w3 = -1; \
+ w4 = w2; \
+ w2 s/= w3; \
+ r0 = 0; \
+ if w2 != w4 goto +1; \
+ r0 = 1; \
+ exit; \
+" :
+ : __imm_const(int_min, INT_MIN)
+ : __clobber_all);
+}
+
+SEC("socket")
+__description("SDIV32, r/r, small_val/-1")
+__success __retval(5)
+__arch_x86_64
+__xlated("0: w2 = -5")
+__xlated("1: w3 = -1")
+__xlated("2: w4 = w2")
+__xlated("3: if w3 != 0x0 goto pc+2")
+__xlated("4: w2 ^= w2")
+__xlated("5: goto pc+4")
+__xlated("6: if w3 != 0xffffffff goto pc+2")
+__xlated("7: w2 = -w2")
+__xlated("8: goto pc+1")
+__xlated("9: w2 s/= w3")
+__xlated("10: w0 = w2")
+__xlated("11: exit")
+__naked void sdiv32_rr_divisor_neg_1(void)
+{
+ asm volatile (" \
+ w2 = -5; \
+ w3 = -1; \
+ w4 = w2; \
+ w2 s/= w3; \
+ w0 = w2; \
+ exit; \
+" :
+ :
+ : __clobber_all);
+}
+
+SEC("socket")
+__description("SDIV32, overflow r/i, INT_MIN/-1")
+__success __retval(1)
+__arch_x86_64
+__xlated("0: w2 = -2147483648")
+__xlated("1: w4 = w2")
+__xlated("2: w2 = -w2")
+__xlated("3: r0 = 0")
+__xlated("4: if w2 != w4 goto pc+1")
+__xlated("5: r0 = 1")
+__xlated("6: exit")
+__naked void sdiv32_overflow_ri(void)
+{
+ asm volatile (" \
+ w2 = %[int_min]; \
+ w4 = w2; \
+ w2 s/= -1; \
+ r0 = 0; \
+ if w2 != w4 goto +1; \
+ r0 = 1; \
+ exit; \
+" :
+ : __imm_const(int_min, INT_MIN)
+ : __clobber_all);
+}
+
+SEC("socket")
+__description("SDIV32, r/i, small_val/-1")
+__success __retval(-5)
+__arch_x86_64
+__xlated("0: w2 = 5")
+__xlated("1: w4 = w2")
+__xlated("2: w2 = -w2")
+__xlated("3: w0 = w2")
+__xlated("4: exit")
+__naked void sdiv32_ri_divisor_neg_1(void)
+{
+ asm volatile (" \
+ w2 = 5; \
+ w4 = w2; \
+ w2 s/= -1; \
+ w0 = w2; \
+ exit; \
+" :
+ :
+ : __clobber_all);
+}
+
+SEC("socket")
+__description("SMOD64, overflow r/r, LLONG_MIN/-1")
+__success __retval(0)
+__arch_x86_64
+__xlated("0: r2 = 0x8000000000000000")
+__xlated("2: r3 = -1")
+__xlated("3: r4 = r2")
+__xlated("4: if r3 != 0x0 goto pc+2")
+__xlated("5: w2 = w2")
+__xlated("6: goto pc+4")
+__xlated("7: if r3 != 0xffffffff goto pc+2")
+__xlated("8: w2 ^= w2")
+__xlated("9: goto pc+1")
+__xlated("10: r2 s%= r3")
+__xlated("11: r0 = r2")
+__xlated("12: exit")
+__naked void smod64_overflow_rr(void)
+{
+ asm volatile (" \
+ r2 = %[llong_min] ll; \
+ r3 = -1; \
+ r4 = r2; \
+ r2 s%%= r3; \
+ r0 = r2; \
+ exit; \
+" :
+ : __imm_const(llong_min, LLONG_MIN)
+ : __clobber_all);
+}
+
+SEC("socket")
+__description("SMOD64, r/r, small_val/-1")
+__success __retval(0)
+__arch_x86_64
+__xlated("0: r2 = 5")
+__xlated("1: r3 = -1")
+__xlated("2: r4 = r2")
+__xlated("3: if r3 != 0x0 goto pc+2")
+__xlated("4: w2 = w2")
+__xlated("5: goto pc+4")
+__xlated("6: if r3 != 0xffffffff goto pc+2")
+__xlated("7: w2 ^= w2")
+__xlated("8: goto pc+1")
+__xlated("9: r2 s%= r3")
+__xlated("10: r0 = r2")
+__xlated("11: exit")
+__naked void smod64_rr_divisor_neg_1(void)
+{
+ asm volatile (" \
+ r2 = 5; \
+ r3 = -1; \
+ r4 = r2; \
+ r2 s%%= r3; \
+ r0 = r2; \
+ exit; \
+" :
+ :
+ : __clobber_all);
+}
+
+SEC("socket")
+__description("SMOD64, overflow r/i, LLONG_MIN/-1")
+__success __retval(0)
+__arch_x86_64
+__xlated("0: r2 = 0x8000000000000000")
+__xlated("2: r4 = r2")
+__xlated("3: w2 ^= w2")
+__xlated("4: r0 = r2")
+__xlated("5: exit")
+__naked void smod64_overflow_ri(void)
+{
+ asm volatile (" \
+ r2 = %[llong_min] ll; \
+ r4 = r2; \
+ r2 s%%= -1; \
+ r0 = r2; \
+ exit; \
+" :
+ : __imm_const(llong_min, LLONG_MIN)
+ : __clobber_all);
+}
+
+SEC("socket")
+__description("SMOD64, r/i, small_val/-1")
+__success __retval(0)
+__arch_x86_64
+__xlated("0: r2 = 5")
+__xlated("1: r4 = r2")
+__xlated("2: w2 ^= w2")
+__xlated("3: r0 = r2")
+__xlated("4: exit")
+__naked void smod64_ri_divisor_neg_1(void)
+{
+ asm volatile (" \
+ r2 = 5; \
+ r4 = r2; \
+ r2 s%%= -1; \
+ r0 = r2; \
+ exit; \
+" :
+ :
+ : __clobber_all);
+}
+
+SEC("socket")
+__description("SMOD32, overflow r/r, INT_MIN/-1")
+__success __retval(0)
+__arch_x86_64
+__xlated("0: w2 = -2147483648")
+__xlated("1: w3 = -1")
+__xlated("2: w4 = w2")
+__xlated("3: if w3 != 0x0 goto pc+2")
+__xlated("4: w2 = w2")
+__xlated("5: goto pc+4")
+__xlated("6: if w3 != 0xffffffff goto pc+2")
+__xlated("7: w2 ^= w2")
+__xlated("8: goto pc+1")
+__xlated("9: w2 s%= w3")
+__xlated("10: r0 = r2")
+__xlated("11: exit")
+__naked void smod32_overflow_rr(void)
+{
+ asm volatile (" \
+ w2 = %[int_min]; \
+ w3 = -1; \
+ w4 = w2; \
+ w2 s%%= w3; \
+ r0 = r2; \
+ exit; \
+" :
+ : __imm_const(int_min, INT_MIN)
+ : __clobber_all);
+}
+
+SEC("socket")
+__description("SMOD32, r/r, small_val/-1")
+__success __retval(0)
+__arch_x86_64
+__xlated("0: w2 = -5")
+__xlated("1: w3 = -1")
+__xlated("2: w4 = w2")
+__xlated("3: if w3 != 0x0 goto pc+2")
+__xlated("4: w2 = w2")
+__xlated("5: goto pc+4")
+__xlated("6: if w3 != 0xffffffff goto pc+2")
+__xlated("7: w2 ^= w2")
+__xlated("8: goto pc+1")
+__xlated("9: w2 s%= w3")
+__xlated("10: r0 = r2")
+__xlated("11: exit")
+__naked void smod32_rr_divisor_neg_1(void)
+{
+ asm volatile (" \
+ w2 = -5; \
+ w3 = -1; \
+ w4 = w2; \
+ w2 s%%= w3; \
+ r0 = r2; \
+ exit; \
+" :
+ :
+ : __clobber_all);
+}
+
+SEC("socket")
+__description("SMOD32, overflow r/i, INT_MIN/-1")
+__success __retval(0)
+__arch_x86_64
+__xlated("0: w2 = -2147483648")
+__xlated("1: w4 = w2")
+__xlated("2: w2 ^= w2")
+__xlated("3: r0 = r2")
+__xlated("4: exit")
+__naked void smod32_overflow_ri(void)
+{
+ asm volatile (" \
+ w2 = %[int_min]; \
+ w4 = w2; \
+ w2 s%%= -1; \
+ r0 = r2; \
+ exit; \
+" :
+ : __imm_const(int_min, INT_MIN)
+ : __clobber_all);
+}
+
+SEC("socket")
+__description("SMOD32, r/i, small_val/-1")
+__success __retval(0)
+__arch_x86_64
+__xlated("0: w2 = 5")
+__xlated("1: w4 = w2")
+__xlated("2: w2 ^= w2")
+__xlated("3: w0 = w2")
+__xlated("4: exit")
+__naked void smod32_ri_divisor_neg_1(void)
+{
+ asm volatile (" \
+ w2 = 5; \
+ w4 = w2; \
+ w2 s%%= -1; \
+ w0 = w2; \
+ exit; \
+" :
+ :
+ : __clobber_all);
+}
+
#else
SEC("socket")
--
2.43.5
* Re: [PATCH bpf-next v2 1/2] bpf: Fix a sdiv overflow issue
2024-09-12 3:59 [PATCH bpf-next v2 1/2] bpf: Fix a sdiv overflow issue Yonghong Song
2024-09-12 3:59 ` [PATCH bpf-next v2 2/2] selftests/bpf: Add tests for sdiv/smod overflow cases Yonghong Song
@ 2024-09-12 18:17 ` Andrii Nakryiko
2024-09-12 18:19 ` Andrii Nakryiko
2024-09-12 22:53 ` Yonghong Song
1 sibling, 2 replies; 6+ messages in thread
From: Andrii Nakryiko @ 2024-09-12 18:17 UTC (permalink / raw)
To: Yonghong Song
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
kernel-team, Martin KaFai Lau, Zac Ecob
On Wed, Sep 11, 2024 at 9:00 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> Zac Ecob reported a problem where a bpf program may cause kernel crash due
> to the following error:
> Oops: divide error: 0000 [#1] PREEMPT SMP KASAN PTI
>
> The failure is due to the below signed divide:
> LLONG_MIN/-1 where LLONG_MIN equals to -9,223,372,036,854,775,808.
> LLONG_MIN/-1 is supposed to give a positive number 9,223,372,036,854,775,808,
> but it is impossible since for 64-bit system, the maximum positive
> number is 9,223,372,036,854,775,807. On x86_64, LLONG_MIN/-1 will
> cause a kernel exception. On arm64, the result for LLONG_MIN/-1 is
> LLONG_MIN.
>
> Further investigation found all the following sdiv/smod cases may trigger
> an exception when bpf program is running on x86_64 platform:
> - LLONG_MIN/-1 for 64bit operation
> - INT_MIN/-1 for 32bit operation
> - LLONG_MIN%-1 for 64bit operation
> - INT_MIN%-1 for 32bit operation
> where -1 can be an immediate or in a register.
>
> On arm64, there are no exceptions:
> - LLONG_MIN/-1 = LLONG_MIN
> - INT_MIN/-1 = INT_MIN
> - LLONG_MIN%-1 = 0
> - INT_MIN%-1 = 0
> where -1 can be an immediate or in a register.
>
> Insn patching is needed to handle the above cases and the patched codes
> produced results aligned with above arm64 result.
>
> [1] https://lore.kernel.org/bpf/tPJLTEh7S_DxFEqAI2Ji5MBSoZVg7_G-Py2iaZpAaWtM961fFTWtsnlzwvTbzBzaUzwQAoNATXKUlt0LZOFgnDcIyKCswAnAGdUF3LBrhGQ=@protonmail.com/
>
> Reported-by: Zac Ecob <zacecob@protonmail.com>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
> kernel/bpf/verifier.c | 84 ++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 80 insertions(+), 4 deletions(-)
>
> Changelogs:
> v1 -> v2:
> - Handle more crash cases like 32bit operation and modules.
> - Add more tests to test new cases.
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index f35b80c16cda..ad7f51302c70 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -20499,13 +20499,46 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
> /* Convert BPF_CLASS(insn->code) == BPF_ALU64 to 32-bit ALU */
> insn->code = BPF_ALU | BPF_OP(insn->code) | BPF_SRC(insn->code);
>
> - /* Make divide-by-zero exceptions impossible. */
> + /* Make sdiv/smod divide-by-minus-one exceptions impossible. */
> + if ((insn->code == (BPF_ALU64 | BPF_MOD | BPF_K) ||
> + insn->code == (BPF_ALU64 | BPF_DIV | BPF_K) ||
> + insn->code == (BPF_ALU | BPF_MOD | BPF_K) ||
> + insn->code == (BPF_ALU | BPF_DIV | BPF_K)) &&
> + insn->off == 1 && insn->imm == -1) {
> + bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
> + bool isdiv = BPF_OP(insn->code) == BPF_DIV;
> + struct bpf_insn *patchlet;
> + struct bpf_insn chk_and_div[] = {
> + BPF_RAW_INSN((is64 ? BPF_ALU64 : BPF_ALU) |
> + BPF_OP(BPF_NEG) | BPF_K, insn->dst_reg,
> + 0, 0, 0),
> + };
> + struct bpf_insn chk_and_mod[] = {
> + BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
> + };
> +
> + patchlet = isdiv ? chk_and_div : chk_and_mod;
nit: "chk_and_" part in the name is misleading, it's more like
"safe_div" and "safe_mod". Oh, and it's "sdiv" and "smod" specific, so
probably not a bad idea to have that in the name as well.
> + cnt = isdiv ? ARRAY_SIZE(chk_and_div) : ARRAY_SIZE(chk_and_mod);
> +
> + new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt);
> + if (!new_prog)
> + return -ENOMEM;
> +
> + delta += cnt - 1;
> + env->prog = prog = new_prog;
> + insn = new_prog->insnsi + i + delta;
> + goto next_insn;
> + }
> +
> + /* Make divide-by-zero and divide-by-minus-one exceptions impossible. */
> if (insn->code == (BPF_ALU64 | BPF_MOD | BPF_X) ||
> insn->code == (BPF_ALU64 | BPF_DIV | BPF_X) ||
> insn->code == (BPF_ALU | BPF_MOD | BPF_X) ||
> insn->code == (BPF_ALU | BPF_DIV | BPF_X)) {
> bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
> bool isdiv = BPF_OP(insn->code) == BPF_DIV;
> + bool is_sdiv = isdiv && insn->off == 1;
> + bool is_smod = !isdiv && insn->off == 1;
> struct bpf_insn *patchlet;
> struct bpf_insn chk_and_div[] = {
> /* [R,W]x div 0 -> 0 */
> @@ -20525,10 +20558,53 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
> BPF_JMP_IMM(BPF_JA, 0, 0, 1),
> BPF_MOV32_REG(insn->dst_reg, insn->dst_reg),
> };
> + struct bpf_insn chk_and_sdiv[] = {
> + /* [R,W]x sdiv 0 -> 0 */
> + BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
> + BPF_JNE | BPF_K, insn->src_reg,
> + 0, 2, 0),
> + BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
> + BPF_JMP_IMM(BPF_JA, 0, 0, 4),
> + /* LLONG_MIN sdiv -1 -> LLONG_MIN
> + * INT_MIN sdiv -1 -> INT_MIN
> + */
> + BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
> + BPF_JNE | BPF_K, insn->src_reg,
> + 0, 2, -1),
> + /* BPF_NEG(LLONG_MIN) == -LLONG_MIN == LLONG_MIN */
> + BPF_RAW_INSN((is64 ? BPF_ALU64 : BPF_ALU) |
> + BPF_OP(BPF_NEG) | BPF_K, insn->dst_reg,
> + 0, 0, 0),
> + BPF_JMP_IMM(BPF_JA, 0, 0, 1),
I don't know how much it actually matters, but it feels like common
safe case should be as straight-line-executed as possible, no?
So maybe it's better to rearrange to roughly this (where rX is the
divisor register):
if rX == 0 goto L1
if rX == -1 goto L2
rY /= rX
goto L3
L1: /* zero case */
rY = 0 /* fallthrough, negation doesn't hurt, but less jumping */
L2: /* negative one case (or zero) */
rY = -rY
L3:
... the rest of the program code ...
Those two branches for common case are still annoyingly inefficient, I
wonder if we should do
rX += 1 /* [-1, 0] -> [0, 1]
if rX <=(unsigned) 1 goto L1
rX -= 1 /* restore original divisor */
rY /= rX /* common case */
goto L3
L1:
if rX == 0 goto L2 /* jump if originally -1 */
rY = 0 /* division by zero case */
L2: /* fallthrough */
rY = -rY
rX -= 1 /* restore original divisor */
L3:
... continue with the rest ...
It's a bit trickier to follow, but should be faster in a common case.
WDYT? Too much too far?
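(A rough C model of the second sequence above, hypothetical and only to
sanity-check the branch logic, with rX as the divisor and rY as the
dividend: the rX += 1 trick maps divisors [-1, 0] onto [0, 1], so one
unsigned compare routes both special cases to the slow path.)

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical C model of the fast-path-first sdiv sequence sketched
 * above. Unsigned arithmetic is used for the +1 test and the negation
 * so the model avoids signed-overflow UB while matching BPF wraparound
 * (-LLONG_MIN == LLONG_MIN).
 */
static long long sdiv_model(long long rY, long long rX)
{
	if ((unsigned long long)rX + 1ULL <= 1) {	/* rX is 0 or -1 */
		if (rX == 0)
			rY = 0;		/* division by zero -> 0 */
		/* fallthrough: negate (a no-op for 0, wraps for LLONG_MIN) */
		return (long long)(0ULL - (unsigned long long)rY);
	}
	return rY / rX;			/* common case, straight-line */
}
```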
> + *insn,
> + };
> + struct bpf_insn chk_and_smod[] = {
> + /* [R,W]x mod 0 -> [R,W]x */
> + BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
> + BPF_JNE | BPF_K, insn->src_reg,
> + 0, 2, 0),
> + BPF_MOV32_REG(insn->dst_reg, insn->dst_reg),
> + BPF_JMP_IMM(BPF_JA, 0, 0, 4),
> + /* [R,W]x mod -1 -> 0 */
> + BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
> + BPF_JNE | BPF_K, insn->src_reg,
> + 0, 2, -1),
> + BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
> + BPF_JMP_IMM(BPF_JA, 0, 0, 1),
> + *insn,
> + };
>
Same idea here, keep the common case as straight as possible.
> - patchlet = isdiv ? chk_and_div : chk_and_mod;
> - cnt = isdiv ? ARRAY_SIZE(chk_and_div) :
> - ARRAY_SIZE(chk_and_mod) - (is64 ? 2 : 0);
> + if (is_sdiv) {
> + patchlet = chk_and_sdiv;
> + cnt = ARRAY_SIZE(chk_and_sdiv);
> + } else if (is_smod) {
> + patchlet = chk_and_smod;
> + cnt = ARRAY_SIZE(chk_and_smod);
> + } else {
> + patchlet = isdiv ? chk_and_div : chk_and_mod;
> + cnt = isdiv ? ARRAY_SIZE(chk_and_div) :
> + ARRAY_SIZE(chk_and_mod) - (is64 ? 2 : 0);
> + }
>
> new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt);
> if (!new_prog)
> --
> 2.43.5
>
* Re: [PATCH bpf-next v2 1/2] bpf: Fix a sdiv overflow issue
2024-09-12 18:17 ` [PATCH bpf-next v2 1/2] bpf: Fix a sdiv overflow issue Andrii Nakryiko
@ 2024-09-12 18:19 ` Andrii Nakryiko
2024-09-12 22:53 ` Yonghong Song
1 sibling, 0 replies; 6+ messages in thread
From: Andrii Nakryiko @ 2024-09-12 18:19 UTC (permalink / raw)
To: Yonghong Song
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
kernel-team, Martin KaFai Lau, Zac Ecob
On Thu, Sep 12, 2024 at 11:17 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Sep 11, 2024 at 9:00 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> >
> > Zac Ecob reported a problem where a bpf program may cause kernel crash due
> > to the following error:
> > Oops: divide error: 0000 [#1] PREEMPT SMP KASAN PTI
> >
> > The failure is due to the below signed divide:
> > LLONG_MIN/-1 where LLONG_MIN equals to -9,223,372,036,854,775,808.
> > LLONG_MIN/-1 is supposed to give a positive number 9,223,372,036,854,775,808,
> > but it is impossible since for 64-bit system, the maximum positive
> > number is 9,223,372,036,854,775,807. On x86_64, LLONG_MIN/-1 will
> > cause a kernel exception. On arm64, the result for LLONG_MIN/-1 is
> > LLONG_MIN.
> >
> > Further investigation found all the following sdiv/smod cases may trigger
> > an exception when bpf program is running on x86_64 platform:
> > - LLONG_MIN/-1 for 64bit operation
> > - INT_MIN/-1 for 32bit operation
> > - LLONG_MIN%-1 for 64bit operation
> > - INT_MIN%-1 for 32bit operation
> > where -1 can be an immediate or in a register.
> >
> > On arm64, there are no exceptions:
> > - LLONG_MIN/-1 = LLONG_MIN
> > - INT_MIN/-1 = INT_MIN
> > - LLONG_MIN%-1 = 0
> > - INT_MIN%-1 = 0
> > where -1 can be an immediate or in a register.
> >
> > Insn patching is needed to handle the above cases and the patched codes
> > produced results aligned with above arm64 result.
> >
> > [1] https://lore.kernel.org/bpf/tPJLTEh7S_DxFEqAI2Ji5MBSoZVg7_G-Py2iaZpAaWtM961fFTWtsnlzwvTbzBzaUzwQAoNATXKUlt0LZOFgnDcIyKCswAnAGdUF3LBrhGQ=@protonmail.com/
> >
> > Reported-by: Zac Ecob <zacecob@protonmail.com>
> > Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> > ---
> > kernel/bpf/verifier.c | 84 ++++++++++++++++++++++++++++++++++++++++---
> > 1 file changed, 80 insertions(+), 4 deletions(-)
> >
> > Changelogs:
> > v1 -> v2:
> > - Handle more crash cases like 32bit operation and modules.
> > - Add more tests to test new cases.
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index f35b80c16cda..ad7f51302c70 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -20499,13 +20499,46 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
> > /* Convert BPF_CLASS(insn->code) == BPF_ALU64 to 32-bit ALU */
> > insn->code = BPF_ALU | BPF_OP(insn->code) | BPF_SRC(insn->code);
> >
> > - /* Make divide-by-zero exceptions impossible. */
> > + /* Make sdiv/smod divide-by-minus-one exceptions impossible. */
> > + if ((insn->code == (BPF_ALU64 | BPF_MOD | BPF_K) ||
> > + insn->code == (BPF_ALU64 | BPF_DIV | BPF_K) ||
> > + insn->code == (BPF_ALU | BPF_MOD | BPF_K) ||
> > + insn->code == (BPF_ALU | BPF_DIV | BPF_K)) &&
> > + insn->off == 1 && insn->imm == -1) {
> > + bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
> > + bool isdiv = BPF_OP(insn->code) == BPF_DIV;
> > + struct bpf_insn *patchlet;
> > + struct bpf_insn chk_and_div[] = {
> > + BPF_RAW_INSN((is64 ? BPF_ALU64 : BPF_ALU) |
> > + BPF_OP(BPF_NEG) | BPF_K, insn->dst_reg,
> > + 0, 0, 0),
> > + };
> > + struct bpf_insn chk_and_mod[] = {
> > + BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
> > + };
> > +
> > + patchlet = isdiv ? chk_and_div : chk_and_mod;
>
> nit: "chk_and_" part in the name is misleading, it's more like
> "safe_div" and "safe_mod". Oh, and it's "sdiv" and "smod" specific, so
> probably not a bad idea to have that in the name as well.
>
> > + cnt = isdiv ? ARRAY_SIZE(chk_and_div) : ARRAY_SIZE(chk_and_mod);
> > +
> > + new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt);
> > + if (!new_prog)
> > + return -ENOMEM;
> > +
> > + delta += cnt - 1;
> > + env->prog = prog = new_prog;
> > + insn = new_prog->insnsi + i + delta;
> > + goto next_insn;
> > + }
> > +
> > + /* Make divide-by-zero and divide-by-minus-one exceptions impossible. */
> > if (insn->code == (BPF_ALU64 | BPF_MOD | BPF_X) ||
> > insn->code == (BPF_ALU64 | BPF_DIV | BPF_X) ||
> > insn->code == (BPF_ALU | BPF_MOD | BPF_X) ||
> > insn->code == (BPF_ALU | BPF_DIV | BPF_X)) {
> > bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
> > bool isdiv = BPF_OP(insn->code) == BPF_DIV;
> > + bool is_sdiv = isdiv && insn->off == 1;
> > + bool is_smod = !isdiv && insn->off == 1;
> > struct bpf_insn *patchlet;
> > struct bpf_insn chk_and_div[] = {
> > /* [R,W]x div 0 -> 0 */
> > @@ -20525,10 +20558,53 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
> > BPF_JMP_IMM(BPF_JA, 0, 0, 1),
> > BPF_MOV32_REG(insn->dst_reg, insn->dst_reg),
> > };
> > + struct bpf_insn chk_and_sdiv[] = {
> > + /* [R,W]x sdiv 0 -> 0 */
> > + BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
> > + BPF_JNE | BPF_K, insn->src_reg,
> > + 0, 2, 0),
> > + BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
> > + BPF_JMP_IMM(BPF_JA, 0, 0, 4),
> > + /* LLONG_MIN sdiv -1 -> LLONG_MIN
> > + * INT_MIN sdiv -1 -> INT_MIN
> > + */
> > + BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
> > + BPF_JNE | BPF_K, insn->src_reg,
> > + 0, 2, -1),
> > + /* BPF_NEG(LLONG_MIN) == -LLONG_MIN == LLONG_MIN */
> > + BPF_RAW_INSN((is64 ? BPF_ALU64 : BPF_ALU) |
> > + BPF_OP(BPF_NEG) | BPF_K, insn->dst_reg,
> > + 0, 0, 0),
> > + BPF_JMP_IMM(BPF_JA, 0, 0, 1),
>
> I don't know how much it actually matters, but it feels like common
> safe case should be as straight-line-executed as possible, no?
>
> So maybe it's better to rearrange to roughly this (where rX is the
> divisor register):
>
> if rX == 0 goto L1
> if rX == -1 goto L2
> rY /= rX
> goto L3
> L1: /* zero case */
> rY = 0 /* fallthrough, negation doesn't hurt, but less jumping */
> L2: /* negative one case (or zero) */
> rY = -rY
> L3:
> ... the rest of the program code ...
>
>
> Those two branches for common case are still annoyingly inefficient, I
> wonder if we should do
>
> rX += 1 /* [-1, 0] -> [0, 1]
> if rX <=(unsigned) 1 goto L1
> rX -= 1 /* restore original divisor */
> rY /= rX /* common case */
> goto L3
> L1:
> if rX == 0 goto L2 /* jump if originally -1 */
> rY = 0 /* division by zero case */
> L2: /* fallthrough */
> rY = -rY
> rX -= 1 /* restore original divisor */
> L3:
> ... continue with the rest ...
hmm.. just in case rX is the same register as rY, probably best to
restore rX early right at L1: label (and adjust `if rX == 0 goto L2`
into `if rX != 0 goto L2`).
>
>
> It's a bit trickier to follow, but should be faster in a common case.
>
> WDYT? Too much too far?
>
>
> > + *insn,
> > + };
> > + struct bpf_insn chk_and_smod[] = {
> > + /* [R,W]x mod 0 -> [R,W]x */
> > + BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
> > + BPF_JNE | BPF_K, insn->src_reg,
> > + 0, 2, 0),
> > + BPF_MOV32_REG(insn->dst_reg, insn->dst_reg),
> > + BPF_JMP_IMM(BPF_JA, 0, 0, 4),
> > + /* [R,W]x mod -1 -> 0 */
> > + BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
> > + BPF_JNE | BPF_K, insn->src_reg,
> > + 0, 2, -1),
> > + BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
> > + BPF_JMP_IMM(BPF_JA, 0, 0, 1),
> > + *insn,
> > + };
> >
>
> Same idea here, keep the common case as straight as possible.
>
> > - patchlet = isdiv ? chk_and_div : chk_and_mod;
> > - cnt = isdiv ? ARRAY_SIZE(chk_and_div) :
> > - ARRAY_SIZE(chk_and_mod) - (is64 ? 2 : 0);
> > + if (is_sdiv) {
> > + patchlet = chk_and_sdiv;
> > + cnt = ARRAY_SIZE(chk_and_sdiv);
> > + } else if (is_smod) {
> > + patchlet = chk_and_smod;
> > + cnt = ARRAY_SIZE(chk_and_smod);
> > + } else {
> > + patchlet = isdiv ? chk_and_div : chk_and_mod;
> > + cnt = isdiv ? ARRAY_SIZE(chk_and_div) :
> > + ARRAY_SIZE(chk_and_mod) - (is64 ? 2 : 0);
> > + }
> >
> > new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt);
> > if (!new_prog)
> > --
> > 2.43.5
> >
* Re: [PATCH bpf-next v2 1/2] bpf: Fix a sdiv overflow issue
2024-09-12 18:17 ` [PATCH bpf-next v2 1/2] bpf: Fix a sdiv overflow issue Andrii Nakryiko
2024-09-12 18:19 ` Andrii Nakryiko
@ 2024-09-12 22:53 ` Yonghong Song
2024-09-13 2:00 ` Andrii Nakryiko
1 sibling, 1 reply; 6+ messages in thread
From: Yonghong Song @ 2024-09-12 22:53 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
kernel-team, Martin KaFai Lau, Zac Ecob
On 9/12/24 11:17 AM, Andrii Nakryiko wrote:
> On Wed, Sep 11, 2024 at 9:00 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>> Zac Ecob reported a problem where a bpf program may cause a kernel crash due
>> to the following error:
>> Oops: divide error: 0000 [#1] PREEMPT SMP KASAN PTI
>>
>> The failure is due to the below signed divide:
>> LLONG_MIN/-1 where LLONG_MIN equals to -9,223,372,036,854,775,808.
>> LLONG_MIN/-1 is supposed to give the positive number 9,223,372,036,854,775,808,
>> but that is impossible since, on a 64-bit system, the maximum positive
>> signed number is 9,223,372,036,854,775,807. On x86_64, LLONG_MIN/-1 will
>> cause a kernel exception. On arm64, the result for LLONG_MIN/-1 is
>> LLONG_MIN.
>>
>> Further investigation found all the following sdiv/smod cases may trigger
>> an exception when bpf program is running on x86_64 platform:
>> - LLONG_MIN/-1 for 64bit operation
>> - INT_MIN/-1 for 32bit operation
>> - LLONG_MIN%-1 for 64bit operation
>> - INT_MIN%-1 for 32bit operation
>> where -1 can be an immediate or in a register.
>>
>> On arm64, there are no exceptions:
>> - LLONG_MIN/-1 = LLONG_MIN
>> - INT_MIN/-1 = INT_MIN
>> - LLONG_MIN%-1 = 0
>> - INT_MIN%-1 = 0
>> where -1 can be an immediate or in a register.
>>
>> Insn patching is needed to handle the above cases, and the patched code
>> produces results aligned with the arm64 results above.
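The target semantics above (aligned with arm64 behavior) can be modeled in userspace C. This is an illustrative sketch only; `safe_sdiv64`/`safe_smod64` are made-up names, not kernel functions:

```c
#include <assert.h>
#include <limits.h>

/* Model of the patched BPF semantics for signed div/mod:
 *   x / 0  -> 0        x % 0  -> x
 *   x / -1 -> -x       x % -1 -> 0
 * Negation is done via unsigned arithmetic so that -LLONG_MIN
 * wraps to LLONG_MIN without undefined behavior in C. */
static long long safe_sdiv64(long long x, long long y)
{
	if (y == 0)
		return 0;
	if (y == -1)
		return (long long)(0ULL - (unsigned long long)x);
	return x / y;
}

static long long safe_smod64(long long x, long long y)
{
	if (y == 0)
		return x;
	if (y == -1)
		return 0;
	return x % y;
}
```

Note that the two special divisors are checked before the native `/` and `%` are used, mirroring what the patched instruction sequence does at runtime.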
>>
>> [1] https://lore.kernel.org/bpf/tPJLTEh7S_DxFEqAI2Ji5MBSoZVg7_G-Py2iaZpAaWtM961fFTWtsnlzwvTbzBzaUzwQAoNATXKUlt0LZOFgnDcIyKCswAnAGdUF3LBrhGQ=@protonmail.com/
>>
>> Reported-by: Zac Ecob <zacecob@protonmail.com>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>> kernel/bpf/verifier.c | 84 ++++++++++++++++++++++++++++++++++++++++---
>> 1 file changed, 80 insertions(+), 4 deletions(-)
>>
>> Changelogs:
>> v1 -> v2:
>> - Handle more crash cases like 32bit operations and mods.
>> - Add more tests to test new cases.
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index f35b80c16cda..ad7f51302c70 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -20499,13 +20499,46 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>> /* Convert BPF_CLASS(insn->code) == BPF_ALU64 to 32-bit ALU */
>> insn->code = BPF_ALU | BPF_OP(insn->code) | BPF_SRC(insn->code);
>>
>> - /* Make divide-by-zero exceptions impossible. */
>> + /* Make sdiv/smod divide-by-minus-one exceptions impossible. */
>> + if ((insn->code == (BPF_ALU64 | BPF_MOD | BPF_K) ||
>> + insn->code == (BPF_ALU64 | BPF_DIV | BPF_K) ||
>> + insn->code == (BPF_ALU | BPF_MOD | BPF_K) ||
>> + insn->code == (BPF_ALU | BPF_DIV | BPF_K)) &&
>> + insn->off == 1 && insn->imm == -1) {
>> + bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
>> + bool isdiv = BPF_OP(insn->code) == BPF_DIV;
>> + struct bpf_insn *patchlet;
>> + struct bpf_insn chk_and_div[] = {
>> + BPF_RAW_INSN((is64 ? BPF_ALU64 : BPF_ALU) |
>> + BPF_OP(BPF_NEG) | BPF_K, insn->dst_reg,
>> + 0, 0, 0),
>> + };
>> + struct bpf_insn chk_and_mod[] = {
>> + BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
>> + };
>> +
>> + patchlet = isdiv ? chk_and_div : chk_and_mod;
> nit: "chk_and_" part in the name is misleading, it's more like
> "safe_div" and "safe_mod". Oh, and it's "sdiv" and "smod" specific, so
> probably not a bad idea to have that in the name as well.
good idea. Will use chk_and_sdiv and chk_and_smod.
>
>> + cnt = isdiv ? ARRAY_SIZE(chk_and_div) : ARRAY_SIZE(chk_and_mod);
>> +
>> + new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt);
>> + if (!new_prog)
>> + return -ENOMEM;
>> +
>> + delta += cnt - 1;
>> + env->prog = prog = new_prog;
>> + insn = new_prog->insnsi + i + delta;
>> + goto next_insn;
>> + }
>> +
>> + /* Make divide-by-zero and divide-by-minus-one exceptions impossible. */
>> if (insn->code == (BPF_ALU64 | BPF_MOD | BPF_X) ||
>> insn->code == (BPF_ALU64 | BPF_DIV | BPF_X) ||
>> insn->code == (BPF_ALU | BPF_MOD | BPF_X) ||
>> insn->code == (BPF_ALU | BPF_DIV | BPF_X)) {
>> bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
>> bool isdiv = BPF_OP(insn->code) == BPF_DIV;
>> + bool is_sdiv = isdiv && insn->off == 1;
>> + bool is_smod = !isdiv && insn->off == 1;
>> struct bpf_insn *patchlet;
>> struct bpf_insn chk_and_div[] = {
>> /* [R,W]x div 0 -> 0 */
>> @@ -20525,10 +20558,53 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>> BPF_JMP_IMM(BPF_JA, 0, 0, 1),
>> BPF_MOV32_REG(insn->dst_reg, insn->dst_reg),
>> };
>> + struct bpf_insn chk_and_sdiv[] = {
>> + /* [R,W]x sdiv 0 -> 0 */
>> + BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
>> + BPF_JNE | BPF_K, insn->src_reg,
>> + 0, 2, 0),
>> + BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
>> + BPF_JMP_IMM(BPF_JA, 0, 0, 4),
>> + /* LLONG_MIN sdiv -1 -> LLONG_MIN
>> + * INT_MIN sdiv -1 -> INT_MIN
>> + */
>> + BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
>> + BPF_JNE | BPF_K, insn->src_reg,
>> + 0, 2, -1),
>> + /* BPF_NEG(LLONG_MIN) == -LLONG_MIN == LLONG_MIN */
>> + BPF_RAW_INSN((is64 ? BPF_ALU64 : BPF_ALU) |
>> + BPF_OP(BPF_NEG) | BPF_K, insn->dst_reg,
>> + 0, 0, 0),
>> + BPF_JMP_IMM(BPF_JA, 0, 0, 1),
> I don't know how much it actually matters, but it feels like common
> safe case should be as straight-line-executed as possible, no?
>
> So maybe it's better to rearrange to roughly this (where rX is the
> divisor register):
>
> if rX == 0 goto L1
> if rX == -1 goto L2
> rY /= rX
> goto L3
> L1: /* zero case */
> rY = 0 /* fallthrough, negation doesn't hurt, but less jumping */
> L2: /* negative one case (or zero) */
> rY = -rY
> L3:
> ... the rest of the program code ...
My previous patched insns tried to clearly separate the rX == 0 and
rX == -1 cases. They include 2 cond jmps, 2 uncond jmps and
3 alu operations. The above one removes one uncond jmp, which
is indeed better.
>
>
> Those two branches for common case are still annoyingly inefficient, I
> wonder if we should do
>
> > rX += 1 /* [-1, 0] -> [0, 1] */
> if rX <=(unsigned) 1 goto L1
> rX -= 1 /* restore original divisor */
> rY /= rX /* common case */
> goto L3
> L1:
> if rX == 0 goto L2 /* jump if originally -1 */
> rY = 0 /* division by zero case */
> L2: /* fallthrough */
> rY = -rY
> rX -= 1 /* restore original divisor */
> L3:
> ... continue with the rest ...
>
>
> It's a bit trickier to follow, but should be faster in a common case.
>
> WDYT? Too much too far?
This is even better. The above rX -= 1 can be removed if we use
BPF_REG_AX as the temporary register. For example,
tmp = rX
> tmp += 1 /* [-1, 0] -> [0, 1] */
if tmp <=(unsigned) 1 goto L1
rY /= rX /* common case */
goto L3
L1:
if tmp == 0 goto L2 /* jump if originally -1 */
rY = 0 /* division by zero case */
L2: /* fallthrough */
rY = -rY
L3:
... continue with the rest ...
Maybe we can do even better
tmp = rX
> tmp += 1 /* [-1, 0] -> [0, 1] */
if tmp >(unsigned) 1 goto L2
if tmp == 0 goto L1
rY = 0
L1:
rY = -rY;
goto L3
L2:
rY /= rX
L3:
Could this be even better by reducing one uncond jmp in the fast path?
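To sanity-check the last variant, here is a userspace C model of that control flow, with a local `tmp` standing in for BPF_REG_AX. It is an illustrative sketch of the pseudocode above, not the actual patchlet:

```c
#include <assert.h>
#include <limits.h>

/* Models the sdiv patchlet sketched above: tmp = rX + 1 maps the two
 * special divisors {-1, 0} onto {0, 1}, so one unsigned compare sends
 * the common case straight to the real divide. */
static long long patched_sdiv(long long rY, long long rX)
{
	unsigned long long tmp = (unsigned long long)rX + 1;

	if (tmp > 1)			/* common case: goto L2 */
		return rY / rX;		/* L2: rY /= rX */
	if (tmp == 0)			/* divisor was -1: goto L1 */
		goto L1;
	rY = 0;				/* divisor was 0 */
L1:
	/* rY = -rY; unsigned math so -LLONG_MIN wraps to LLONG_MIN */
	return (long long)(0ULL - (unsigned long long)rY);
}
```

The fast path executes one add, one conditional jump, and the divide, with no unconditional jump before the division itself.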
>
>
>> + *insn,
>> + };
>> + struct bpf_insn chk_and_smod[] = {
>> + /* [R,W]x mod 0 -> [R,W]x */
>> + BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
>> + BPF_JNE | BPF_K, insn->src_reg,
>> + 0, 2, 0),
>> + BPF_MOV32_REG(insn->dst_reg, insn->dst_reg),
>> + BPF_JMP_IMM(BPF_JA, 0, 0, 4),
>> + /* [R,W]x mod -1 -> 0 */
>> + BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
>> + BPF_JNE | BPF_K, insn->src_reg,
>> + 0, 2, -1),
>> + BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
>> + BPF_JMP_IMM(BPF_JA, 0, 0, 1),
>> + *insn,
>> + };
>>
> Same idea here, keep the common case as straight as possible.
Sure. Will do.
>
>> - patchlet = isdiv ? chk_and_div : chk_and_mod;
>> - cnt = isdiv ? ARRAY_SIZE(chk_and_div) :
>> - ARRAY_SIZE(chk_and_mod) - (is64 ? 2 : 0);
>> + if (is_sdiv) {
>> + patchlet = chk_and_sdiv;
>> + cnt = ARRAY_SIZE(chk_and_sdiv);
>> + } else if (is_smod) {
>> + patchlet = chk_and_smod;
>> + cnt = ARRAY_SIZE(chk_and_smod);
>> + } else {
>> + patchlet = isdiv ? chk_and_div : chk_and_mod;
>> + cnt = isdiv ? ARRAY_SIZE(chk_and_div) :
>> + ARRAY_SIZE(chk_and_mod) - (is64 ? 2 : 0);
>> + }
>>
>> new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt);
>> if (!new_prog)
>> --
>> 2.43.5
>>
* Re: [PATCH bpf-next v2 1/2] bpf: Fix a sdiv overflow issue
2024-09-12 22:53 ` Yonghong Song
@ 2024-09-13 2:00 ` Andrii Nakryiko
0 siblings, 0 replies; 6+ messages in thread
From: Andrii Nakryiko @ 2024-09-13 2:00 UTC (permalink / raw)
To: Yonghong Song
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
kernel-team, Martin KaFai Lau, Zac Ecob
On Thu, Sep 12, 2024 at 3:53 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
> On 9/12/24 11:17 AM, Andrii Nakryiko wrote:
> > On Wed, Sep 11, 2024 at 9:00 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> >> Zac Ecob reported a problem where a bpf program may cause a kernel crash due
> >> to the following error:
> >> Oops: divide error: 0000 [#1] PREEMPT SMP KASAN PTI
> >>
> >> The failure is due to the below signed divide:
> >> LLONG_MIN/-1 where LLONG_MIN equals to -9,223,372,036,854,775,808.
> >> LLONG_MIN/-1 is supposed to give the positive number 9,223,372,036,854,775,808,
> >> but that is impossible since, on a 64-bit system, the maximum positive
> >> signed number is 9,223,372,036,854,775,807. On x86_64, LLONG_MIN/-1 will
> >> cause a kernel exception. On arm64, the result for LLONG_MIN/-1 is
> >> LLONG_MIN.
> >>
> >> Further investigation found all the following sdiv/smod cases may trigger
> >> an exception when bpf program is running on x86_64 platform:
> >> - LLONG_MIN/-1 for 64bit operation
> >> - INT_MIN/-1 for 32bit operation
> >> - LLONG_MIN%-1 for 64bit operation
> >> - INT_MIN%-1 for 32bit operation
> >> where -1 can be an immediate or in a register.
> >>
> >> On arm64, there are no exceptions:
> >> - LLONG_MIN/-1 = LLONG_MIN
> >> - INT_MIN/-1 = INT_MIN
> >> - LLONG_MIN%-1 = 0
> >> - INT_MIN%-1 = 0
> >> where -1 can be an immediate or in a register.
> >>
> >> Insn patching is needed to handle the above cases, and the patched code
> >> produces results aligned with the arm64 results above.
> >>
> >> [1] https://lore.kernel.org/bpf/tPJLTEh7S_DxFEqAI2Ji5MBSoZVg7_G-Py2iaZpAaWtM961fFTWtsnlzwvTbzBzaUzwQAoNATXKUlt0LZOFgnDcIyKCswAnAGdUF3LBrhGQ=@protonmail.com/
> >>
> >> Reported-by: Zac Ecob <zacecob@protonmail.com>
> >> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> >> ---
> >> kernel/bpf/verifier.c | 84 ++++++++++++++++++++++++++++++++++++++++---
> >> 1 file changed, 80 insertions(+), 4 deletions(-)
> >>
> >> Changelogs:
> >> v1 -> v2:
> >> - Handle more crash cases like 32bit operations and mods.
> >> - Add more tests to test new cases.
> >>
> >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >> index f35b80c16cda..ad7f51302c70 100644
> >> --- a/kernel/bpf/verifier.c
> >> +++ b/kernel/bpf/verifier.c
> >> @@ -20499,13 +20499,46 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
> >> /* Convert BPF_CLASS(insn->code) == BPF_ALU64 to 32-bit ALU */
> >> insn->code = BPF_ALU | BPF_OP(insn->code) | BPF_SRC(insn->code);
> >>
> >> - /* Make divide-by-zero exceptions impossible. */
> >> + /* Make sdiv/smod divide-by-minus-one exceptions impossible. */
> >> + if ((insn->code == (BPF_ALU64 | BPF_MOD | BPF_K) ||
> >> + insn->code == (BPF_ALU64 | BPF_DIV | BPF_K) ||
> >> + insn->code == (BPF_ALU | BPF_MOD | BPF_K) ||
> >> + insn->code == (BPF_ALU | BPF_DIV | BPF_K)) &&
> >> + insn->off == 1 && insn->imm == -1) {
> >> + bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
> >> + bool isdiv = BPF_OP(insn->code) == BPF_DIV;
> >> + struct bpf_insn *patchlet;
> >> + struct bpf_insn chk_and_div[] = {
> >> + BPF_RAW_INSN((is64 ? BPF_ALU64 : BPF_ALU) |
> >> + BPF_OP(BPF_NEG) | BPF_K, insn->dst_reg,
> >> + 0, 0, 0),
> >> + };
> >> + struct bpf_insn chk_and_mod[] = {
> >> + BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
> >> + };
> >> +
> >> + patchlet = isdiv ? chk_and_div : chk_and_mod;
> > nit: "chk_and_" part in the name is misleading, it's more like
> > "safe_div" and "safe_mod". Oh, and it's "sdiv" and "smod" specific, so
> > probably not a bad idea to have that in the name as well.
>
> good idea. Will use chk_and_sdiv and chk_and_smod.
>
> >
> >> + cnt = isdiv ? ARRAY_SIZE(chk_and_div) : ARRAY_SIZE(chk_and_mod);
> >> +
> >> + new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt);
> >> + if (!new_prog)
> >> + return -ENOMEM;
> >> +
> >> + delta += cnt - 1;
> >> + env->prog = prog = new_prog;
> >> + insn = new_prog->insnsi + i + delta;
> >> + goto next_insn;
> >> + }
> >> +
> >> + /* Make divide-by-zero and divide-by-minus-one exceptions impossible. */
> >> if (insn->code == (BPF_ALU64 | BPF_MOD | BPF_X) ||
> >> insn->code == (BPF_ALU64 | BPF_DIV | BPF_X) ||
> >> insn->code == (BPF_ALU | BPF_MOD | BPF_X) ||
> >> insn->code == (BPF_ALU | BPF_DIV | BPF_X)) {
> >> bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
> >> bool isdiv = BPF_OP(insn->code) == BPF_DIV;
> >> + bool is_sdiv = isdiv && insn->off == 1;
> >> + bool is_smod = !isdiv && insn->off == 1;
> >> struct bpf_insn *patchlet;
> >> struct bpf_insn chk_and_div[] = {
> >> /* [R,W]x div 0 -> 0 */
> >> @@ -20525,10 +20558,53 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
> >> BPF_JMP_IMM(BPF_JA, 0, 0, 1),
> >> BPF_MOV32_REG(insn->dst_reg, insn->dst_reg),
> >> };
> >> + struct bpf_insn chk_and_sdiv[] = {
> >> + /* [R,W]x sdiv 0 -> 0 */
> >> + BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
> >> + BPF_JNE | BPF_K, insn->src_reg,
> >> + 0, 2, 0),
> >> + BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
> >> + BPF_JMP_IMM(BPF_JA, 0, 0, 4),
> >> + /* LLONG_MIN sdiv -1 -> LLONG_MIN
> >> + * INT_MIN sdiv -1 -> INT_MIN
> >> + */
> >> + BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
> >> + BPF_JNE | BPF_K, insn->src_reg,
> >> + 0, 2, -1),
> >> + /* BPF_NEG(LLONG_MIN) == -LLONG_MIN == LLONG_MIN */
> >> + BPF_RAW_INSN((is64 ? BPF_ALU64 : BPF_ALU) |
> >> + BPF_OP(BPF_NEG) | BPF_K, insn->dst_reg,
> >> + 0, 0, 0),
> >> + BPF_JMP_IMM(BPF_JA, 0, 0, 1),
> > I don't know how much it actually matters, but it feels like common
> > safe case should be as straight-line-executed as possible, no?
> >
> > So maybe it's better to rearrange to roughly this (where rX is the
> > divisor register):
> >
> > if rX == 0 goto L1
> > if rX == -1 goto L2
> > rY /= rX
> > goto L3
> > L1: /* zero case */
> > rY = 0 /* fallthrough, negation doesn't hurt, but less jumping */
> > L2: /* negative one case (or zero) */
> > rY = -rY
> > L3:
> > ... the rest of the program code ...
>
> My previous patched insns tried to clearly separate the rX == 0 and
> rX == -1 cases. They include 2 cond jmps, 2 uncond jmps and
> 3 alu operations. The above one removes one uncond jmp, which
> is indeed better.
>
> >
> >
> > Those two branches for common case are still annoyingly inefficient, I
> > wonder if we should do
> >
> > > rX += 1 /* [-1, 0] -> [0, 1] */
> > if rX <=(unsigned) 1 goto L1
> > rX -= 1 /* restore original divisor */
> > rY /= rX /* common case */
> > goto L3
> > L1:
> > if rX == 0 goto L2 /* jump if originally -1 */
> > rY = 0 /* division by zero case */
> > L2: /* fallthrough */
> > rY = -rY
> > rX -= 1 /* restore original divisor */
> > L3:
> > ... continue with the rest ...
> >
> >
> > It's a bit trickier to follow, but should be faster in a common case.
> >
> > WDYT? Too much too far?
>
> This is even better. The above rX -= 1 can be removed if we use
> BPF_REG_AX as the temporary register. For example,
>
> tmp = rX
> > tmp += 1 /* [-1, 0] -> [0, 1] */
> if tmp <=(unsigned) 1 goto L1
> rY /= rX /* common case */
> goto L3
> L1:
> if tmp == 0 goto L2 /* jump if originally -1 */
> rY = 0 /* division by zero case */
> L2: /* fallthrough */
> rY = -rY
> L3:
> ... continue with the rest ...
>
> Maybe we can do even better
>
> tmp = rX
> > tmp += 1 /* [-1, 0] -> [0, 1] */
> if tmp >(unsigned) 1 goto L2
> if tmp == 0 goto L1
> rY = 0
> L1:
> rY = -rY;
> goto L3
> L2:
> rY /= rX
> L3:
>
> Could this be even better by reducing one uncond jmp in the fast path?
Yep, makes sense to me. Go for it (as far as I'm concerned).
>
> >
> >
> >> + *insn,
> >> + };
> >> + struct bpf_insn chk_and_smod[] = {
> >> + /* [R,W]x mod 0 -> [R,W]x */
> >> + BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
> >> + BPF_JNE | BPF_K, insn->src_reg,
> >> + 0, 2, 0),
> >> + BPF_MOV32_REG(insn->dst_reg, insn->dst_reg),
> >> + BPF_JMP_IMM(BPF_JA, 0, 0, 4),
> >> + /* [R,W]x mod -1 -> 0 */
> >> + BPF_RAW_INSN((is64 ? BPF_JMP : BPF_JMP32) |
> >> + BPF_JNE | BPF_K, insn->src_reg,
> >> + 0, 2, -1),
> >> + BPF_ALU32_REG(BPF_XOR, insn->dst_reg, insn->dst_reg),
> >> + BPF_JMP_IMM(BPF_JA, 0, 0, 1),
> >> + *insn,
> >> + };
> >>
> > Same idea here, keep the common case as straight as possible.
>
> Sure. Will do.
>
> >
> >> - patchlet = isdiv ? chk_and_div : chk_and_mod;
> >> - cnt = isdiv ? ARRAY_SIZE(chk_and_div) :
> >> - ARRAY_SIZE(chk_and_mod) - (is64 ? 2 : 0);
> >> + if (is_sdiv) {
> >> + patchlet = chk_and_sdiv;
> >> + cnt = ARRAY_SIZE(chk_and_sdiv);
> >> + } else if (is_smod) {
> >> + patchlet = chk_and_smod;
> >> + cnt = ARRAY_SIZE(chk_and_smod);
> >> + } else {
> >> + patchlet = isdiv ? chk_and_div : chk_and_mod;
> >> + cnt = isdiv ? ARRAY_SIZE(chk_and_div) :
> >> + ARRAY_SIZE(chk_and_mod) - (is64 ? 2 : 0);
> >> + }
> >>
> >> new_prog = bpf_patch_insn_data(env, i + delta, patchlet, cnt);
> >> if (!new_prog)
> >> --
> >> 2.43.5
> >>
Thread overview: 6+ messages
2024-09-12 3:59 [PATCH bpf-next v2 1/2] bpf: Fix a sdiv overflow issue Yonghong Song
2024-09-12 3:59 ` [PATCH bpf-next v2 2/2] selftests/bpf: Add tests for sdiv/smod overflow cases Yonghong Song
2024-09-12 18:17 ` [PATCH bpf-next v2 1/2] bpf: Fix a sdiv overflow issue Andrii Nakryiko
2024-09-12 18:19 ` Andrii Nakryiko
2024-09-12 22:53 ` Yonghong Song
2024-09-13 2:00 ` Andrii Nakryiko