From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Yonghong Song <yonghong.song@linux.dev>,
Daniel Hodges <hodgesd@meta.com>,
Alexei Starovoitov <ast@kernel.org>,
Sasha Levin <sashal@kernel.org>,
davem@davemloft.net, dsahern@kernel.org, daniel@iogearbox.net,
andrii@kernel.org, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
netdev@vger.kernel.org, bpf@vger.kernel.org
Subject: [PATCH AUTOSEL 6.10 10/70] bpf, x64: Fix a jit convergence issue
Date: Fri, 4 Oct 2024 14:20:08 -0400 [thread overview]
Message-ID: <20241004182200.3670903-10-sashal@kernel.org> (raw)
In-Reply-To: <20241004182200.3670903-1-sashal@kernel.org>
From: Yonghong Song <yonghong.song@linux.dev>
[ Upstream commit c8831bdbfbab672c006a18006d36932a494b2fd6 ]
Daniel Hodges reported a jit error when playing with a sched-ext program.
The error message is:
unexpected jmp_cond padding: -4 bytes
But further investigation shows the error is actual due to failed
convergence. The following are some analysis:
...
pass4, final_proglen=4391:
...
20e: 48 85 ff test rdi,rdi
211: 74 7d je 0x290
213: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0]
...
289: 48 85 ff test rdi,rdi
28c: 74 17 je 0x2a5
28e: e9 7f ff ff ff jmp 0x212
293: bf 03 00 00 00 mov edi,0x3
Note that insn at 0x211 is 2-byte cond jump insn for offset 0x7d (-125)
and insn at 0x28e is 5-byte jmp insn with offset -129.
pass5, final_proglen=4392:
...
20e: 48 85 ff test rdi,rdi
211: 0f 84 80 00 00 00 je 0x297
217: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0]
...
28d: 48 85 ff test rdi,rdi
290: 74 1a je 0x2ac
292: eb 84 jmp 0x218
294: bf 03 00 00 00 mov edi,0x3
Note that insn at 0x211 is 6-byte cond jump insn now since its offset
becomes 0x80 based on previous round (0x293 - 0x213 = 0x80). At the same
time, insn at 0x292 is a 2-byte insn since its offset is -124.
pass6 will repeat the same code as in pass4. pass7 will repeat the same
code as in pass5, and so on. This will prevent eventual convergence.
Passes 1-14 are with padding = 0. At pass15, padding is 1 and related
insn looks like:
211: 0f 84 80 00 00 00 je 0x297
217: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0]
...
24d: 48 85 d2 test rdx,rdx
The similar code in pass14:
211: 74 7d je 0x290
213: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0]
...
249: 48 85 d2 test rdx,rdx
24c: 74 21 je 0x26f
24e: 48 01 f7 add rdi,rsi
...
Before generating the following insn,
250: 74 21 je 0x273
"padding = 1" enables some checking to ensure nops is either 0 or 4
where
#define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
nops = INSN_SZ_DIFF - 2
In this specific case,
addrs[i] = 0x24e // from pass14
addrs[i-1] = 0x24d // from pass15
prog - temp = 3 // from 'test rdx,rdx' in pass15
so
nops = -4
and this triggers the failure.
To fix the issue, we need to break cycles of je <-> jmp. For example,
in the above case, we have
211: 74 7d je 0x290
the offset is 0x7d. If 2-byte je insn is generated only if
the offset is less than 0x7d (<= 0x7c), the cycle can be
break and we can achieve the convergence.
I did some study on other cases like je <-> je, jmp <-> je and
jmp <-> jmp which may cause cycles. Those cases are not from actual
reproducible cases since it is pretty hard to construct a test case
for them. the results show that the offset <= 0x7b (0x7b = 123) should
be enough to cover all cases. This patch added a new helper to generate 8-bit
cond/uncond jmp insns only if the offset range is [-128, 123].
Reported-by: Daniel Hodges <hodgesd@meta.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240904221251.37109-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/x86/net/bpf_jit_comp.c | 54 +++++++++++++++++++++++++++++++++++--
1 file changed, 52 insertions(+), 2 deletions(-)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 5159c7a229229..dd60500497ead 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -64,6 +64,56 @@ static bool is_imm8(int value)
return value <= 127 && value >= -128;
}
+/*
+ * Let us limit the positive offset to be <= 123.
+ * This is to ensure eventual jit convergence For the following patterns:
+ * ...
+ * pass4, final_proglen=4391:
+ * ...
+ * 20e: 48 85 ff test rdi,rdi
+ * 211: 74 7d je 0x290
+ * 213: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0]
+ * ...
+ * 289: 48 85 ff test rdi,rdi
+ * 28c: 74 17 je 0x2a5
+ * 28e: e9 7f ff ff ff jmp 0x212
+ * 293: bf 03 00 00 00 mov edi,0x3
+ * Note that insn at 0x211 is 2-byte cond jump insn for offset 0x7d (-125)
+ * and insn at 0x28e is 5-byte jmp insn with offset -129.
+ *
+ * pass5, final_proglen=4392:
+ * ...
+ * 20e: 48 85 ff test rdi,rdi
+ * 211: 0f 84 80 00 00 00 je 0x297
+ * 217: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0]
+ * ...
+ * 28d: 48 85 ff test rdi,rdi
+ * 290: 74 1a je 0x2ac
+ * 292: eb 84 jmp 0x218
+ * 294: bf 03 00 00 00 mov edi,0x3
+ * Note that insn at 0x211 is 6-byte cond jump insn now since its offset
+ * becomes 0x80 based on previous round (0x293 - 0x213 = 0x80).
+ * At the same time, insn at 0x292 is a 2-byte insn since its offset is
+ * -124.
+ *
+ * pass6 will repeat the same code as in pass4 and this will prevent
+ * eventual convergence.
+ *
+ * To fix this issue, we need to break je (2->6 bytes) <-> jmp (5->2 bytes)
+ * cycle in the above. In the above example je offset <= 0x7c should work.
+ *
+ * For other cases, je <-> je needs offset <= 0x7b to avoid no convergence
+ * issue. For jmp <-> je and jmp <-> jmp cases, jmp offset <= 0x7c should
+ * avoid no convergence issue.
+ *
+ * Overall, let us limit the positive offset for 8bit cond/uncond jmp insn
+ * to maximum 123 (0x7b). This way, the jit pass can eventually converge.
+ */
+static bool is_imm8_jmp_offset(int value)
+{
+ return value <= 123 && value >= -128;
+}
+
static bool is_simm32(s64 value)
{
return value == (s64)(s32)value;
@@ -2191,7 +2241,7 @@ st: if (is_imm8(insn->off))
return -EFAULT;
}
jmp_offset = addrs[i + insn->off] - addrs[i];
- if (is_imm8(jmp_offset)) {
+ if (is_imm8_jmp_offset(jmp_offset)) {
if (jmp_padding) {
/* To keep the jmp_offset valid, the extra bytes are
* padded before the jump insn, so we subtract the
@@ -2273,7 +2323,7 @@ st: if (is_imm8(insn->off))
break;
}
emit_jmp:
- if (is_imm8(jmp_offset)) {
+ if (is_imm8_jmp_offset(jmp_offset)) {
if (jmp_padding) {
/* To avoid breaking jmp_offset, the extra bytes
* are padded before the actual jmp insn, so
--
2.43.0
next prev parent reply other threads:[~2024-10-04 18:22 UTC|newest]
Thread overview: 77+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-04 18:19 [PATCH AUTOSEL 6.10 01/70] bpf: Call the missed btf_record_free() when map creation fails Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 02/70] selftests/bpf: Fix ARG_PTR_TO_LONG {half-,}uninitialized test Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 03/70] bpf: Fix a sdiv overflow issue Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 04/70] bpf: Check percpu map value size first Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 05/70] bpftool: Fix undefined behavior in qsort(NULL, 0, ...) Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 06/70] bpftool: Fix undefined behavior caused by shifting into the sign bit Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 07/70] s390/boot: Compile all files with the same march flag Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 08/70] s390/facility: Disable compile time optimization for decompressor code Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 09/70] s390/mm: Add cond_resched() to cmm_alloc/free_pages() Sasha Levin
2024-10-04 18:20 ` Sasha Levin [this message]
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 11/70] ext4: fix i_data_sem unlock order in ext4_ind_migrate() Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 12/70] ext4: avoid use-after-free in ext4_ext_show_leaf() Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 13/70] ext4: ext4_search_dir should return a proper error Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 14/70] iomap: fix iomap_dio_zero() for fs bs > system page size Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 15/70] ext4: don't set SB_RDONLY after filesystem errors Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 16/70] ext4: nested locking for xattr inode Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 17/70] s390/cpum_sf: Remove WARN_ON_ONCE statements Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 18/70] s390/traps: Handle early warnings gracefully Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 19/70] bpf: Prevent tail call between progs attached to different hooks Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 20/70] ktest.pl: Avoid false positives with grub2 skip regex Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 21/70] RDMA/mad: Improve handling of timed out WRs of mad agent Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 22/70] soundwire: intel_bus_common: enable interrupts before exiting reset Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 23/70] PCI: Add function 0 DMA alias quirk for Glenfly Arise chip Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 24/70] RDMA/rtrs-srv: Avoid null pointer deref during path establishment Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 25/70] clk: bcm: bcm53573: fix OF node leak in init Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 26/70] PCI: Add ACS quirk for Qualcomm SA8775P Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 27/70] i2c: i801: Use a different adapter-name for IDF adapters Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 28/70] PCI: Mark Creative Labs EMU20k2 INTx masking as broken Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 29/70] i3c: master: cdns: Fix use after free vulnerability in cdns_i3c_master Driver Due to Race Condition Sasha Levin
2024-10-04 18:20 ` Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 30/70] RISC-V: Don't have MAX_PHYSMEM_BITS exceed phys_addr_t Sasha Levin
2024-10-04 18:20 ` Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 31/70] io_uring: check if we need to reschedule during overflow flush Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 32/70] ntb: ntb_hw_switchtec: Fix use after free vulnerability in switchtec_ntb_remove due to race condition Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 33/70] mfd: intel_soc_pmic_chtwc: Make Lenovo Yoga Tab 3 X90F DMI match less strict Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 34/70] riscv: Omit optimized string routines when using KASAN Sasha Levin
2024-10-04 18:20 ` Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 35/70] riscv: avoid Imbalance in RAS Sasha Levin
2024-10-04 18:20 ` Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 36/70] RDMA/mlx5: Enforce umem boundaries for explicit ODP page faults Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 37/70] PCI: qcom: Disable mirroring of DBI and iATU register space in BAR region Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 38/70] PCI: endpoint: Assign PCI domain number for endpoint controllers Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 39/70] soundwire: cadence: re-check Peripheral status with delayed_work Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 40/70] riscv/kexec_file: Fix relocation type R_RISCV_ADD16 and R_RISCV_SUB16 unknown Sasha Levin
2024-10-04 18:20 ` Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 41/70] media: videobuf2-core: clear memory related fields in __vb2_plane_dmabuf_put() Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 42/70] remoteproc: imx_rproc: Use imx specific hook for find_loaded_rsc_table Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 43/70] clk: imx: Remove CLK_SET_PARENT_GATE for DRAM mux for i.MX7D Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 44/70] fuse: allow O_PATH fd for FUSE_DEV_IOC_BACKING_OPEN Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 45/70] fuse: handle idmappings properly in ->write_iter() Sasha Levin
2024-10-07 10:05 ` Miklos Szeredi
2024-10-11 13:50 ` Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 46/70] serial: protect uart_port_dtr_rts() in uart_shutdown() too Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 47/70] usb: typec: tipd: Free IRQ only if it was requested before Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 48/70] usb: chipidea: udc: enable suspend interrupt after usb reset Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 49/70] usb: dwc2: Adjust the timing of USB Driver Interrupt Registration in the Crashkernel Scenario Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 50/70] xhci: dbc: Fix STALL transfer event handling Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 51/70] comedi: ni_routing: tools: Check when the file could not be opened Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 52/70] LoongArch: Fix memleak in pci_acpi_scan_root() Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 53/70] netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 54/70] netfilter: nf_reject: Fix build warning when CONFIG_BRIDGE_NETFILTER=n Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 55/70] virtio_pmem: Check device status before requesting flush Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 56/70] tools/iio: Add memory allocation failure check for trigger_name Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 57/70] staging: vme_user: added bound check to geoid Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 58/70] usb: gadget: uvc: Fix ERR_PTR dereference in uvc_v4l2.c Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 59/70] dm vdo: don't refer to dedupe_context after releasing it Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 60/70] driver core: bus: Fix double free in driver API bus_register() Sasha Levin
2024-10-04 18:20 ` [PATCH AUTOSEL 6.10 61/70] driver core: bus: Return -EIO instead of 0 when show/store invalid bus attribute Sasha Levin
2024-10-04 18:21 ` [PATCH AUTOSEL 6.10 62/70] scsi: lpfc: Add ELS_RSP cmd to the list of WQEs to flush in lpfc_els_flush_cmd() Sasha Levin
2024-10-04 18:21 ` [PATCH AUTOSEL 6.10 63/70] scsi: lpfc: Ensure DA_ID handling completion before deleting an NPIV instance Sasha Levin
2024-10-04 18:21 ` [PATCH AUTOSEL 6.10 64/70] scsi: lpfc: Revise TRACE_EVENT log flag severities from KERN_ERR to KERN_WARNING Sasha Levin
2024-10-04 18:21 ` [PATCH AUTOSEL 6.10 65/70] drm/xe/oa: Fix overflow in oa batch buffer Sasha Levin
2024-10-04 18:21 ` [PATCH AUTOSEL 6.10 66/70] drm/amdgpu: nuke the VM PD/PT shadow handling Sasha Levin
2024-10-04 18:21 ` [PATCH AUTOSEL 6.10 67/70] drm/amd/display: Check null pointer before dereferencing se Sasha Levin
2024-10-04 18:21 ` [PATCH AUTOSEL 6.10 68/70] fbcon: Fix a NULL pointer dereference issue in fbcon_putcs Sasha Levin
2024-10-04 18:21 ` [PATCH AUTOSEL 6.10 69/70] smb: client: fix UAF in async decryption Sasha Levin
2024-10-04 18:21 ` [PATCH AUTOSEL 6.10 70/70] fbdev: sisfb: Fix strbuf array overflow Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241004182200.3670903-10-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bp@alien8.de \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=dave.hansen@linux.intel.com \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=hodgesd@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=stable@vger.kernel.org \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.