From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out-182.mta1.migadu.com (out-182.mta1.migadu.com [95.215.58.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4E26D1BF7FD for ; Fri, 30 Aug 2024 20:39:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.182 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725050363; cv=none; b=CPJid1FkddRmotaoLqRHd6kHmHfAIeRxi+FlL7lUX/tWd4t/NQgzGVpOAfV0nikicneUnyjUKIRjGUf6RVvxzRinK27HLp8/PvF+9osdwxnrKUnUZDFpHZ0j9Py+OIpLmHTJSDCijc5fFtu+c/E+72hf9+Y/avcYY1F+B72PMoY= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725050363; c=relaxed/simple; bh=XG8YtKNUfa1r+x+bnuMKI8119BNy7UAIHk7vB5Q0Cqg=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=uSk+d1L2daBD2en4U+7EkHOgU2kmmLovIDLGpxvHh8STe/DV3wQnCg7SdlzBjmV+7Y6xgRa7+oiiGM5omv/DDnng2Q/a7ByiKZ9YeWCpDTj7Ge6xG5+do8fmj361uWMMFnPb8wcdddPwIEDmaxv+drfR2mEu1E9fZedACut4gWU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=fTQAMWOc; arc=none smtp.client-ip=95.215.58.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="fTQAMWOc" Message-ID: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1725050359; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=d2bxvYqY58wIOvjv5Pp0uoSPxrehDFnWAw46G0P2qDA=; b=fTQAMWOcW9yGMaOUR0U3BQkVmSKpmM4lW/ynnOksiJ6jXjEZyaJwNe6vIEIkuKS4cN5e3C 4nap6xhu1uuz7vv+Rcswm7H0GrfAzZ/buW1atkEqK+V+Hs7aJtJA0UX/Yn6aoCI2ozviQ9 Bc3SpMVnAkbRpXYMtc7KLXh/3KQ0s6k= Date: Fri, 30 Aug 2024 13:39:14 -0700 Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Subject: Re: [PATCH bpf-next] bpf, x64: Fix a jit convergence issue Content-Language: en-GB To: Andrii Nakryiko Cc: bpf@vger.kernel.org, Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , kernel-team@fb.com, Martin KaFai Lau , Daniel Hodges References: <20240825200406.1874982-1-yonghong.song@linux.dev> <0ad6d232-5385-40e4-b138-1b9ec383884a@linux.dev> X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Yonghong Song In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT On 8/29/24 10:27 AM, Andrii Nakryiko wrote: > On Wed, Aug 28, 2024 at 3:50 PM Yonghong Song wrote: >> >> On 8/27/24 4:44 PM, Andrii Nakryiko wrote: >>> On Sun, Aug 25, 2024 at 1:04 PM Yonghong Song wrote: >>>> Daniel Hodges reported a jit error when playing with a sched-ext >>>> program. The error message is: >>>> unexpected jmp_cond padding: -4 bytes >>>> >>>> But further investigation shows the error is actual due to failed >>>> convergence. The following are some analysis: >>>> >>>> ... >>>> pass4, final_proglen=4391: >>>> ... >>>> 20e: 48 85 ff test rdi,rdi >>>> 211: 74 7d je 0x290 >>>> 213: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0] >>>> ... >>>> 289: 48 85 ff test rdi,rdi >>>> 28c: 74 17 je 0x2a5 >>>> 28e: e9 7f ff ff ff jmp 0x212 >>>> 293: bf 03 00 00 00 mov edi,0x3 >>>> >>>> Note that insn at 0x211 is 2-byte cond jump insn for offset 0x7d (-125) >>>> and insn at 0x28e is 5-byte jmp insn with offset -129. >>>> >>>> pass5, final_proglen=4392: >>>> ... >>>> 20e: 48 85 ff test rdi,rdi >>>> 211: 0f 84 80 00 00 00 je 0x297 >>>> 217: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0] >>>> ... >>>> 28d: 48 85 ff test rdi,rdi >>>> 290: 74 1a je 0x2ac >>>> 292: eb 84 jmp 0x218 >>>> 294: bf 03 00 00 00 mov edi,0x3 >>>> >>>> Note that insn at 0x211 is 5-byte cond jump insn now since its offset >>>> becomes 0x80 based on previous round (0x293 - 0x213 = 0x80). >>>> At the same time, insn at 0x292 is a 2-byte insn since its offset is >>>> -124. >>>> >>>> pass6 will repeat the same code as in pass4. pass7 will repeat the same >>>> code as in pass5, and so on. This will prevent eventual convergence. >>>> >>>> Passes 1-14 are with padding = 0. At pass15, padding is 1 and related >>>> insn looks like: >>>> >>>> 211: 0f 84 80 00 00 00 je 0x297 >>>> 217: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0] >>>> ... >>>> 24d: 48 85 d2 test rdx,rdx >>>> >>>> The similar code in pass14: >>>> 211: 74 7d je 0x290 >>>> 213: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0] >>>> ... >>>> 249: 48 85 d2 test rdx,rdx >>>> 24c: 74 21 je 0x26f >>>> 24e: 48 01 f7 add rdi,rsi >>>> ... >>>> >>>> Before generating the following insn, >>>> 250: 74 21 je 0x273 >>>> "padding = 1" enables some checking to ensure nops is either 0 or 4 >>>> where >>>> #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp))) >>>> nops = INSN_SZ_DIFF - 2 >>>> >>>> In this specific case, >>>> addrs[i] = 0x24e // from pass14 >>>> addrs[i-1] = 0x24d // from pass15 >>>> prog - temp = 3 // from 'test rdx,rdx' in pass15 >>>> so >>>> nops = -4 >>>> and this triggers the failure. >>>> Making jit prog convergable can fix the above error. >>>> >>>> Reported-by: Daniel Hodges >>>> Signed-off-by: Yonghong Song >>>> --- >>>> arch/x86/net/bpf_jit_comp.c | 47 ++++++++++++++++++++++++++++++++++++- >>>> 1 file changed, 46 insertions(+), 1 deletion(-) >>>> >>> Probably a stupid question. But instead of hacking things like this to >>> help convergence in some particular cases, why not just add a >>> condition that we should stop jitting as soon as jitted length stops >>> shrinking (and correct the comment that claims "JITed image shrinks >>> with every pass" because that's not true). >>> >>> We have `if (proglen == oldproglen)` condition right now. What will >>> happen if we just change it to `if (proglen >= oldproglen)`? That >>> might be sup-optimal for these rare non-convergent cases, but that >>> seems fine. We can of course do one extra pass to hopefully get back >>> the second-to-last shorter image if proglen > oldproglen, but that >>> seems excessive to me. >> We need convergence. Looks at some comments below: >> >> + * pass5, final_proglen=4392: >> + * ... >> + * 20e: 48 85 ff test rdi,rdi >> + * 211: 0f 84 80 00 00 00 je 0x297 >> + * 217: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0] >> + * ... >> + * 28d: 48 85 ff test rdi,rdi >> + * 290: 74 1a je 0x2ac >> + * 292: eb 84 jmp 0x218 >> + * 294: bf 03 00 00 00 mov edi,0x3 >> >> Without convergence, you can see je/jmp target may not be correct. >> > I see, thanks. As I said, probably was a stupid question, I didn't > realize that do_jit() can generate invalid image. > > This whole guessing of acceptable range of relative offset still seems > like a fragile game (what if you have few instructions that expand and > then 124 bound isn't conservative enough anymore). I was wondering if > there is some more generic solution where we can mark jump > instructions that went from shorter to longer, and if that happened, > on subsequent passes don't try to shorten them. I digged out the git history and found that this pass convergence based approach is actually implemented from the beginning from the following patch ===== commit 0a14842f5a3c0e88a1e59fac5c3025db39721f74 (HEAD -> t2) Author: Eric Dumazet Date: Wed Apr 20 09:27:32 2011 +0000 net: filter: Just In Time compiler for x86-64 ===== I cannot find the discussion context for this patch then. Do we have a better approach for this? I do not know, the issue is that for the same branch/jmp insn, the length of the insn could change depending on the 'offset' value. One possible solution is to assume all branch/jmp insn is 8bit long and whenever it won't work, replace previous one with 32bit and continue to enumerate following branch/jmp insn with 8bit. This may take a lot of time though. > > Again, I have no clue how actual code in JIT works and what are all > the nuances, so feel free to ignore me completely, I won't be offended > :) > >>> >>>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c >>>> index 074b41fafbe3..ec541aae5d9b 100644 >>>> --- a/arch/x86/net/bpf_jit_comp.c >>>> +++ b/arch/x86/net/bpf_jit_comp.c >>>> @@ -64,6 +64,51 @@ static bool is_imm8(int value) >>>> return value <= 127 && value >= -128; >>>> } >>>> >>>> +/* >>>> + * Let us limit the positive offset to be <= 124. >>>> + * This is to ensure eventual jit convergence For the following patterns: >>>> + * ... >>>> + * pass4, final_proglen=4391: >>>> + * ... >>>> + * 20e: 48 85 ff test rdi,rdi >>>> + * 211: 74 7d je 0x290 >>>> + * 213: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0] >>>> + * ... >>>> + * 289: 48 85 ff test rdi,rdi >>>> + * 28c: 74 17 je 0x2a5 >>>> + * 28e: e9 7f ff ff ff jmp 0x212 >>>> + * 293: bf 03 00 00 00 mov edi,0x3 >>>> + * Note that insn at 0x211 is 2-byte cond jump insn for offset 0x7d (-125) >>>> + * and insn at 0x28e is 5-byte jmp insn with offset -129. >>>> + * >>>> + * pass5, final_proglen=4392: >>>> + * ... >>>> + * 20e: 48 85 ff test rdi,rdi >>>> + * 211: 0f 84 80 00 00 00 je 0x297 >>>> + * 217: 48 8b 77 00 mov rsi,QWORD PTR [rdi+0x0] >>>> + * ... >>>> + * 28d: 48 85 ff test rdi,rdi >>>> + * 290: 74 1a je 0x2ac >>>> + * 292: eb 84 jmp 0x218 >>>> + * 294: bf 03 00 00 00 mov edi,0x3 >>>> + * Note that insn at 0x211 is 5-byte cond jump insn now since its offset >>>> + * becomes 0x80 based on previous round (0x293 - 0x213 = 0x80). >>>> + * At the same time, insn at 0x292 is a 2-byte insn since its offset is >>>> + * -124. >>>> + * >>>> + * pass6 will repeat the same code as in pass4 and this will prevent >>>> + * eventual convergence. >>>> + * >>>> + * To fix this issue, we need to break je (2->6 bytes) <-> jmp (5->2 bytes) >>>> + * cycle in the above. Let us limit the positive offset for 8bit cond jump >>>> + * insn to mamximum 124 (0x7c). This way, the jmp insn will be always 2-bytes, >>>> + * and the jit pass can eventually converge. >>>> + */ >>>> +static bool is_imm8_cond_offset(int value) >>>> +{ >>>> + return value <= 124 && value >= -128; >>>> +} >>>> + >>>> static bool is_simm32(s64 value) >>>> { >>>> return value == (s64)(s32)value; >>>> @@ -2231,7 +2276,7 @@ st: if (is_imm8(insn->off)) >>>> return -EFAULT; >>>> } >>>> jmp_offset = addrs[i + insn->off] - addrs[i]; >>>> - if (is_imm8(jmp_offset)) { >>>> + if (is_imm8_cond_offset(jmp_offset)) { >>>> if (jmp_padding) { >>>> /* To keep the jmp_offset valid, the extra bytes are >>>> * padded before the jump insn, so we subtract the >>>> -- >>>> 2.43.5 >>>>