Re: [PATCH bpf-next] bpf, x64: Fix a jit convergence issue

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Yonghong Song <yonghong.song@linux.dev>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>,
	Daniel Hodges <hodgesd@meta.com>
Subject: Re: [PATCH bpf-next] bpf, x64: Fix a jit convergence issue
Date: Fri, 30 Aug 2024 13:39:14 -0700	[thread overview]
Message-ID: <a1c6adac-a150-4819-9f36-4d0340e6eb08@linux.dev> (raw)
In-Reply-To: <CAEf4BzaarnU2g9LVP1qfRLDia+owzyQdonkkKcRpb9m4bcr37A@mail.gmail.com>


On 8/29/24 10:27 AM, Andrii Nakryiko wrote:
> On Wed, Aug 28, 2024 at 3:50 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>> On 8/27/24 4:44 PM, Andrii Nakryiko wrote:
>>> On Sun, Aug 25, 2024 at 1:04 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>> Daniel Hodges reported a jit error when playing with a sched-ext
>>>> program. The error message is:
>>>>     unexpected jmp_cond padding: -4 bytes
>>>>
>>>> But further investigation shows the error is actual due to failed
>>>> convergence. The following are some analysis:
>>>>
>>>>     ...
>>>>     pass4, final_proglen=4391:
>>>>       ...
>>>>       20e:    48 85 ff                test   rdi,rdi
>>>>       211:    74 7d                   je     0x290
>>>>       213:    48 8b 77 00             mov    rsi,QWORD PTR [rdi+0x0]
>>>>       ...
>>>>       289:    48 85 ff                test   rdi,rdi
>>>>       28c:    74 17                   je     0x2a5
>>>>       28e:    e9 7f ff ff ff          jmp    0x212
>>>>       293:    bf 03 00 00 00          mov    edi,0x3
>>>>
>>>> Note that insn at 0x211 is 2-byte cond jump insn for offset 0x7d (-125)
>>>> and insn at 0x28e is 5-byte jmp insn with offset -129.
>>>>
>>>>     pass5, final_proglen=4392:
>>>>       ...
>>>>       20e:    48 85 ff                test   rdi,rdi
>>>>       211:    0f 84 80 00 00 00       je     0x297
>>>>       217:    48 8b 77 00             mov    rsi,QWORD PTR [rdi+0x0]
>>>>       ...
>>>>       28d:    48 85 ff                test   rdi,rdi
>>>>       290:    74 1a                   je     0x2ac
>>>>       292:    eb 84                   jmp    0x218
>>>>       294:    bf 03 00 00 00          mov    edi,0x3
>>>>
>>>> Note that insn at 0x211 is 5-byte cond jump insn now since its offset
>>>> becomes 0x80 based on previous round (0x293 - 0x213 = 0x80).
>>>> At the same time, insn at 0x292 is a 2-byte insn since its offset is
>>>> -124.
>>>>
>>>> pass6 will repeat the same code as in pass4. pass7 will repeat the same
>>>> code as in pass5, and so on. This will prevent eventual convergence.
>>>>
>>>> Passes 1-14 are with padding = 0. At pass15, padding is 1 and related
>>>> insn looks like:
>>>>
>>>>       211:    0f 84 80 00 00 00       je     0x297
>>>>       217:    48 8b 77 00             mov    rsi,QWORD PTR [rdi+0x0]
>>>>       ...
>>>>       24d:    48 85 d2                test   rdx,rdx
>>>>
>>>> The similar code in pass14:
>>>>       211:    74 7d                   je     0x290
>>>>       213:    48 8b 77 00             mov    rsi,QWORD PTR [rdi+0x0]
>>>>       ...
>>>>       249:    48 85 d2                test   rdx,rdx
>>>>       24c:    74 21                   je     0x26f
>>>>       24e:    48 01 f7                add    rdi,rsi
>>>>       ...
>>>>
>>>> Before generating the following insn,
>>>>     250:    74 21                   je     0x273
>>>> "padding = 1" enables some checking to ensure nops is either 0 or 4
>>>> where
>>>>     #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
>>>>     nops = INSN_SZ_DIFF - 2
>>>>
>>>> In this specific case,
>>>>     addrs[i] = 0x24e // from pass14
>>>>     addrs[i-1] = 0x24d // from pass15
>>>>     prog - temp = 3 // from 'test rdx,rdx' in pass15
>>>> so
>>>>     nops = -4
>>>> and this triggers the failure.
>>>> Making jit prog convergable can fix the above error.
>>>>
>>>> Reported-by: Daniel Hodges <hodgesd@meta.com>
>>>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>>>> ---
>>>>    arch/x86/net/bpf_jit_comp.c | 47 ++++++++++++++++++++++++++++++++++++-
>>>>    1 file changed, 46 insertions(+), 1 deletion(-)
>>>>
>>> Probably a stupid question. But instead of hacking things like this to
>>> help convergence in some particular cases, why not just add a
>>> condition that we should stop jitting as soon as jitted length stops
>>> shrinking (and correct the comment that claims "JITed image shrinks
>>> with every pass" because that's not true).
>>>
>>> We have `if (proglen == oldproglen)` condition right now. What will
>>> happen if we just change it to `if (proglen >= oldproglen)`? That
>>> might be sup-optimal for these rare non-convergent cases, but that
>>> seems fine. We can of course do one extra pass to hopefully get back
>>> the second-to-last shorter image if proglen > oldproglen, but that
>>> seems excessive to me.
>> We need convergence. Looks at some comments below:
>>
>> + * pass5, final_proglen=4392:
>> + *   ...
>> + *   20e:    48 85 ff                test   rdi,rdi
>> + *   211:    0f 84 80 00 00 00       je     0x297
>> + *   217:    48 8b 77 00             mov    rsi,QWORD PTR [rdi+0x0]
>> + *   ...
>> + *   28d:    48 85 ff                test   rdi,rdi
>> + *   290:    74 1a                   je     0x2ac
>> + *   292:    eb 84                   jmp    0x218
>> + *   294:    bf 03 00 00 00          mov    edi,0x3
>>
>> Without convergence, you can see je/jmp target may not be correct.
>>
> I see, thanks. As I said, probably was a stupid question, I didn't
> realize that do_jit() can generate invalid image.
>
> This whole guessing of acceptable range of relative offset still seems
> like a fragile game (what if you have few instructions that expand and
> then 124 bound isn't conservative enough anymore). I was wondering if
> there is some more generic solution where we can mark jump
> instructions that went from shorter to longer, and if that happened,
> on subsequent passes don't try to shorten them.

I digged out the git history and found that this pass convergence based
approach is actually implemented from the beginning from the following
patch

=====
commit 0a14842f5a3c0e88a1e59fac5c3025db39721f74 (HEAD -> t2)
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Wed Apr 20 09:27:32 2011 +0000

     net: filter: Just In Time compiler for x86-64
=====

I cannot find the discussion context for this patch then.
Do we have a better approach for this? I do not know, the issue
is that for the same branch/jmp insn, the length of the insn
could change depending on the 'offset' value. One possible
solution is to assume all branch/jmp insn is 8bit long and
whenever it won't work, replace previous one with 32bit and
continue to enumerate following branch/jmp insn with 8bit.
This may take a lot of time though.

>
> Again, I have no clue how actual code in JIT works and what are all
> the nuances, so feel free to ignore me completely, I won't be offended
> :)
>
>>>
>>>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
>>>> index 074b41fafbe3..ec541aae5d9b 100644
>>>> --- a/arch/x86/net/bpf_jit_comp.c
>>>> +++ b/arch/x86/net/bpf_jit_comp.c
>>>> @@ -64,6 +64,51 @@ static bool is_imm8(int value)
>>>>           return value <= 127 && value >= -128;
>>>>    }
>>>>
>>>> +/*
>>>> + * Let us limit the positive offset to be <= 124.
>>>> + * This is to ensure eventual jit convergence For the following patterns:
>>>> + * ...
>>>> + * pass4, final_proglen=4391:
>>>> + *   ...
>>>> + *   20e:    48 85 ff                test   rdi,rdi
>>>> + *   211:    74 7d                   je     0x290
>>>> + *   213:    48 8b 77 00             mov    rsi,QWORD PTR [rdi+0x0]
>>>> + *   ...
>>>> + *   289:    48 85 ff                test   rdi,rdi
>>>> + *   28c:    74 17                   je     0x2a5
>>>> + *   28e:    e9 7f ff ff ff          jmp    0x212
>>>> + *   293:    bf 03 00 00 00          mov    edi,0x3
>>>> + * Note that insn at 0x211 is 2-byte cond jump insn for offset 0x7d (-125)
>>>> + * and insn at 0x28e is 5-byte jmp insn with offset -129.
>>>> + *
>>>> + * pass5, final_proglen=4392:
>>>> + *   ...
>>>> + *   20e:    48 85 ff                test   rdi,rdi
>>>> + *   211:    0f 84 80 00 00 00       je     0x297
>>>> + *   217:    48 8b 77 00             mov    rsi,QWORD PTR [rdi+0x0]
>>>> + *   ...
>>>> + *   28d:    48 85 ff                test   rdi,rdi
>>>> + *   290:    74 1a                   je     0x2ac
>>>> + *   292:    eb 84                   jmp    0x218
>>>> + *   294:    bf 03 00 00 00          mov    edi,0x3
>>>> + * Note that insn at 0x211 is 5-byte cond jump insn now since its offset
>>>> + * becomes 0x80 based on previous round (0x293 - 0x213 = 0x80).
>>>> + * At the same time, insn at 0x292 is a 2-byte insn since its offset is
>>>> + * -124.
>>>> + *
>>>> + * pass6 will repeat the same code as in pass4 and this will prevent
>>>> + * eventual convergence.
>>>> + *
>>>> + * To fix this issue, we need to break je (2->6 bytes) <-> jmp (5->2 bytes)
>>>> + * cycle in the above. Let us limit the positive offset for 8bit cond jump
>>>> + * insn to mamximum 124 (0x7c). This way, the jmp insn will be always 2-bytes,
>>>> + * and the jit pass can eventually converge.
>>>> + */
>>>> +static bool is_imm8_cond_offset(int value)
>>>> +{
>>>> +       return value <= 124 && value >= -128;
>>>> +}
>>>> +
>>>>    static bool is_simm32(s64 value)
>>>>    {
>>>>           return value == (s64)(s32)value;
>>>> @@ -2231,7 +2276,7 @@ st:                       if (is_imm8(insn->off))
>>>>                                   return -EFAULT;
>>>>                           }
>>>>                           jmp_offset = addrs[i + insn->off] - addrs[i];
>>>> -                       if (is_imm8(jmp_offset)) {
>>>> +                       if (is_imm8_cond_offset(jmp_offset)) {
>>>>                                   if (jmp_padding) {
>>>>                                           /* To keep the jmp_offset valid, the extra bytes are
>>>>                                            * padded before the jump insn, so we subtract the
>>>> --
>>>> 2.43.5
>>>>

next prev parent reply	other threads:[~2024-08-30 20:39 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-25 20:04 [PATCH bpf-next] bpf, x64: Fix a jit convergence issue Yonghong Song
2024-08-27 23:44 ` Andrii Nakryiko
2024-08-28 22:50   ` Yonghong Song
2024-08-29 17:27     ` Andrii Nakryiko
2024-08-30 20:39       ` Yonghong Song [this message]
2024-08-28  2:24 ` Alexei Starovoitov
2024-08-28 22:47   ` Yonghong Song
2024-08-28 23:44     ` Alexei Starovoitov
2024-08-29  1:54       ` Yonghong Song
2024-08-29 16:37         ` Alexei Starovoitov
2024-08-29 16:50           ` Yonghong Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a1c6adac-a150-4819-9f36-4d0340e6eb08@linux.dev \
    --to=yonghong.song@linux.dev \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=hodgesd@meta.com \
    --cc=kernel-team@fb.com \
    --cc=martin.lau@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.