public inbox for linux-riscv@lists.infradead.org
From: "Björn Töpel" <bjorn@kernel.org>
To: Pu Lehui <pulehui@huawei.com>, Pu Lehui <pulehui@huaweicloud.com>,
	bpf@vger.kernel.org, linux-riscv@lists.infradead.org,
	netdev@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Song Liu <song@kernel.org>, Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>, Palmer Dabbelt <palmer@dabbelt.com>,
	Conor Dooley <conor@kernel.org>,
	Luke Nelson <luke.r.nels@gmail.com>
Subject: Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
Date: Tue, 30 Jan 2024 17:03:49 +0100
Message-ID: <87le86q04a.fsf@all.your.base.are.belong.to.us>
In-Reply-To: <5a30caa3-3351-41e7-a77f-91e5959b2da6@huawei.com>

Pu Lehui <pulehui@huawei.com> writes:

> On 2024/1/30 21:28, Björn Töpel wrote:
>> Pu Lehui <pulehui@huawei.com> writes:
>> 
>>> On 2024/1/30 16:29, Björn Töpel wrote:
>>>> Pu Lehui <pulehui@huaweicloud.com> writes:
>>>>
>>>>> On 2023/9/28 17:59, Björn Töpel wrote:
>>>>>> Pu Lehui <pulehui@huaweicloud.com> writes:
>>>>>>
>>>>>>> From: Pu Lehui <pulehui@huawei.com>
>>>>>>>
>>>>>>> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
>>>>>>> the TCC can be propagated from the parent process to the subprocess, but
>>>>>>> the TCC of the parent process cannot be restored when the subprocess
>>>>>>> exits. Since the RV64 TCC is initialized before saving the callee saved
>>>>>>> registers into the stack, we cannot use the callee saved register to
>>>>>>> pass the TCC, otherwise the original value of the callee saved register
>>>>>>> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
>>>>>>> similar to x86_64, i.e. using a non-callee saved register to transfer
>>>>>>> the TCC between functions, and saving that register to the stack to
>>>>>>> protect the TCC value. At the same time, we also consider the scenario
>>>>>>> of mixing trampoline.
>>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> The RISC-V JIT tries to minimize the stack usage, e.g. it doesn't have a
>>>>>> fixed pro/epilogue like some of the other JITs. I think we can do better
>>>>>> here, so that the pass-TCC-via-register can be used, and the additional
>>>>>> stack access can be avoided.
>>>>>>
>>>>>> Today, the TCC is passed via a register (a6) and can be viewed as a
>>>>>> "state" variable/transparent argument/return value. As you point out, we
>>>>>> lose this when we do a call. On (any) calls we move the TCC to a
>>>>>> callee-saved register.
>>>>>>
>>>>>> WDYT about the following scheme:
>>>>>>
>>>>>> 1 Pick up the arm64 bpf2bpf/tailcall mixing mechanism of just clearing
>>>>>>      the TCC for the main program.
>>>>>> 2 For BPF helper calls, move TCC to s6, perform the call, and restore
>>>>>>      a6. Ditto for kfunc calls (BPF_PSEUDO_KFUNC_CALL).
>>>>>> 3 For all other calls, a6 is passed transparently.
>>>>>>
>>>>>> For 2, bpf_jit_get_func_addr() can be used to determine if the callee
>>>>>> is a BPF helper or not.
>>>>>>
>>>>>> In summary: determine in the JIT whether we're leaving BPF-land and
>>>>>> therefore need to move the TCC to a callee-saved reg, and save us a
>>>>>> bunch of stack stores/loads.
>>>>>>
>>>>>
>>>>> Valuable scheme. But we need to consider TCC back propagation. Let me
>>>>> show an example of calling subprog with TCC stored in A6:
>>>>>
>>>>> prog1(TCC==1){
>>>>>        subprog1(TCC==1)
>>>>>            -> tailcall1(TCC==0)
>>>>>                -> subprog2(TCC==0)
>>>>>        subprog3(TCC==0) <--- should be TCC==1
>>>>>            -\-> tailcall2 <--- can't be called
>>>>> }
>>>
>>> Let's go back to this example. Imagine that the tailcall chain is a
>>> list limited to 33 elements. When the list has 32 elements, we call
>>> subprog1 and then tailcall1, and the element count becomes 33. Then
>>> we call subprog2 and return to prog1. At that point the list drops 1
>>> element and is back to 32, so 1 more tailcall can still be
>>> performed.
>>>
>>> I've attached a diagram that shows that mixing tailcalls and subprogs
>>> behaves much like a "call": it can return to the caller function.
>> 
>> Hmm. Let me put my Q in another way.
>> 
>> The kernel calls into BPF_PROG_RUN() (~a BPF context). Would it ever be
>> OK to do more than 33 tail calls, regardless of subprogs or not?
>> 
>> In your example, TCC is 1. You are allowed to perform one tail call. In
>> your example prog1 performs two.
>> 
>> My view of TCC has always been ~a counter of the number of tailcalls~.
>> 
>> With your example expanded:
>> prog1(TCC==33){
>>        subprog1(TCC==33)
>>            -> tailcall1(TCC==33) -> tailcall1(TCC==32) -> tailcall1(TCC==31) -> ... // 33 times
>>        // Lehui says TCC should be 33 again.
>>        // Björn says "it's the number of tailcalls", and subprog3 cannot perform a tail call
>>        subprog3(TCC==?)
>
> Yes, my view is to treat this as something like a stack, while you
> treat it as a fixed global value.
>
> prog1(TCC==33){
>      subprog1(TCC==33)
>          -> tailcall1(TCC==33) -> tailcall1(TCC==32) -> tailcall1(TCC==31) -> ... // 33 times -> subprog2(TCC==0)
>      subprog3(TCC==33)
>          -> tailcall1(TCC==33) -> tailcall1(TCC==32) -> tailcall1(TCC==31) -> ... // 33 times
>
>>            
>> My view has, again, been that TCC is a run-time count of the number
>> of tailcalls (fentry/fexit patch bpf-programs included).
>> 
>> What do x86 and arm64 do?
>
> When a subprog returns to the calling bpf program, both restore the TCC
> to the value it had on entry to the subprog. ARM64 keeps the TCC in a
> callee-saved register, so on exit the TCC is restored to the value it
> had on entry. x86 uses the stack to do the same thing.

Ok! Thanks for clarifying. I'll continue reviewing the v2 of your
series!

BTW, I wonder if we can trigger this [1] on RV64, i.e. calling the
main prog will reset the TCC.

[1] https://lore.kernel.org/bpf/20240104142226.87869-1-hffilwlqm@gmail.com/
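
For reference, the two readings of TCC discussed above can be put side
by side in a small userspace model. This is only a sketch under stated
assumptions: it is not JIT code, and every name in it (tcc_policy,
try_tail_call, call_subprog, ...) is hypothetical. The only value taken
from the discussion is the limit of 33. It models prog1 calling two
subprogs in sequence, where each subprog tail-calls until its budget is
exhausted, and reports how many tail calls the second subprog gets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TAIL_CALL_CNT 33	/* limit used throughout this thread */

/* Hypothetical names for the two views of TCC in the discussion. */
enum tcc_policy {
	TCC_GLOBAL_COUNT,	/* one run-time budget for the whole run */
	TCC_RESTORE_ON_RETURN,	/* caller's TCC is restored after a subprog */
};

/* Attempt one tail call: consume one unit of budget if any is left. */
static bool try_tail_call(int *tcc)
{
	assert(tcc != NULL);
	if (*tcc == 0)
		return false;
	(*tcc)--;
	return true;
}

/* Toy subprog: tail-calls until the budget it sees is exhausted. */
static int subprog_exhaust(int *tcc)
{
	int performed = 0;

	while (try_tail_call(tcc))
		performed++;
	return performed;
}

/*
 * Model a bpf2bpf call. Under TCC_RESTORE_ON_RETURN the caller's TCC
 * is put back to its value on entry once the subprog returns; under
 * TCC_GLOBAL_COUNT whatever the subprog consumed stays consumed.
 */
static int call_subprog(enum tcc_policy policy, int *tcc)
{
	int saved = *tcc;
	int performed = subprog_exhaust(tcc);

	if (policy == TCC_RESTORE_ON_RETURN)
		*tcc = saved;
	return performed;
}

/* prog1: call a first subprog, then report what the second one gets. */
static int tail_calls_in_second_subprog(enum tcc_policy policy)
{
	int tcc = MAX_TAIL_CALL_CNT;

	call_subprog(policy, &tcc);		/* "subprog1" */
	return call_subprog(policy, &tcc);	/* "subprog3" */
}
```

In this model the second subprog performs no tail calls under
TCC_GLOBAL_COUNT and the full 33 under TCC_RESTORE_ON_RETURN, which is
the difference between the two views above.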

