Re: [PATCH bpf-next] bpf,x86: do RSB balance for trampoline

From: Menglong Dong <menglong.dong@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Menglong Dong <menglong8.dong@gmail.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Eduard <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
	Yonghong Song <yonghong.song@linux.dev>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	David Ahern <dsahern@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	X86 ML <x86@kernel.org>, "H. Peter Anvin" <hpa@zytor.com>,
	jiang.biao@linux.dev, bpf <bpf@vger.kernel.org>,
	Network Development <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH bpf-next] bpf,x86: do RSB balance for trampoline
Date: Thu, 06 Nov 2025 11:00:53 +0800
Message-ID: <2243066.irdbgypaU6@7950hx>
In-Reply-To: <CAADnVQ+tUO_BJV8w1aPLiY50p7F+uk0GCWFgH0k5zLQBqAif1g@mail.gmail.com>

On 2025/11/6 10:56, Alexei Starovoitov wrote:
> On Wed, Nov 5, 2025 at 6:49 PM Menglong Dong <menglong.dong@linux.dev> wrote:
> >
> > On 2025/11/6 09:40, Menglong Dong wrote:
> > > On 2025/11/6 07:31, Alexei Starovoitov wrote:
> > > > On Tue, Nov 4, 2025 at 11:47 PM Menglong Dong <menglong.dong@linux.dev> wrote:
> > > > >
> > > > > On 2025/11/5 15:13, Menglong Dong wrote:
> > > > > > On 2025/11/5 10:12, Alexei Starovoitov wrote:
> > > > > > > On Tue, Nov 4, 2025 at 5:30 PM Menglong Dong <menglong.dong@linux.dev> wrote:
> > > > > > > >
> > > > > > > > On 2025/11/5 02:56, Alexei Starovoitov wrote:
> > > > > > > > > On Tue, Nov 4, 2025 at 2:49 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > In the origin call case, we skip the "rip" directly before we return, which
> > > > > > > > > > > breaks the RSB balance, as we have two "call"s but only one "ret".
> > > > > > > > >
> > > > > > > > > RSB meaning return stack buffer?
> > > > > > > > >
> > > > > > > > > and by "breaks RSB" you mean it makes the cpu less efficient?
> > > > > > > >
> > > > > > > > > Yeah, I mean it makes the cpu less efficient. The RSB is used
> > > > > > > > > for branch prediction: it pushes the "rip" onto its hardware
> > > > > > > > > stack on "call" and pops it from the stack on "ret". In the origin
> > > > > > > > > call case there are two "call"s but only one "ret", which breaks
> > > > > > > > > that balance.
> > > > > > >
> > > > > > > Yes. I'm aware, but your "mov [rbp + 8], rax" screws it up as well,
> > > > > > > since RSB has to be updated/invalidated by this store.
> > > > > > > The behavior depends on the microarchitecture, of course.
> > > > > > > I think:
> > > > > > > add rsp, 8
> > > > > > > ret
> > > > > > > will only screw up the return prediction, but won't invalidate RSB.
> > > > > > >
> > > > > > > > > A similar thing happens in "return_to_handler" in ftrace_64.S,
> > > > > > > > > which has one "call" but two "ret"s. It fakes a "call"
> > > > > > > > > to keep things balanced.
> > > > > > >
> > > > > > > This makes more sense to me. Let's try that approach instead
> > > > > > > of messing with the return address on stack?
> > > > > >
> > > > > > > The approach here is similar to "return_to_handler". For ftrace,
> > > > > > > the original stack before the "ret" of the traced function is:
> > > > > >
> > > > > >     POS:
> > > > > >     rip   ---> return_to_handler
> > > > > >
> > > > > > > The exit of the traced function then jumps to return_to_handler.
> > > > > > > return_to_handler looks up the real "rip" of the traced function
> > > > > > > and then calls an internal function:
> > > > > >
> > > > > >     call .Ldo_rop
> > > > > >
> > > > > > And the stack now is:
> > > > > >
> > > > > >     POS:
> > > > > > >     rip   ----> the address after "call .Ldo_rop", which is an "int3"
> > > > > >
> > > > > > > In .Ldo_rop, it overwrites that rip with the real rip, so the
> > > > > > > stack looks like this:
> > > > > >
> > > > > >     POS:
> > > > > >     rip   ---> real rip
> > > > > >
> > > > > > > Then it returns. Taking the target function "foo" as an example,
> > > > > > > the flow is:
> > > > > > >
> > > > > > >     call foo -> call ftrace_caller -> return ftrace_caller ->
> > > > > > >     return return_to_handler -> call .Ldo_rop -> return foo
> > > > > >
> > > > > > > As you can see, the call and return addresses for ".Ldo_rop" are
> > > > > > > also messed up, so I think the same approach works here too.
> > > > > > > Compared with a messed-up "return address", a missed return may
> > > > > > > have a better effect?
> > > > > >
> > > > > > And the whole logic for us is:
> > > > > >
> > > > > >     call foo -> call trampoline -> call origin ->
> > > > > >     return origin -> return POS -> return foo
> > > > >
> > > > > > The "return POS" will miss in the RSB, but the later returns
> > > > > > will hit.
> > > > >
> > > > > > The original logic is:
> > > > >
> > > > >      call foo -> call trampoline -> call origin ->
> > > > >      return origin -> return foo
> > > > >
> > > > > > The "return foo" and all the later returns will miss the RSB.
> > > > >
> > > > > Hmm......Not sure if I understand it correctly.
> > > >
> > > > > Here is another idea...
> > > > hack tr->func.ftrace_managed = false temporarily
> > > > and use BPF_MOD_JUMP in bpf_arch_text_poke()
> > > > when installing trampoline with fexit progs.
> > > > and also do:
> > > > @@ -3437,10 +3437,6 @@ static int __arch_prepare_bpf_trampoline(struct
> > > > bpf_tramp_image *im, void *rw_im
> > > >
> > > >         emit_ldx(&prog, BPF_DW, BPF_REG_6, BPF_REG_FP, -rbx_off);
> > > >         EMIT1(0xC9); /* leave */
> > > > -       if (flags & BPF_TRAMP_F_SKIP_FRAME) {
> > > > -               /* skip our return address and return to parent */
> > > > -               EMIT4(0x48, 0x83, 0xC4, 8); /* add rsp, 8 */
> > > > -       }
> > > >         emit_return(&prog, image + (prog - (u8 *)rw_image));
> > > >
> > > > Then RSB is perfectly matched without messing up the stack
> > > > and/or extra calls.
> > > > If it works and performance is good the next step is to
> > > > teach ftrace to emit jmp or call in *_ftrace_direct()
> >
> > > After the modification, the performance of fexit increases from
> > > 76M/s to 137M/s, awesome!
> 
> Nice! much better than double 'ret' :)
> _ftrace_direct() next?

Yeah, I'll do this in _ftrace_direct() next.

>
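
For readers following the quoted discussion, the call/ret pairing can be
illustrated with a toy model of the RSB. This is only a sketch under stated
assumptions, not kernel code and not a description of any real CPU: the RSB
is modeled as a plain stack that "call" pushes to and "ret" pops from, the
effect of the "mov [rbp + 8], rax" store on the RSB (which Alexei notes is
microarchitecture dependent) is ignored, and every name and address below
(rsb_misses, SKIP_FRAME, RET_TO_POS, JMP_ATTACH, R_FOO, R_TRAMP, POS) is
hypothetical. The three scenarios correspond to the existing "add rsp, 8"
path, the "return POS" variant, and the jmp-based attach suggested in the
thread.

```c
#include <assert.h>
#include <stddef.h>

#define RSB_DEPTH 16          /* arbitrary depth for the toy model */
#define R_FOO   0x2000UL      /* return address pushed by foo's "call trampoline" */
#define R_TRAMP 0x3000UL      /* return address pushed by "call origin" */
#define POS     0x4000UL      /* address of the extra "ret" in the POS variant */

enum scenario { SKIP_FRAME, RET_TO_POS, JMP_ATTACH };

struct rsb {
	unsigned long slot[RSB_DEPTH];
	size_t top;
	unsigned int misses;
};

/* "call": push the predicted return address. */
static void rsb_call(struct rsb *r, unsigned long ret_addr)
{
	assert(r->top < RSB_DEPTH);
	r->slot[r->top++] = ret_addr;
}

/* "ret" to 'actual': a miss if the RSB is empty or predicts another address. */
static void rsb_ret(struct rsb *r, unsigned long actual)
{
	if (r->top == 0 || r->slot[--r->top] != actual)
		r->misses++;
}

/*
 * Count mispredicted returns for one traced call of foo, with 'outer'
 * enclosing call frames (the innermost of them is the call to foo).
 */
unsigned int rsb_misses(enum scenario s, unsigned int outer)
{
	struct rsb r = { .top = 0, .misses = 0 };
	unsigned int i;

	for (i = 0; i < outer; i++)
		rsb_call(&r, 0x1000UL + i);     /* callers, ending with "call foo" */

	if (s != JMP_ATTACH)
		rsb_call(&r, R_FOO);            /* foo: call trampoline */
	rsb_call(&r, R_TRAMP);                  /* trampoline: call origin */
	rsb_ret(&r, R_TRAMP);                   /* origin returns to trampoline */

	if (s == RET_TO_POS)
		rsb_ret(&r, POS);               /* rewritten slot: ret lands on POS */
	/* SKIP_FRAME: "add rsp, 8" drops R_FOO from the stack without a ret,
	 * so the stale R_FOO entry stays in the modeled RSB. */

	for (i = outer; i-- > 0;)
		rsb_ret(&r, 0x1000UL + i);      /* return to foo's caller and up */

	return r.misses;
}
```

In this model, with four enclosing frames, SKIP_FRAME mispredicts every
remaining return, RET_TO_POS mispredicts only the return to POS, and
JMP_ATTACH mispredicts none, which matches the reasoning quoted above. Real
hardware will differ, for example when the RSB underflows or is refilled.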

Thread overview: 16+ messages
2025-11-04 10:49 [PATCH bpf-next] bpf,x86: do RSB balance for trampoline Menglong Dong
2025-11-04 18:56 ` Alexei Starovoitov
2025-11-05  1:30   ` Menglong Dong
2025-11-05  2:12     ` Alexei Starovoitov
2025-11-05  7:13       ` Menglong Dong
2025-11-05  7:46         ` Menglong Dong
2025-11-05 23:31           ` Alexei Starovoitov
2025-11-06  1:40             ` Menglong Dong
2025-11-06  2:49               ` Menglong Dong
2025-11-06  2:56                 ` Alexei Starovoitov
2025-11-06  3:00                   ` Menglong Dong [this message]
2025-11-10 11:43                   ` Menglong Dong
2025-11-10 16:32                     ` Alexei Starovoitov
2025-11-11  1:28                       ` Menglong Dong
2025-11-11  2:41                         ` Alexei Starovoitov
2025-11-06 12:03           ` Peter Zijlstra