From: Menglong Dong
To: Alexei Starovoitov
Cc: Peter Zijlstra, Menglong Dong, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Eduard, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, "David S. Miller", David Ahern, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML, "H. Peter Anvin", jiang.biao@linux.dev, bpf, Network Development, LKML
Subject: Re: [PATCH bpf-next] bpf,x86: do RSB balance for trampoline
Date: Thu, 06 Nov 2025 11:00:53 +0800
Message-ID: <2243066.irdbgypaU6@7950hx>
References: <20251104104913.689439-1-dongml2@chinatelecom.cn> <2388519.ElGaqSPkdT@7950hx>
X-Mailing-List: netdev@vger.kernel.org

On 2025/11/6 10:56, Alexei Starovoitov wrote:
> On Wed, Nov 5, 2025 at 6:49 PM Menglong Dong wrote:
> >
> > On 2025/11/6 09:40, Menglong Dong wrote:
> > > On 2025/11/6 07:31, Alexei Starovoitov wrote:
> > > > On Tue, Nov 4, 2025 at 11:47 PM Menglong Dong wrote:
> > > > >
> > > > > On 2025/11/5 15:13, Menglong Dong wrote:
> > > > > > On 2025/11/5 10:12, Alexei Starovoitov wrote:
> > > > > > > On Tue, Nov 4, 2025 at 5:30 PM Menglong Dong wrote:
> > > > > > > >
> > > > > > > > On 2025/11/5 02:56, Alexei Starovoitov wrote:
> > > > > > > > > On Tue, Nov 4, 2025 at 2:49 AM Menglong Dong wrote:
> > > > > > > > > >
> > > > > > > > > > In the origin call case, we skip the "rip" directly before we return, which
> > > > > > > > > > breaks the RSB, as we have two "call"s but only one "ret".
> > > > > > > > >
> > > > > > > > > RSB meaning return stack buffer?
> > > > > > > > >
> > > > > > > > > and by "breaks RSB" you mean it makes the cpu less efficient?
> > > > > > > >
> > > > > > > > Yeah, I mean it makes the cpu less efficient. The RSB is used
> > > > > > > > for branch prediction: it pushes the "rip" onto its hardware
> > > > > > > > stack on "call" and pops it from the stack on "ret". In the origin
> > > > > > > > call case, there are two "call"s but only one "ret", which breaks
> > > > > > > > its balance.
> > > > > > >
> > > > > > > Yes. I'm aware, but your "mov [rbp + 8], rax" screws it up as well,
> > > > > > > since RSB has to be updated/invalidated by this store.
> > > > > > > The behavior depends on the microarchitecture, of course.
> > > > > > > I think:
> > > > > > > add rsp, 8
> > > > > > > ret
> > > > > > > will only screw up the return prediction, but won't invalidate RSB.
> > > > > > >
> > > > > > > > Similar things happen in "return_to_handler" in ftrace_64.S,
> > > > > > > > which has one "call" but two "ret"s. It fakes a "call"
> > > > > > > > to keep them balanced.
> > > > > > >
> > > > > > > This makes more sense to me. Let's try that approach instead
> > > > > > > of messing with the return address on stack?
> > > > > >
> > > > > > The way here is similar to the "return_to_handler". For ftrace,
> > > > > > the original stack before the "ret" of the traced function is:
> > > > > >
> > > > > >     POS:
> > > > > >     rip ---> return_to_handler
> > > > > >
> > > > > > And the exit of the traced function will jump to return_to_handler.
> > > > > > In return_to_handler, it looks up the real "rip" of the traced function
> > > > > > and then calls an internal function:
> > > > > >
> > > > > >     call .Ldo_rop
> > > > > >
> > > > > > And the stack now is:
> > > > > >
> > > > > >     POS:
> > > > > >     rip ----> the address after "call .Ldo_rop", which is an "int3"
> > > > > >
> > > > > > In .Ldo_rop, it modifies the rip to the real rip to make
> > > > > > it like this:
> > > > > >
> > > > > >     POS:
> > > > > >     rip ---> real rip
> > > > > >
> > > > > > And then it returns. Taking the target function "foo" as an example,
> > > > > > the logic is:
> > > > > >
> > > > > >     call foo -> call ftrace_caller -> return ftrace_caller ->
> > > > > >     return return_to_handler -> call Ldo_rop -> return foo
> > > > > >
> > > > > > As you can see, the call and return addresses for ".Ldo_rop" are
> > > > > > also mismatched. So I think it works here too. Compared with
> > > > > > a mismatched "return address", a single missed return may have
> > > > > > less impact?
> > > > > >
> > > > > > And the whole logic for us is:
> > > > > >
> > > > > >     call foo -> call trampoline -> call origin ->
> > > > > >     return origin -> return POS -> return foo
> > > > >
> > > > > The "return POS" will miss the RSB, but the later returns
> > > > > will hit it.
> > > > >
> > > > > The original logic is:
> > > > >
> > > > >     call foo -> call trampoline -> call origin ->
> > > > >     return origin -> return foo
> > > > >
> > > > > The "return foo" and all the later returns will miss the RSB.
> > > > >
> > > > > Hmm... Not sure if I understand it correctly.
> > > >
> > > > Here's another idea...
> > > > hack tr->func.ftrace_managed = false temporarily
> > > > and use BPF_MOD_JUMP in bpf_arch_text_poke()
> > > > when installing trampoline with fexit progs.
> > > > and also do:
> > > > @@ -3437,10 +3437,6 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> > > >
> > > >         emit_ldx(&prog, BPF_DW, BPF_REG_6, BPF_REG_FP, -rbx_off);
> > > >         EMIT1(0xC9); /* leave */
> > > > -       if (flags & BPF_TRAMP_F_SKIP_FRAME) {
> > > > -               /* skip our return address and return to parent */
> > > > -               EMIT4(0x48, 0x83, 0xC4, 8); /* add rsp, 8 */
> > > > -       }
> > > >         emit_return(&prog, image + (prog - (u8 *)rw_image));
> > > >
> > > > Then RSB is perfectly matched without messing up the stack
> > > > and/or extra calls.
> > > > If it works and performance is good the next step is to
> > > > teach ftrace to emit jmp or call in *_ftrace_direct()
> >
> > After the modification, the performance of fexit increases from
> > 76M/s to 137M/s, awesome!
>
> Nice! much better than double 'ret' :)
> _ftrace_direct() next?

Yeah, I'll do that with _ftrace_direct().