From: Menglong Dong
To: Alexei Starovoitov
Cc: sjenning@redhat.com, Peter Zijlstra, Menglong Dong, Alexei Starovoitov,
 Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Eduard, Song Liu,
 Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo,
 Jiri Olsa, "David S. Miller", David Ahern, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, Dave Hansen, X86 ML, "H. Peter Anvin",
 jiang.biao@linux.dev, bpf, Network Development, LKML
Subject: Re: [PATCH bpf-next] bpf,x86: do RSB balance for trampoline
Date: Tue, 11 Nov 2025 09:28:11 +0800
Message-ID: <5025905.GXAFRqVoOG@7950hx>
References: <20251104104913.689439-1-dongml2@chinatelecom.cn>
 <13884259.uLZWGnKmhe@7950hx>
X-Mailing-List: netdev@vger.kernel.org

On 2025/11/11 00:32, Alexei Starovoitov wrote:
> On Mon, Nov 10, 2025 at 3:43 AM Menglong Dong wrote:
> >
> > Do you think if it is worth to implement the livepatch with
> > bpf trampoline by introduce the CONFIG_LIVEPATCH_BPF?
> > It's easy to achieve it, I have a POC for it, and the performance
> > of the livepatch increase from 99M/s to 200M/s according to
> > my bench testing.
>
> what do you mean exactly?

This is a separate topic, and we can discuss it later.
Let me describe it briefly here. I mean implementing livepatch on top
of the bpf trampoline. For now, livepatch is implemented with ftrace,
which breaks the RSB and has more overhead on x86_64.

It can be easily implemented by replacing "origin_call" with the
address that livepatch provides.

> I don't want to add more complexity to bpf trampoline.

If you mean the arch-specific code, it won't add complexity. On the
contrary, it can make the x86_64 code a little simpler with the
following patch:

--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -3176,7 +3176,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
 					 void *rw_image_end, void *image,
 					 const struct btf_func_model *m, u32 flags,
 					 struct bpf_tramp_links *tlinks,
-					 void *func_addr)
+					 void *func_addr, void *origin_call_param)
 {
 	int i, ret, nr_regs = m->nr_args, stack_size = 0;
 	int regs_off, nregs_off, ip_off, run_ctx_off, arg_stack_off, rbx_off;
@@ -3280,6 +3280,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
 		orig_call += ENDBR_INSN_SIZE;
 		orig_call += X86_PATCH_SIZE;
 	}
+	orig_call = origin_call_param ?: orig_call;
 
 	prog = rw_image;
 
@@ -3369,15 +3370,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
 		LOAD_TRAMP_TAIL_CALL_CNT_PTR(stack_size);
 	}
 
-	if (flags & BPF_TRAMP_F_ORIG_STACK) {
-		emit_ldx(&prog, BPF_DW, BPF_REG_6, BPF_REG_FP, 8);
-		EMIT2(0xff, 0xd3); /* call *rbx */
-	} else {
-		/* call original function */
-		if (emit_rsb_call(&prog, orig_call, image + (prog - (u8 *)rw_image))) {
-			ret = -EINVAL;
-			goto cleanup;
-		}
+	/* call original function */
+	if (emit_rsb_call(&prog, orig_call, image + (prog - (u8 *)rw_image))) {
+		ret = -EINVAL;
+		goto cleanup;
 	}
 	/* remember return value in a stack for bpf prog to access */
 	emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8);

> Improve current livepatching logic ? jmp vs call isn't special.
In a sense, yes. According to my testing, the bpf trampoline performs
much better than the ftrace trampoline, so implementing livepatch on
top of the bpf trampoline can improve its performance. Of course, the
bpf trampoline would need to offer an API to livepatch for this
purpose.

Anyway, let me finish the work in this patch first. After that, I can
send an RFC of the proposal.

Thanks!
Menglong Dong

>
> > The results above is tested with return-trunk disabled. With the
> > return-trunk enabled, the performance decrease from 58M/s to
> > 52M/s. The main performance improvement comes from the RSB,
> > and the return-trunk will always break the RSB, which makes it has
> > no improvement. The calling to per-cpu-ref get and put make
> > the bpf trampoline based livepatch has a worse performance
> > than ftrace based.
> >
> > Thanks!
> > Menglong Dong
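
For readers following the thread, here is a minimal user-space sketch of the target-override idea in the diff, where the trampoline calls a caller-supplied address when one is given and otherwise falls back to the original function. It is only an illustration, not kernel code: target_fn, original_body, livepatched_body, pick_call_target and trampoline_call are hypothetical names, and the fallback uses the standard ternary instead of the GNU `?:` shorthand from the diff.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these are kernel symbols. */
typedef long (*target_fn)(long);

/* Plays the role of the traced function's original body. */
static long original_body(long x)
{
	return x + 1;
}

/* Plays the role of a replacement body supplied by livepatch. */
static long livepatched_body(long x)
{
	return x + 100;
}

/*
 * Mirrors `orig_call = origin_call_param ?: orig_call;` from the diff:
 * use the override when it is non-NULL, otherwise keep the default.
 */
static target_fn pick_call_target(target_fn default_target, target_fn override)
{
	return override ? override : default_target;
}

/* Models the trampoline's "call original function" step as a plain call. */
static long trampoline_call(target_fn default_target, target_fn override,
			    long arg)
{
	return pick_call_target(default_target, override)(arg);
}
```

With no override the call reaches original_body; with an override it reaches livepatched_body. In both cases the target is entered by a call and left by a matching return, which is the pairing the RSB relies on.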