BPF List
From: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
To: Josh Poimboeuf <jpoimboe@kernel.org>, Miroslav Benes <mbenes@suse.cz>
Cc: <bpf@vger.kernel.org>, <live-patching@vger.kernel.org>,
	DL Linux Open Source Team <linux-open-source@crowdstrike.com>,
	Petr Mladek <pmladek@suse.com>, Song Liu <song@kernel.org>,
	<andrii@kernel.org>, Raja Khan <raja.khan@crowdstrike.com>
Subject: Re: [External] Re: BPF fentry/fexit trampolines stall livepatch stalls transition due to missing ORC unwind metadata
Date: Mon, 24 Nov 2025 17:06:04 -0500
Message-ID: <d7b75cdc-a872-4425-a5f6-d41b1982cca7@crowdstrike.com>
In-Reply-To: <h4e7ar2fckfs6y2c2tm4lq4r54edzvqdq6cy5qctb7v3bi5s2u@q4hfzrlembrn>

On 11/21/25 19:56, Josh Poimboeuf wrote:
> On Thu, Nov 20, 2025 at 01:15:12PM +0100, Miroslav Benes wrote:
>>> Impact
>>>
>>> This affects production systems where:
>>> - Security/observability tools use BPF fentry/fexit hooks
>>> - Live kernel patching is required for security updates
>>> - Kernel threads may be blocked in hooked network/storage functions
>>>
>>> The livepatch transition can stall for 60+ seconds before failing, blocking
>>> critical security patches.
>>
>> Unfortunately yes.
>>
>>> Questions for the Community
>>>
>>> 1. Is this a known limitation (I assume yes) ?
>>
>> Yes.
>>
>>> 2. Runtime ORC generation? Could the BPF JIT generate ORC unwind entries for
>>> trampolines, similar to how ftrace trampolines are handled?
>>> 3. Trampoline registration? Could BPF trampolines register their address
>>> ranges with the ORC unwinder to avoid the "unreliable" marking?
>>> 4. Alternative unwinding? Could livepatch use an alternative unwinding method
>>> when BPF trampolines are detected (e.g., frame pointers with validation)?
>>> 5. Workarounds? I mention one below, and I would be happy to hear if anyone
>>> has a better idea to propose.
>>
>> There is a parallel discussion going on under sframe unwinding enablement
>> for arm64. See this subthread
>> https://lore.kernel.org/all/CADBMgpwZ32+shSa0SwO8y4G-Zw14ae-FcoWreA_ptMf08Mu9dA@mail.gmail.com/T/#u
>>
>> I would really welcome if it is solved eventually because it seems we will
>> meet the described issue more and more often (Josh, I think this email
>> shows that it happens in practice with the existing monitoring services
>> based on BPF).
> 
> Maybe we can take advantage of the fact that BPF uses frame pointers
> unconditionally, and avoid the complexity of "dynamic ORC" for now, by
> just having BPF keep track of where the frame pointer is valid (after
> the prologue, before the epilogue).
> 
> Something like the below (completely untested).
> 
> Andrey, can you try this patch?

Hey Josh, thank you for looking. Can you please tell me which stable
kernel version you made these changes on top of, so I can apply them
cleanly? Alternatively, just provide the git commit SHA in Linus'
tree that I can reset my branch to.


I will happily test this as soon as I can and report back.

Thanks,
Andrey

> 
> ---8<---
> 
> diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
> index 977ee75e047c..f610fde2d5c4 100644
> --- a/arch/x86/kernel/unwind_orc.c
> +++ b/arch/x86/kernel/unwind_orc.c
> @@ -2,6 +2,7 @@
>   #include <linux/objtool.h>
>   #include <linux/module.h>
>   #include <linux/sort.h>
> +#include <linux/bpf.h>
>   #include <asm/ptrace.h>
>   #include <asm/stacktrace.h>
>   #include <asm/unwind.h>
> @@ -172,6 +173,25 @@ static struct orc_entry *orc_ftrace_find(unsigned long ip)
>   }
>   #endif
>   
> +/* Fake frame pointer entry -- used as a fallback for generated code */
> +static struct orc_entry orc_fp_entry = {
> +	.type		= ORC_TYPE_CALL,
> +	.sp_reg		= ORC_REG_BP,
> +	.sp_offset	= 16,
> +	.bp_reg		= ORC_REG_PREV_SP,
> +	.bp_offset	= -16,
> +};
> +
> +static struct orc_entry *orc_bpf_find(unsigned long ip)
> +{
> +#ifdef CONFIG_BPF_JIT
> +	if (bpf_has_frame_pointer(ip))
> +		return &orc_fp_entry;
> +#endif
> +
> +	return NULL;
> +}
> +
>   /*
>    * If we crash with IP==0, the last successfully executed instruction
>    * was probably an indirect function call with a NULL function pointer,
> @@ -186,15 +206,6 @@ static struct orc_entry null_orc_entry = {
>   	.type = ORC_TYPE_CALL
>   };
>   
> -/* Fake frame pointer entry -- used as a fallback for generated code */
> -static struct orc_entry orc_fp_entry = {
> -	.type		= ORC_TYPE_CALL,
> -	.sp_reg		= ORC_REG_BP,
> -	.sp_offset	= 16,
> -	.bp_reg		= ORC_REG_PREV_SP,
> -	.bp_offset	= -16,
> -};
> -
>   static struct orc_entry *orc_find(unsigned long ip)
>   {
>   	static struct orc_entry *orc;
> @@ -238,6 +249,11 @@ static struct orc_entry *orc_find(unsigned long ip)
>   	if (orc)
>   		return orc;
>   
> +	/* BPF lookup: */
> +	orc = orc_bpf_find(ip);
> +	if (orc)
> +		return orc;
> +
>   	return orc_ftrace_find(ip);
>   }
>   
> @@ -495,9 +511,8 @@ bool unwind_next_frame(struct unwind_state *state)
>   	if (!orc) {
>   		/*
>   		 * As a fallback, try to assume this code uses a frame pointer.
> -		 * This is useful for generated code, like BPF, which ORC
> -		 * doesn't know about.  This is just a guess, so the rest of
> -		 * the unwind is no longer considered reliable.
> +		 * This is just a guess, so the rest of the unwind is no longer
> +		 * considered reliable.
>   		 */
>   		orc = &orc_fp_entry;
>   		state->error = true;
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index de5083cb1d37..510e3e62fd2f 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -1661,6 +1661,9 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
>   	emit_prologue(&prog, image, stack_depth,
>   		      bpf_prog_was_classic(bpf_prog), tail_call_reachable,
>   		      bpf_is_subprog(bpf_prog), bpf_prog->aux->exception_cb);
> +
> +	bpf_prog->aux->ksym.fp_start = prog - temp;
> +
>   	/* Exception callback will clobber callee regs for its own use, and
>   	 * restore the original callee regs from main prog's stack frame.
>   	 */
> @@ -2716,6 +2719,8 @@ st:			if (is_imm8(insn->off))
>   					pop_r12(&prog);
>   			}
>   			EMIT1(0xC9);         /* leave */
> +			bpf_prog->aux->ksym.fp_end = prog - temp;
> +
>   			emit_return(&prog, image + addrs[i - 1] + (prog - temp));
>   			break;
>   
> @@ -3299,6 +3304,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
>   	}
>   	EMIT1(0x55);		 /* push rbp */
>   	EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
> +	im->ksym.fp_start = prog - (u8 *)rw_image;
> +
>   	if (!is_imm8(stack_size)) {
>   		/* sub rsp, stack_size */
>   		EMIT3_off32(0x48, 0x81, 0xEC, stack_size);
> @@ -3436,7 +3443,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
>   		emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, -8);
>   
>   	emit_ldx(&prog, BPF_DW, BPF_REG_6, BPF_REG_FP, -rbx_off);
> +
>   	EMIT1(0xC9); /* leave */
> +	im->ksym.fp_end = prog - (u8 *)rw_image;
> +
>   	if (flags & BPF_TRAMP_F_SKIP_FRAME) {
>   		/* skip our return address and return to parent */
>   		EMIT4(0x48, 0x83, 0xC4, 8); /* add rsp, 8 */
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index d808253f2e94..e3f56e8443da 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1257,6 +1257,8 @@ struct bpf_ksym {
>   	struct list_head	 lnode;
>   	struct latch_tree_node	 tnode;
>   	bool			 prog;
> +	u32			 fp_start;
> +	u32			 fp_end;
>   };
>   
>   enum bpf_tramp_prog_type {
> @@ -1483,6 +1485,7 @@ void bpf_image_ksym_add(struct bpf_ksym *ksym);
>   void bpf_image_ksym_del(struct bpf_ksym *ksym);
>   void bpf_ksym_add(struct bpf_ksym *ksym);
>   void bpf_ksym_del(struct bpf_ksym *ksym);
> +bool bpf_has_frame_pointer(unsigned long ip);
>   int bpf_jit_charge_modmem(u32 size);
>   void bpf_jit_uncharge_modmem(u32 size);
>   bool bpf_prog_has_trampoline(const struct bpf_prog *prog);
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index d595fe512498..7cd8382d1152 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -760,6 +760,22 @@ struct bpf_prog *bpf_prog_ksym_find(unsigned long addr)
>   	       NULL;
>   }
>   
> +bool bpf_has_frame_pointer(unsigned long ip)
> +{
> +	struct bpf_ksym *ksym;
> +	unsigned long offset;
> +
> +	guard(rcu)();
> +
> +	ksym = bpf_ksym_find(ip);
> +	if (!ksym || !ksym->fp_start || !ksym->fp_end)
> +		return false;
> +
> +	offset = ip - ksym->start;
> +
> +	return offset >= ksym->fp_start && offset < ksym->fp_end;
> +}
> +
>   const struct exception_table_entry *search_bpf_extables(unsigned long addr)
>   {
>   	const struct exception_table_entry *e = NULL;
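
For readers following the patch above, here is a minimal userspace sketch
(not kernel code) of the two ideas it relies on: checking whether an
instruction pointer falls inside the window where the frame pointer is
valid, and stepping one frame using the offsets encoded in orc_fp_entry.
The names struct ksym_model, has_frame_pointer() and fp_step() are
hypothetical stand-ins for illustration, and the flat array of 64-bit words
used as a "stack" is an assumption; the real unwinder validates every
stack access.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields the patch adds to struct bpf_ksym. */
struct ksym_model {
	uint64_t start;    /* address of the first byte of the JIT image */
	uint64_t end;      /* address one past the last byte */
	uint32_t fp_start; /* offset just after "push rbp; mov rbp, rsp" */
	uint32_t fp_end;   /* offset just after "leave" */
};

/*
 * Mirrors the range check in the patch's bpf_has_frame_pointer():
 * the frame pointer is trusted only for fp_start <= offset < fp_end.
 */
static bool has_frame_pointer(const struct ksym_model *k, uint64_t ip)
{
	uint64_t offset;

	/* Models bpf_ksym_find() finding no symbol for this address. */
	if (ip < k->start || ip >= k->end)
		return false;

	/* A zero offset means the JIT never recorded the window. */
	if (!k->fp_start || !k->fp_end)
		return false;

	offset = ip - k->start;
	return offset >= k->fp_start && offset < k->fp_end;
}

/* Result of unwinding one frame-pointer-based frame. */
struct frame_step {
	uint64_t prev_sp;  /* caller's stack pointer */
	uint64_t prev_bp;  /* caller's frame pointer */
	uint64_t ret_addr; /* address execution returns to */
};

/*
 * Applies the orc_fp_entry rules to a modelled stack:
 *   sp_reg = BP,      sp_offset = 16   ->  prev_sp = bp + 16
 *   bp_reg = PREV_SP, bp_offset = -16  ->  prev_bp = *(prev_sp - 16)
 *   ORC_TYPE_CALL                      ->  ret     = *(prev_sp - 8)
 * stack[] holds 64-bit words; stack_base is the address of stack[0].
 */
static struct frame_step fp_step(const uint64_t *stack, uint64_t stack_base,
				 uint64_t bp)
{
	struct frame_step s;

	s.prev_sp = bp + 16;
	s.prev_bp = stack[(s.prev_sp - 16 - stack_base) / 8];
	s.ret_addr = stack[(s.prev_sp - 8 - stack_base) / 8];
	return s;
}
```

In this model an address on the "push rbp" itself is rejected, because the
caller's frame pointer is still live there, and an address on the "leave"
instruction is accepted, because rbp has not yet been restored. That is
why the upper bound of the check is exclusive.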



