BPF List
 help / color / mirror / Atom feed
* Verifier aspects of (was Re: gcc [RFC] bpf: don't synthesize sign_extend for ISA v4)
       [not found] <20260413185834.58706-1-vineet.gupta@linux.dev>
@ 2026-06-30 18:05 ` Vineet Gupta
  2026-07-01 17:58   ` Yonghong Song
  0 siblings, 1 reply; 2+ messages in thread
From: Vineet Gupta @ 2026-06-30 18:05 UTC (permalink / raw)
  To: bpf
  Cc: jose.marchesi, ast, Eduard Zingerman, Yonghong Song, David Faust,
	Alexei Starovoitov, bpf

On 4/13/26 11:58 AM, Vineet Gupta wrote:
> Currently the extendsidi2 expander generates a shift left+right to
> materialize a sign_extend. This is not needed for ISA v4 which
> natively supports the sign extending move.
>
> There are various other reasons for fixing this:
>
>   - The shifts are throwaway work since Combine subsequently undoes them
>     anyways, while diminishing potential opportunities for other optim
>     transformations it could have done, given to its own limitations of
>     max 3 -> 2 combinations.
>
>   - Interim passes scuh as CSE1 can also "see thru" a native sign_extend:DI
>     better than the shifts, again opening up more optimization opportunites.
>
>   - It also helps slightly with debugging Expand dumps as sign_extend is
>     obvious vs. 2 shifts.
>
> The fix itself is easy: just gate the expander on non-availabilty of smov
> which in turn is keyed off of -msmov or -mcpu.
>
>                   Before                              After
> ----------------------------------------+---------------------------------------
> (insn 8 7 9 2 (set (reg:DI 19 [ _1 ])   |  (insn 8 7 9 2 (set (reg:DI 19 [ _1 ])
>     (ashift:DI (subreg:DI (reg:SI 24) 0) |     (sign_extend:DI (reg:SI 24)))
>        (const_int 32 [0x20])))           |       (nil))
>    (nil))                                |
> (insn 9 8 10 2 (set (reg:DI 19 [ _1 ])  |
>     (ashiftrt:DI (reg:DI 19 [ _1 ])      |
>        (const_int 32 [0x20])))           |
>    (nil))                                |
>
> This change is clean running bpf.exp testsuite tests.
>
> The complication comes from a subtle yet very important aspect, which is
> in the old regime, expander unconditionally forced src operand to DImode.
>
> |  operands[1] = gen_lowpart (DImode, operands[1])
>
> With the patch, src remains SImode which exposes some latent issues in
> the backend and/or "impedance mismatch [1]" with the kernel verifier and it
> reports following additional fails vs. gcc trunk against kernel bpf-next [2]
>
> | #12      attach_probe:FAIL
> | #74      cgroup_xattr:FAIL
> | #122     file_reader:FAIL
> | #412     sockopt_inherit:FAIL
> | #413     sockopt_multi:FAIL

For sockopt_multi, the trunk version we have the following snippet:

    0: (81) r0 = *(s32 *)(r1 +24)         ; R0=scalar(smin=0xffffffff80000000,smax=0x7fffffff) R1=ctx()
    1: (55) if r0 != 0x0 goto pc+10       ; R0=0
    ...
    11: (95) exit

Insn 1 branch is not taken, so upon exit, verifier is happy that r0 == 0.

W/ the patch, codegen changes slightly

    0: (61) r2 = *(u32 *)(r1 +24)         ; R1=ctx() R2=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
    1: (bf) r0 = (s32)r2                  ; R0=scalar(smin=0xffffffff80000000,smax=0x7fffffff) R2=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
    2: (56) if w2 != 0x0 goto pc+10       ; R2=0
    ...
    12: (95) exit

Here's verifier can't conclude that r0 is 0 or 1 and complains.

For sign extending move, verifier: check_alu_op () can't establish that 
truncated+extended r2 value (r0) and r2 are and equivalent and delinks 
the scalar ids. So on insn 2 it can deduce that r2 == 0, it can't be 
backpropagated due to delinking of the regs. We initially thought this 
would be a simple "bug fix" in the verifier. But it seems this requires 
additional state tracking - currently all 64-bits are tracked with 
copy_register_state, this need to be broken down into separate lo/hi 
bits with additional link kind similar to BPF_ADD_CONST - will that be 
acceptable ?

What's also interesting is for a reduced version the codegen difference 
between gcc trunk and clang.
The first load is a regular signed int access so technically it needs to 
be either *(s32 *) or a *(u32*) + (s32).
Clang seems to be eliding sign extension altogether [1] and only 
generating *(u32 *)
Any thoughts why clang can do that ?

[1] https://godbolt.org/z/fsdrExPWd

Thx,
-Vineet

> | #415     sockopt_sk:FAIL
> | #517     uprobe_multi_test:FAIL
> | #519     usdt:FAIL
> | #649     verif_scale_pyperf600_iter:FAIL
>
> These fallout issues need to be addressed first but as discussed in the
> bpf patchworks call this morning, I'm sending this out as an RFC.
>
> [1] https://devblogs.microsoft.com/oldnewthing/20180123-00/?p=97865
> [2] 2026-02-18 f620af11c27b ("xsk: avoid double checking against rx queue being full")
>
> gcc/ChangeLog:
>
> 	* config/bpf/bpf.md (define_expand extendsidi2): Only emit shifts
> 	if !bpf_has_smov.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/bpf/zero-ext.c: Check Expand dumps to ensure
> 	sign_extend is present and two shifts are not.
>
> Signed-off-by: Vineet Gupta <vineet.gupta@linux.dev>
> ---
>   gcc/config/bpf/bpf.md                   | 12 ++++++++----
>   gcc/testsuite/gcc.target/bpf/zero-ext.c |  5 ++++-
>   2 files changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
> index a2bceb8998d7..1431bfd225e0 100644
> --- a/gcc/config/bpf/bpf.md
> +++ b/gcc/config/bpf/bpf.md
> @@ -308,16 +308,20 @@
>   
>   ;; Sign-extending a 32-bit value into a 64-bit value is achieved using
>   ;; shifting, with instructions generated by the expand below.
> +;; Only needed for ISA V3 and prior builds.
>   
>   (define_expand "extendsidi2"
>     [(set (match_operand:DI 0 "register_operand")
>   	(sign_extend:DI (match_operand:SI 1 "register_operand")))]
>     ""
>   {
> -  operands[1] = gen_lowpart (DImode, operands[1]);
> -  emit_insn (gen_ashldi3 (operands[0], operands[1], GEN_INT (32)));
> -  emit_insn (gen_ashrdi3 (operands[0], operands[0], GEN_INT (32)));
> -  DONE;
> +  if (!bpf_has_smov)
> +    {
> +      operands[1] = gen_lowpart (DImode, operands[1]);
> +      emit_insn (gen_ashldi3 (operands[0], operands[1], GEN_INT (32)));
> +      emit_insn (gen_ashrdi3 (operands[0], operands[0], GEN_INT (32)));
> +      DONE;
> +    }
>   })



^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: Verifier aspects of (was Re: gcc [RFC] bpf: don't synthesize sign_extend for ISA v4)
  2026-06-30 18:05 ` Verifier aspects of (was Re: gcc [RFC] bpf: don't synthesize sign_extend for ISA v4) Vineet Gupta
@ 2026-07-01 17:58   ` Yonghong Song
  0 siblings, 0 replies; 2+ messages in thread
From: Yonghong Song @ 2026-07-01 17:58 UTC (permalink / raw)
  To: Vineet Gupta, bpf; +Cc: jose.marchesi, ast, Eduard Zingerman, David Faust, bpf



On 6/30/26 11:05 AM, Vineet Gupta wrote:
> On 4/13/26 11:58 AM, Vineet Gupta wrote:
>> Currently the extendsidi2 expander generates a shift left+right to
>> materialize a sign_extend. This is not needed for ISA v4 which
>> natively supports the sign extending move.
>>
>> There are various other reasons for fixing this:
>>
>>   - The shifts are throwaway work since Combine subsequently undoes them
>>     anyways, while diminishing potential opportunities for other optim
>>     transformations it could have done, given to its own limitations of
>>     max 3 -> 2 combinations.
>>
>>   - Interim passes scuh as CSE1 can also "see thru" a native 
>> sign_extend:DI
>>     better than the shifts, again opening up more optimization 
>> opportunites.
>>
>>   - It also helps slightly with debugging Expand dumps as sign_extend is
>>     obvious vs. 2 shifts.
>>
>> The fix itself is easy: just gate the expander on non-availabilty of 
>> smov
>> which in turn is keyed off of -msmov or -mcpu.
>>
>>                   Before                              After
>> ----------------------------------------+--------------------------------------- 
>>
>> (insn 8 7 9 2 (set (reg:DI 19 [ _1 ])   |  (insn 8 7 9 2 (set (reg:DI 
>> 19 [ _1 ])
>>     (ashift:DI (subreg:DI (reg:SI 24) 0) |     (sign_extend:DI 
>> (reg:SI 24)))
>>        (const_int 32 [0x20])))           |       (nil))
>>    (nil))                                |
>> (insn 9 8 10 2 (set (reg:DI 19 [ _1 ])  |
>>     (ashiftrt:DI (reg:DI 19 [ _1 ])      |
>>        (const_int 32 [0x20])))           |
>>    (nil))                                |
>>
>> This change is clean running bpf.exp testsuite tests.
>>
>> The complication comes from a subtle yet very important aspect, which is
>> in the old regime, expander unconditionally forced src operand to 
>> DImode.
>>
>> |  operands[1] = gen_lowpart (DImode, operands[1])
>>
>> With the patch, src remains SImode which exposes some latent issues in
>> the backend and/or "impedance mismatch [1]" with the kernel verifier 
>> and it
>> reports following additional fails vs. gcc trunk against kernel 
>> bpf-next [2]
>>
>> | #12      attach_probe:FAIL
>> | #74      cgroup_xattr:FAIL
>> | #122     file_reader:FAIL
>> | #412     sockopt_inherit:FAIL
>> | #413     sockopt_multi:FAIL
>
> For sockopt_multi, the trunk version we have the following snippet:
>
>    0: (81) r0 = *(s32 *)(r1 +24)         ; 
> R0=scalar(smin=0xffffffff80000000,smax=0x7fffffff) R1=ctx()
>    1: (55) if r0 != 0x0 goto pc+10       ; R0=0
>    ...
>    11: (95) exit
>
> Insn 1 branch is not taken, so upon exit, verifier is happy that r0 == 0.
>
> W/ the patch, codegen changes slightly
>
>    0: (61) r2 = *(u32 *)(r1 +24)         ; R1=ctx() 
> R2=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
>    1: (bf) r0 = (s32)r2                  ; 
> R0=scalar(smin=0xffffffff80000000,smax=0x7fffffff) 
> R2=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
>    2: (56) if w2 != 0x0 goto pc+10       ; R2=0
>    ...
>    12: (95) exit
>
> Here's verifier can't conclude that r0 is 0 or 1 and complains.
>
> For sign extending move, verifier: check_alu_op () can't establish 
> that truncated+extended r2 value (r0) and r2 are and equivalent and 
> delinks the scalar ids. So on insn 2 it can deduce that r2 == 0, it 
> can't be backpropagated due to delinking of the regs. We initially 
> thought this would be a simple "bug fix" in the verifier. But it seems 
> this requires additional state tracking - currently all 64-bits are 
> tracked with copy_register_state, this need to be broken down into 
> separate lo/hi bits with additional link kind similar to BPF_ADD_CONST 
> - will that be acceptable ?
>
> What's also interesting is for a reduced version the codegen 
> difference between gcc trunk and clang.
> The first load is a regular signed int access so technically it needs 
> to be either *(s32 *) or a *(u32*) + (s32).
> Clang seems to be eliding sign extension altogether [1] and only 
> generating *(u32 *)
> Any thoughts why clang can do that ?

The clang didn't use s32 becomes there is no need.
Related code:

    struct bpf_sockopt {
       int level, optname, optlen, retval;
       void *optval_end, *optval;
    };
    ...
    if (ctx->level != 0 || ctx->optname != 1)
       goto out;

In such case, there is NO sign-extenstion need from 32bit to 64bit.
So llvm decides to just use unsigned load (LDW32).

>
> [1] https://godbolt.org/z/fsdrExPWd
>
> Thx,
> -Vineet
>
>> | #415     sockopt_sk:FAIL
>> | #517     uprobe_multi_test:FAIL
>> | #519     usdt:FAIL
>> | #649     verif_scale_pyperf600_iter:FAIL
>>
>> These fallout issues need to be addressed first but as discussed in the
>> bpf patchworks call this morning, I'm sending this out as an RFC.
>>
>> [1] https://devblogs.microsoft.com/oldnewthing/20180123-00/?p=97865
>> [2] 2026-02-18 f620af11c27b ("xsk: avoid double checking against rx 
>> queue being full")
>>
>> gcc/ChangeLog:
>>
>>     * config/bpf/bpf.md (define_expand extendsidi2): Only emit shifts
>>     if !bpf_has_smov.
>>
>> gcc/testsuite/ChangeLog:
>>
>>     * gcc.target/bpf/zero-ext.c: Check Expand dumps to ensure
>>     sign_extend is present and two shifts are not.
>>
>> Signed-off-by: Vineet Gupta <vineet.gupta@linux.dev>
>> ---
>>   gcc/config/bpf/bpf.md                   | 12 ++++++++----
>>   gcc/testsuite/gcc.target/bpf/zero-ext.c |  5 ++++-
>>   2 files changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
>> index a2bceb8998d7..1431bfd225e0 100644
>> --- a/gcc/config/bpf/bpf.md
>> +++ b/gcc/config/bpf/bpf.md
>> @@ -308,16 +308,20 @@
>>     ;; Sign-extending a 32-bit value into a 64-bit value is achieved 
>> using
>>   ;; shifting, with instructions generated by the expand below.
>> +;; Only needed for ISA V3 and prior builds.
>>     (define_expand "extendsidi2"
>>     [(set (match_operand:DI 0 "register_operand")
>>       (sign_extend:DI (match_operand:SI 1 "register_operand")))]
>>     ""
>>   {
>> -  operands[1] = gen_lowpart (DImode, operands[1]);
>> -  emit_insn (gen_ashldi3 (operands[0], operands[1], GEN_INT (32)));
>> -  emit_insn (gen_ashrdi3 (operands[0], operands[0], GEN_INT (32)));
>> -  DONE;
>> +  if (!bpf_has_smov)
>> +    {
>> +      operands[1] = gen_lowpart (DImode, operands[1]);
>> +      emit_insn (gen_ashldi3 (operands[0], operands[1], GEN_INT (32)));
>> +      emit_insn (gen_ashrdi3 (operands[0], operands[0], GEN_INT (32)));
>> +      DONE;
>> +    }
>>   })
>
>


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2026-07-01 17:58 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260413185834.58706-1-vineet.gupta@linux.dev>
2026-06-30 18:05 ` Verifier aspects of (was Re: gcc [RFC] bpf: don't synthesize sign_extend for ISA v4) Vineet Gupta
2026-07-01 17:58   ` Yonghong Song

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox