* Verifier - wild instruction count fluctuations between versions?
@ 2024-09-23 18:35 Alasdair McWilliam
2024-09-24 2:26 ` Eduard Zingerman
0 siblings, 1 reply; 3+ messages in thread
From: Alasdair McWilliam @ 2024-09-23 18:35 UTC (permalink / raw)
To: bpf
Hello,
First post so please be gentle :-)
I've got an eBPF workload running on kernel 6.1 LTS, and it's running great.
The use case is eBPF in combination with XDP and AF_XDP for
volumetric DDoS mitigation.
The eBPF program consists mostly of packet parsing, LPM and map
lookups, and two calls to the bpf_loop() helper. There are currently no
iterators, dynptrs, etc., but there are lots of switch-case blocks.
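For context, here is a Python model of the documented runtime contract of bpf_loop() from the bpf-helpers man page. The callback receives a zero-based index and the context, and it returns 0 to continue or 1 to stop. The helper returns the number of iterations performed, -EINVAL for non-zero flags, or -E2BIG if nr_loops exceeds 1 << 23. This is only an illustration: bpf_loop_model, MAX_LOOPS and sum_until are hypothetical names. Because nr_loops and the early-exit decision are often runtime values, the verifier generally cannot know how many times the callback will run.

```python
import errno

MAX_LOOPS = 1 << 23  # documented upper bound on nr_loops (hypothetical name)

def bpf_loop_model(nr_loops, callback, ctx, flags=0):
    """Userspace model of bpf_loop(): returns iterations run, or -errno."""
    if flags != 0:
        return -errno.EINVAL  # no flags are defined; must be 0
    if nr_loops > MAX_LOOPS:
        return -errno.E2BIG
    for index in range(nr_loops):
        if callback(index, ctx):  # 1 = stop early, 0 = continue
            return index + 1      # the stopping iteration is counted
    return nr_loops

def sum_until(index, ctx):
    """Hypothetical callback: accumulate indices, stop once the sum passes a limit."""
    ctx["sum"] += index
    return 1 if ctx["sum"] > ctx["limit"] else 0

ctx = {"sum": 0, "limit": 5}
print(bpf_loop_model(10, sum_until, ctx), ctx["sum"])  # 4 6
```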
I've started to test newer kernel versions in preparation for upgrading
our stack from 6.1 LTS to 6.6 LTS, to gain access to newer functionality
and for future-proofing. However, when loading the BPF object code on a
6.6 kernel, the BPF verifier rejects a program that 6.1 accepts and
runs well.
This caught me by surprise, because I have seen our stack boot
successfully on a 6.7 kernel. So I've run veristat [0] on the exact
same eBPF object file, compiled by clang 17, each time on a different
kernel version. The results fluctuate wildly!
Results on 6.1.106: success: 53687 insns and 5114 states [1]
Results on 6.6.52: failure: 1000001 insns and 39501 states [2]
Results on 6.7.9: success: 131418 insns and 8839 states [3]
I have done some searching and found references to faults with
bpf_loop around kernel 6.5, patches being backported to 6.6, and also
references to those fixes being difficult to backport to 6.1. To be
honest, it does feel like bpf_loop is perhaps not working properly in 6.6.
I am going to do some more testing on much newer kernels. While
6.7.9 loads the program OK, its instruction count is still more than
double that of 6.1 (131418 vs 53687), even though the binary is identical.
In the meantime, could someone advise whether this is a known issue
with 6.6, and whether improvements are pending in the 6.6 branch? I
appreciate that isn't easy to answer without visibility of the code.
Happy to post a repo link if it would help.
Perhaps it would be better to simply write off the 6.6 branch and wait
for the next LTS branch, as we are approaching the end of the year.
Many thanks for any insight anyone can offer!
Kind regards
Alasdair
[0] Exact command run each time is:
$ sudo veristat -e verdict,duration,insns,states,peak_states krn.bpf
[1] Results on 6.1.106:
Verdict  Duration (us)  Insns  States  Peak states
-------  -------------  -----  ------  -----------
success          23763  53687    5114         1953
-------  -------------  -----  ------  -----------
[2] Results on 6.6.52:
Verdict  Duration (us)  Insns    States  Peak states
-------  -------------  -------  ------  -----------
failure         325270  1000001   39501          866
-------  -------------  -------  ------  -----------
[3] Results on 6.7.9:
Verdict  Duration (us)  Insns   States  Peak states
-------  -------------  ------  ------  -----------
success          56959  131418    8839         2713
-------  -------------  ------  ------  -----------
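As a quick check on the numbers above, here is a small Python sketch that parses the three veristat rows and compares each one against 6.1. Note that 1000001 is exactly one past the verifier's 1M processed-instruction complexity limit (BPF_COMPLEXITY_LIMIT_INSNS). The 6.6 failure therefore reflects the verifier giving up at that limit, not a specific unsafe access being flagged. The row strings are copied from the tables. parse_veristat_row is a hypothetical helper that only handles these five columns.

```python
BPF_COMPLEXITY_LIMIT_INSNS = 1_000_000  # verifier's processed-instruction limit

FIELDS = ("verdict", "duration_us", "insns", "states", "peak_states")

def parse_veristat_row(line: str) -> dict:
    """Parse one data row of `veristat -e verdict,duration,insns,states,peak_states`."""
    verdict, *numbers = line.split()
    return dict(zip(FIELDS, [verdict, *map(int, numbers)]))

rows = {
    "6.1.106": "success 23763 53687 5114 1953",
    "6.6.52": "failure 325270 1000001 39501 866",
    "6.7.9": "success 56959 131418 8839 2713",
}
results = {ver: parse_veristat_row(row) for ver, row in rows.items()}
base = results["6.1.106"]

for ver, r in results.items():
    ratio = r["insns"] / base["insns"]              # relative to 6.1
    over = r["insns"] > BPF_COMPLEXITY_LIMIT_INSNS  # verifier gave up
    print(f"{ver}: {r['verdict']}, insns x{ratio:.2f} vs 6.1, over limit: {over}")
```

On these numbers, 6.7.9 processes about 2.45x as many instructions as 6.1, and 6.6.52 about 18.6x before giving up.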
* Re: Verifier - wild instruction count fluctuations between versions?
From: Eduard Zingerman @ 2024-09-24 2:26 UTC (permalink / raw)
To: Alasdair McWilliam, bpf; +Cc: Shung-Hsi Yu
[-- Attachment #1: Type: text/plain, Size: 2540 bytes --]
On Mon, 2024-09-23 at 19:35 +0100, Alasdair McWilliam wrote:
> Results on 6.1.106: success: 53687 insns and 5114 states [1]
> Results on 6.6.52: failure: 1000001 insns and 39501 states [2]
> Results on 6.7.9: success: 131418 insns and 8839 states [3]
Hi Alasdair,
It might be the case that your issues with bpf_loop() are triggered by
the following commit:
- "bpf: verify callbacks as if they are called unknown number of times":
- ab5cfac139ab for 6.7.y
- b43550d7d58e for 6.6.y
- not backported to 6.1.y
This commit is a correctness fix: w/o it, the bodies of loop callbacks
were not checked exhaustively. A side effect of this fix, however, is a
significant verification time regression for some programs.
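Here is a toy model of what "verify callbacks as if they are called unknown number of times" means for verification cost. As a simplification, the old behaviour is modelled as checking the callback body once from its first entry state. The new behaviour re-checks the body from every state it can exit with, until a state repeats and is pruned. The abstract "mode" state, the fixed body length and all names are hypothetical. The real verifier compares full register and stack states, so convergence can take many more passes.

```python
from typing import Callable, Hashable, Iterable

def processed_insns(entry: Hashable, body_len: int,
                    transfer: Callable[[Hashable], Iterable[Hashable]],
                    single_call: bool) -> int:
    """Toy count of instructions 'processed' while verifying a loop callback.

    single_call=True: check the body once from the entry state.
    single_call=False: re-check the body from each possible exit state,
    as if the callback may be called again, until no new state appears.
    """
    if single_call:
        return body_len
    seen = set()
    worklist = [entry]
    processed = 0
    while worklist:
        state = worklist.pop()
        if state in seen:
            continue  # equivalent state already verified: prune
        seen.add(state)
        processed += body_len
        worklist.extend(transfer(state))
    return processed

def next_mode(mode):
    """Hypothetical switch-case callback: each call advances through 5 modes."""
    return [(mode + 1) % 5]

print(processed_insns(0, 20, next_mode, single_call=True))   # 20
print(processed_insns(0, 20, next_mode, single_call=False))  # 100
```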
Comparing BPF-related commits in both branches (starting from the merge
base, using the script from the attachment) gives somewhat sporadic
results:
Commits stats:
only in stable/linux-6.6.y : 50
only in stable/linux-6.7.y : 96
common : 74
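The attached script itself isn't shown here. As a rough idea of how such a comparison can be done, here is a hypothetical Python sketch. It finds the merge base, lists the commit subjects on each side that touch BPF paths, and matches them by subject rather than SHA. Matching by subject is needed because stable backports get new SHAs: the callback fix is ab5cfac139ab in 6.7.y but b43550d7d58e in 6.6.y. Restricting to kernel/bpf is an assumption. Eduard's script may select "BPF related" commits differently, so the counts need not match the ones above.

```python
import subprocess

def git(repo: str, *args: str) -> str:
    """Run a git command in `repo` and return its stdout."""
    return subprocess.run(["git", "-C", repo, *args],
                          capture_output=True, text=True, check=True).stdout

def subjects(repo: str, base: str, branch: str, paths=("kernel/bpf",)) -> list:
    """Subjects of commits in base..branch touching `paths` (assumed BPF filter)."""
    out = git(repo, "log", "--format=%s", f"{base}..{branch}", "--", *paths)
    return [line for line in out.splitlines() if line]

def compare_subjects(a: list, b: list):
    """Split subjects into (only in a, only in b, common), matched by text."""
    sa, sb = set(a), set(b)
    return sorted(sa - sb), sorted(sb - sa), sorted(sa & sb)

def compare_branches(repo: str, a: str, b: str) -> None:
    """Print commit stats for two branches relative to their merge base."""
    base = git(repo, "merge-base", a, b).strip()
    only_a, only_b, common = compare_subjects(subjects(repo, base, a),
                                              subjects(repo, base, b))
    print(f"only in {a} : {len(only_a)}")
    print(f"only in {b} : {len(only_b)}")
    print(f"common : {len(common)}")

# Usage (hypothetical local checkout with the stable branches fetched):
# compare_branches("linux", "stable/linux-6.6.y", "stable/linux-6.7.y")
```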
Only in stable/linux-6.6.y:
...
Only in stable/linux-6.7.y:
...
Of these, only "bpf: Improve JEQ/JNE branch taken logic" from 6.7
looks like an optimization; however, it did not show any changes in
veristat data for selftests.
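Some background on branch-taken logic: when the verifier can prove from a register's known bounds that a conditional jump always or never fires, it explores only one successor. This cuts the number of processed instructions. The sketch below is a toy model of that general idea for JEQ/JNE against a constant, using both unsigned and signed 64-bit ranges. It is not the kernel code and not necessarily what that particular commit changes. The 1/0/-1 return convention is an assumption, loosely modelled on the verifier's is_branch_taken().

```python
U64_MASK = (1 << 64) - 1

def to_signed64(u: int) -> int:
    """Reinterpret an unsigned 64-bit value as signed."""
    return u - (1 << 64) if u >= 1 << 63 else u

def jeq_branch_taken(umin, umax, smin, smax, imm):
    """1 if `reg == imm` always holds, 0 if it never holds, -1 if unknown.

    umin/umax: unsigned bounds of reg; smin/smax: signed bounds of reg.
    """
    uimm = imm & U64_MASK
    simm = to_signed64(uimm)
    # reg is a known constant equal to imm: branch always taken
    if umin == umax == uimm or smin == smax == simm:
        return 1
    # imm lies outside either range: reg can never equal it
    if not (umin <= uimm <= umax) or not (smin <= simm <= smax):
        return 0
    return -1

def jne_branch_taken(umin, umax, smin, smax, imm):
    """JNE is the negation of JEQ whenever JEQ's outcome is known."""
    r = jeq_branch_taken(umin, umax, smin, smax, imm)
    return r if r == -1 else 1 - r

# A signed range of [-10, 10] spans the whole unsigned range (negative values
# wrap to large unsigned ones), so only the signed bounds rule out 50.
print(jeq_branch_taken(0, U64_MASK, -10, 10, 50))  # 0
print(jne_branch_taken(0, U64_MASK, -10, 10, 50))  # 1
```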
=> it's hard to say what's missing from 6.6 for your use-case.
Maybe we could discuss options for optimizing your program with regard
to verifier performance?
Thanks,
Eduard
P.S. hope I did not mess up the script.
[-- Attachment #2: compare-bpf-commits-in-branches.sh --]
[-- Type: application/x-shellscript, Size: 916 bytes --]
* Re: Verifier - wild instruction count fluctuations between versions?
From: Shung-Hsi Yu @ 2024-10-22 6:13 UTC (permalink / raw)
To: Alasdair McWilliam; +Cc: bpf, Eduard Zingerman
Hi,
Sorry for coming to this late. Replies are inline/interleaved, so some
of my comments might be hidden by your email client.
On Mon, Sep 23, 2024 at 07:26:25PM GMT, Eduard Zingerman wrote:
> On Mon, 2024-09-23 at 19:35 +0100, Alasdair McWilliam wrote:
> > Results on 6.1.106: success: 53687 insns and 5114 states [1]
> > Results on 6.6.52: failure: 1000001 insns and 39501 states [2]
> > Results on 6.7.9: success: 131418 insns and 8839 states [3]
>
> Hi Alasdair,
>
> It might be the case that your issues with bpf_loop() are triggered by
> the following commit:
> - "bpf: verify callbacks as if they are called unknown number of times":
> - ab5cfac139ab for 6.7.y
> - b43550d7d58e for 6.6.y
> - not backported to 6.1.y
>
> This commit is a correctness fix: w/o it, the bodies of loop callbacks
> were not checked exhaustively. A side effect of this fix, however, is a
> significant verification time regression for some programs.
>
> Comparing BPF-related commits in both branches (starting from the merge
> base, using the script from the attachment) gives somewhat sporadic
> results:
>
> Commits stats:
> only in stable/linux-6.6.y : 50
> only in stable/linux-6.7.y : 96
> common : 74
>
> Only in stable/linux-6.6.y:
> ...
>
> Only in stable/linux-6.7.y:
> ...
> Of these, only "bpf: Improve JEQ/JNE branch taken logic" from 6.7
> looks like an optimization; however, it did not show any changes in
> veristat data for selftests.
I've also tried to look at this using a different script, based on an
in-house tool, and came to roughly the same conclusion on the 6.7 side.
Nothing specifically stands out to me in 6.7 that would explain the
difference.
OTOH, 6.7.9 is _missing_ a fix that was backported to 6.6.52:
e9a8e5a587ca "bpf: check bpf_func_state->callback_depth when pruning
states". It was backported to 6.7.10, but 6.7.9 doesn't have it yet.
Since it prevents (improper) pruning, it could explain what we're seeing
here.
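Here is a toy illustration of how a missing pruning check can lower the processed-instruction count. Suppose two states differ only in how deep they are in the callback loop, and the comparison ignores that difference. The current state is then pruned, and its remaining paths are never explored. FuncState and can_prune() are hypothetical. The exact comparison is an assumption based on the commit title: pruning is assumed to be blocked when the cached state is deeper in the loop (fewer iterations left) than the current one. This is not the kernel's actual state-comparison code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FuncState:
    regs: tuple          # toy stand-in for register/stack contents
    callback_depth: int  # how many callback iterations deep this state is

def can_prune(cached: FuncState, cur: FuncState, check_callback_depth: bool) -> bool:
    """Toy check: may `cur` be pruned because `cached` was already verified?"""
    # Assumed semantics of the fix: a cached state that is deeper in the
    # loop does not cover a current state with more iterations left.
    if check_callback_depth and cached.callback_depth > cur.callback_depth:
        return False
    return cached.regs == cur.regs

cached = FuncState(regs=(1, 2), callback_depth=1)
cur = FuncState(regs=(1, 2), callback_depth=0)
print(can_prune(cached, cur, check_callback_depth=False))  # True: pruned, paths skipped
print(can_prune(cached, cur, check_callback_depth=True))   # False: keep exploring
```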
@Alasdair could you give 6.7.12 a quick try (I suppose that would be
easier since you already tested 6.7.9) and see how it goes there?
Additionally, here's v6.1.y branch, containing the "bpf: verify
callbacks as if they are called unknown number of times" fix Eduard
mentioned,
https://github.com/shunghsiyu/linux/tree/stable/linux-6.1.y-callback-fixes-w-subprog-precision-v1
that I plan to submit (though it's long overdue). If @Alasdair could
also test it out, that would be much appreciated.
Let me know if there's anything that would make things easier.
Thanks,
Shung-Hsi
> => it's hard to say what's missing from 6.6 for your use-case.
>
> Maybe we could discuss options for optimizing your program with regard
> to verifier performance?
>
> Thanks,
> Eduard
>
> P.S. hope I did not mess up the script.