From: Eduard Zingerman <eddyz87@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Martin KaFai Lau <martin.lau@linux.dev>,
Kernel Team <kernel-team@fb.com>,
Yonghong Song <yonghong.song@linux.dev>,
Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis
Date: Fri, 28 Feb 2025 20:40:44 -0800
Message-ID: <cc29975fbaf163d0c2ed904a9a4d6d9452177542.camel@gmail.com>
In-Reply-To: <CAADnVQ+BEW_yTsm-pMYcCsHhpZ4=FhAMmGvY7AhwyiUOZ+X1Gg@mail.gmail.com>
On Fri, 2025-02-28 at 18:10 -0800, Alexei Starovoitov wrote:
[...]
> I think the end goal is to get rid of mark_reg_read() and
> switch to proper live reg analysis.
> So please include the numbers to see how much work left.
Complete removal of mark_reg_read() means that the analysis needs to
cover stack slots as well. The algorithm to handle stack slots is
much more complicated:
- it needs to track register / stack slot types to handle cases like
  "r1 = r10" and spills of the stack pointer to the stack;
- it needs to track register values, at least crudely, to handle cases
  like "r1 = r10; r1 += r2;" (array access).
In the worst case, as you suggested, we could just assume that stack
slots are live, but that is a big verification performance hit.
Exact numbers are at the end of the email.
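
For contrast with the stack-slot case, the register-only analysis
reduces to a textbook backward data-flow over use/def masks. The
sketch below is a toy illustration under my own assumptions, not the
code from the patch-set: struct toy_insn and toy_compute_live_in()
are hypothetical names, and each instruction is described only by the
registers it reads, the registers it writes, and its successors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy instruction: one bit per register r0..r10. */
struct toy_insn {
	uint16_t use;  /* registers read by the instruction */
	uint16_t def;  /* registers overwritten by the instruction */
	int succ[2];   /* successor instruction indices, -1 if unused */
};

/*
 * Iterate live_in = use | (live_out & ~def) until nothing changes,
 * where live_out is the union of live_in over all successors.
 */
static void toy_compute_live_in(const struct toy_insn *insns, size_t n,
				uint16_t *live_in)
{
	int changed = 1;

	assert(insns != NULL && live_in != NULL);
	for (size_t i = 0; i < n; i++)
		live_in[i] = 0;

	while (changed) {
		changed = 0;
		/* Visit in reverse order so facts flow backwards quickly. */
		for (size_t k = n; k-- > 0;) {
			uint16_t out = 0;
			uint16_t in;

			for (int s = 0; s < 2; s++) {
				int t = insns[k].succ[s];

				if (t >= 0 && (size_t)t < n)
					out |= live_in[t];
			}
			in = (uint16_t)(insns[k].use |
					(out & (uint16_t)~insns[k].def));
			if (in != live_in[k]) {
				live_in[k] = in;
				changed = 1;
			}
		}
	}
}
```

Nothing here knows that "r1 = r10" makes r1 an alias of the frame
pointer, which is exactly the extra type and value tracking that a
stack-slot version would need.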
> Also note that mark_reg_read() tracks 32 vs 64 reads separately.
> iirc we did it to support fine grain mark_insn_zext
> to help architectures where zext has to be inserted by JIT.
> I'm not sure whether new liveness has to do it as well.
As far as I understand, this matters for one check in
propagate_liveness(). That check means something like:
"if this register was read as a 64-bit value, remember that
it needs zero extension on a 32-bit load".
So either the DFA would need to track this bit of information
(should be simple), or more zero extensions would be added.
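
To make "track this bit of information" concrete, here is a rough
sketch of what a backward pass could look like. It is an assumption
of mine about one possible shape, limited to straight-line code, and
not how the verifier or the patch-set does it: struct toy_insn32 and
toy_mark_zext() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical straight-line toy instruction, one bit per register. */
struct toy_insn32 {
	uint16_t use64;  /* registers read as full 64-bit values */
	uint16_t def64;  /* registers written by a 64-bit operation */
	uint16_t def32;  /* registers written by a 32-bit operation */
};

/*
 * Walk the instructions backwards. need_zext[k] receives the registers
 * that instruction k defines with a 32-bit operation and whose upper
 * half is read later, before being overwritten. 32-bit reads are not
 * recorded because they never make the upper half live.
 */
static void toy_mark_zext(const struct toy_insn32 *insns, size_t n,
			  uint16_t *need_zext)
{
	uint16_t live64 = 0;  /* registers whose upper 32 bits are read later */

	assert(insns != NULL && need_zext != NULL);
	for (size_t k = n; k-- > 0;) {
		const struct toy_insn32 *in = &insns[k];

		need_zext[k] = in->def32 & live64;
		/* Any definition ends the interest in the old upper half. */
		live64 = (uint16_t)(live64 & ~(in->def32 | in->def64));
		live64 |= in->use64;
	}
}
```

Without the live64 mask, the conservative alternative is to request a
zero extension for every 32-bit definition, which is the "more zero
extensions" outcome mentioned above.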
---
The repository [1] shared in the cover letter was used for the
benchmarks below. Abbreviations are as follows:
- Name:    dfa-opts
  Commit:  b73005452a4a
  Meaning: DFA as shared in this patch-set + a set of small
           improvements which I decided to exclude from the
           patch-set as described in the cover letter.
- Name:    dfa-opts-no-rm
  Commit:  e486757fdada
  Meaning: dfa-opts + read marks are disabled for registers.
- Name:    dfa-opts-no-rm-sl
  Commit:  a9930e8127a9
  Meaning: dfa-opts + read marks are disabled for registers
           and stack.
[1] https://github.com/eddyz87/bpf/tree/liveregs-dfa-std-liveregs-off
Veristat output is filtered using -f "states_pct>5" -f "!insns<200".
Veristat results are followed by a histogram that accounts for all
tests.
Two comparisons are made:
- dfa-opts vs dfa-opts-no-rm (small negative impact, except for two
  sched_ext programs that hit the 1M instruction limit; a positive
  impact would have indicated a bug);
- dfa-opts vs dfa-opts-no-rm-sl (big negative impact).
========= selftests: dfa-opts vs dfa-opts-no-rm =========
File                      Program           States (A)  States (B)  States (DIFF)
------------------------  ----------------  ----------  ----------  -------------
test_l4lb_noinline.bpf.o  balancer_ingress         219         231   +12 (+5.48%)
Total progs: 3565
Old success: 2054
New success: 2054
States diff min: 0.00%
States diff max: 5.48%
0% .. 5%: 3564
5% .. 10%: 1
========= scx: dfa-opts vs dfa-opts-no-rm =========
File       Program          States (A)  States (B)  States (DIFF)
---------  ---------------  ----------  ----------  ------------------
bpf.bpf.o  rusty_init             1944       55004  +53060 (+2729.42%)
bpf.bpf.o  rusty_init_task        1732       55049  +53317 (+3078.35%)
Total progs: 216
Old success: 186
New success: 184
States diff min: 0.00%
States diff max: 3078.35%
0% .. 5%: 214
2725% .. 3080%: 2
========= selftests: dfa-opts vs dfa-opts-no-rm-sl =========
File                              Program                               States (A)  States (B)  States (DIFF)
--------------------------------  ------------------------------------  ----------  ----------  -----------------
arena_htab_asm.bpf.o              arena_htab_asm                                33          40       +7 (+21.21%)
bpf_cubic.bpf.o                   bpf_cubic_cong_avoid                          92          98        +6 (+6.52%)
bpf_flow.bpf.o                    flow_dissector_0                              66         125      +59 (+89.39%)
bpf_iter_ksym.bpf.o               dump_ksym                                     16          21       +5 (+31.25%)
profiler1.bpf.o                   kprobe__proc_sys_write                        84         140      +56 (+66.67%)
profiler1.bpf.o                   kprobe__vfs_link                             504         543       +39 (+7.74%)
profiler1.bpf.o                   kprobe__vfs_symlink                          238         466     +228 (+95.80%)
profiler1.bpf.o                   kprobe_ret__do_filp_open                     247         274      +27 (+10.93%)
profiler1.bpf.o                   raw_tracepoint__sched_process_exec           139         350    +211 (+151.80%)
profiler1.bpf.o                   raw_tracepoint__sched_process_exit            67          86      +19 (+28.36%)
profiler1.bpf.o                   tracepoint__syscalls__sys_enter_kill         649         758     +109 (+16.80%)
profiler2.bpf.o                   kprobe__vfs_link                             149         257     +108 (+72.48%)
profiler2.bpf.o                   kprobe_ret__do_filp_open                     106         120      +14 (+13.21%)
profiler2.bpf.o                   raw_tracepoint__sched_process_exec           126         140      +14 (+11.11%)
profiler3.bpf.o                   kprobe__vfs_link                             805        1182     +377 (+46.83%)
pyperf180.bpf.o                   on_event                                   10564       17659    +7095 (+67.16%)
pyperf50.bpf.o                    on_event                                    2489        3375     +886 (+35.60%)
pyperf600_iter.bpf.o              on_event                                     192         214      +22 (+11.46%)
pyperf_subprogs.bpf.o             on_event                                    2331        2514      +183 (+7.85%)
setget_sockopt.bpf.o              skops_sockopt                                429         458       +29 (+6.76%)
setget_sockopt.bpf.o              socket_post_create                            90          95        +5 (+5.56%)
sock_iter_batch.bpf.o             iter_tcp_soreuse                               3           5       +2 (+66.67%)
strobemeta_bpf_loop.bpf.o         on_event                                     209         331     +122 (+58.37%)
test_bpf_nf.bpf.o                 nf_skb_ct_test                                41          56      +15 (+36.59%)
test_bpf_nf.bpf.o                 nf_xdp_ct_test                                41          56      +15 (+36.59%)
test_cls_redirect.bpf.o           cls_redirect                                2175       14083  +11908 (+547.49%)
test_cls_redirect_dynptr.bpf.o    cls_redirect                                 220         327     +107 (+48.64%)
test_cls_redirect_subprogs.bpf.o  cls_redirect                                4390       17001  +12611 (+287.27%)
test_l4lb.bpf.o                   balancer_ingress                             137         256     +119 (+86.86%)
test_l4lb_noinline.bpf.o          balancer_ingress                             219         643    +424 (+193.61%)
test_l4lb_noinline_dynptr.bpf.o   balancer_ingress                              73         182    +109 (+149.32%)
test_misc_tcp_hdr_options.bpf.o   misc_estab                                    88          98      +10 (+11.36%)
test_pkt_access.bpf.o             test_pkt_access                               21          25       +4 (+19.05%)
test_sock_fields.bpf.o            egress_read_sock_fields                       20          29       +9 (+45.00%)
test_tc_neigh_fib.bpf.o           tc_dst                                        12          14       +2 (+16.67%)
test_tc_neigh_fib.bpf.o           tc_src                                        12          14       +2 (+16.67%)
test_tcp_custom_syncookie.bpf.o   tcp_custom_syncookie                         420         560     +140 (+33.33%)
test_tcp_hdr_options.bpf.o        estab                                        189         225      +36 (+19.05%)
test_xdp.bpf.o                    _xdp_tx_iptunnel                              17          18        +1 (+5.88%)
test_xdp_dynptr.bpf.o             _xdp_tx_iptunnel                              26          36      +10 (+38.46%)
test_xdp_loop.bpf.o               _xdp_tx_iptunnel                              19          20        +1 (+5.26%)
test_xdp_noinline.bpf.o           balancer_ingress_v4                          271        1080    +809 (+298.52%)
test_xdp_noinline.bpf.o           balancer_ingress_v6                          268        1030    +762 (+284.33%)
xdp_features.bpf.o                xdp_do_tx                                     10          13       +3 (+30.00%)
xdp_synproxy_kern.bpf.o           syncookie_tc                                 390         467      +77 (+19.74%)
xdp_synproxy_kern.bpf.o           syncookie_xdp                                384         450      +66 (+17.19%)
Total progs: 3565
Old success: 2054
New success: 2054
States diff min: -9.09%
States diff max: 547.49%
-10% .. 0%: 3
0% .. 5%: 3492
5% .. 10%: 10
10% .. 15%: 8
15% .. 20%: 10
20% .. 25%: 6
25% .. 35%: 8
35% .. 40%: 4
45% .. 50%: 3
50% .. 55%: 4
55% .. 70%: 4
70% .. 90%: 3
95% .. 105%: 3
145% .. 195%: 3
280% .. 300%: 3
545% .. 550%: 1
========= scx: dfa-opts vs dfa-opts-no-rm-sl =========
File            Program             States (A)  States (B)  States (DIFF)
--------------  ------------------  ----------  ----------  ------------------
bpf.bpf.o       bpfland_enqueue             18          20        +2 (+11.11%)
bpf.bpf.o       bpfland_select_cpu          83         103       +20 (+24.10%)
bpf.bpf.o       flash_select_cpu            30          49       +19 (+63.33%)
bpf.bpf.o       lavd_cpu_offline           303         360       +57 (+18.81%)
bpf.bpf.o       lavd_cpu_online            303         360       +57 (+18.81%)
bpf.bpf.o       lavd_dispatch             7065       10652     +3587 (+50.77%)
bpf.bpf.o       lavd_init                  480         554       +74 (+15.42%)
bpf.bpf.o       lavd_running                89          94         +5 (+5.62%)
bpf.bpf.o       lavd_select_cpu            451         483        +32 (+7.10%)
bpf.bpf.o       layered_dispatch           501         950      +449 (+89.62%)
bpf.bpf.o       layered_dump               237         258        +21 (+8.86%)
bpf.bpf.o       layered_enqueue           1290        1655      +365 (+28.29%)
bpf.bpf.o       layered_init               423         552      +129 (+30.50%)
bpf.bpf.o       layered_select_cpu         201         311      +110 (+54.73%)
bpf.bpf.o       p2dq_dispatch               53         116      +63 (+118.87%)
bpf.bpf.o       rusty_init                1944       55006  +53062 (+2729.53%)
bpf.bpf.o       rusty_init_task           1732       55052  +53320 (+3078.52%)
bpf.bpf.o       rusty_running               19          23        +4 (+21.05%)
bpf.bpf.o       rusty_select_cpu           108         227     +119 (+110.19%)
bpf.bpf.o       rusty_set_cpumask          313         479      +166 (+53.04%)
scx_nest.bpf.o  nest_select_cpu             49          53         +4 (+8.16%)
Total progs: 216
Old success: 186
New success: 184
States diff min: 0.00%
States diff max: 3078.52%
0% .. 5%: 186
5% .. 10%: 4
10% .. 15%: 5
15% .. 20%: 6
20% .. 25%: 3
25% .. 55%: 6
60% .. 115%: 3
115% .. 3080%: 3