From: Eduard Zingerman <eddyz87@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Kernel Team <kernel-team@fb.com>,
	Yonghong Song <yonghong.song@linux.dev>,
	Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis
Date: Fri, 28 Feb 2025 20:40:44 -0800
Message-ID: <cc29975fbaf163d0c2ed904a9a4d6d9452177542.camel@gmail.com>
In-Reply-To: <CAADnVQ+BEW_yTsm-pMYcCsHhpZ4=FhAMmGvY7AhwyiUOZ+X1Gg@mail.gmail.com>
On Fri, 2025-02-28 at 18:10 -0800, Alexei Starovoitov wrote:
[...]
> I think the end goal is to get rid of mark_reg_read() and
> switch to proper live reg analysis.
> So please include the numbers to see how much work left.
Complete removal of mark_reg_read() means that the analysis needs to
handle stack slots as well. The algorithm to handle stack slots is
much more complicated:
- it needs to track register / stack slot types to handle cases like
  "r1 = r10" and spills of the stack pointer to the stack;
- it needs to track register values, at least crudely, to handle cases
  like "r1 = r10; r1 += r2;" (array access).

The worst-case scenario, as you suggested, is to simply assume that all
stack slots are live, but that is a big verification performance hit.
Exact numbers are at the end of the email.
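
For orientation, the register-only part is a textbook backward
data-flow pass. Below is a minimal Python sketch of that idea. It is
not the compute_live_registers() code from the patch-set: the Insn
class, the bitmask encoding, the regs() helper and the example program
are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Insn:
    use: int = 0       # bitmask of registers the instruction reads
    define: int = 0    # bitmask of registers the instruction overwrites
    succ: list = field(default_factory=list)  # indices of successors


def regs(*nums):
    """Build a bitmask from register numbers, e.g. regs(1, 2) == 0b110."""
    mask = 0
    for n in nums:
        mask |= 1 << n
    return mask


def live_registers(insns):
    """Iterate live_in = use | (live_out & ~def) until nothing changes."""
    live_in = [0] * len(insns)
    live_out = [0] * len(insns)
    changed = True
    while changed:
        changed = False
        for i in reversed(range(len(insns))):
            # live_out is the union of live_in over all successors
            out = 0
            for s in insns[i].succ:
                out |= live_in[s]
            new_in = insns[i].use | (out & ~insns[i].define)
            if (new_in, out) != (live_in[i], live_out[i]):
                live_in[i], live_out[i] = new_in, out
                changed = True
    return live_in, live_out


# Hypothetical program, written as BPF-like pseudo assembly.
prog = [
    Insn(define=regs(1), succ=[1]),               # 0: r1 = 1
    Insn(define=regs(2), succ=[2]),               # 1: r2 = 2
    Insn(use=regs(1), succ=[3, 4]),               # 2: if r1 == 0 goto 4
    Insn(define=regs(2), succ=[4]),               # 3: r2 = 3
    Insn(use=regs(2), define=regs(0), succ=[5]),  # 4: r0 = r2
    Insn(use=regs(0)),                            # 5: exit (assumed to read r0)
]
live_in, live_out = live_registers(prog)
for i, (li, lo) in enumerate(zip(live_in, live_out)):
    print(f"insn {i}: live_in={li:#06b} live_out={lo:#06b}")
```

In this example r1 is not in live_out of insn 2, so its value no longer
matters once the branch has been taken. Extending the same scheme to
stack slots would also require knowing which slot a load or store
through a register derived from r10 touches, which is the extra
tracking listed above.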
> Also note that mark_reg_read() tracks 32 vs 64 reads separately.
> iirc we did it to support fine grain mark_insn_zext
> to help architectures where zext has to be inserted by JIT.
> I'm not sure whether new liveness has to do it as well.
As far as I understand, this matters for one check in
propagate_liveness(). That check means something like:
"if this register was read as a 64-bit value, remember that
it needs zero extension on a 32-bit load".
So either the DFA would need to track this bit of information
(should be simple), or more zero extensions would be added.
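
One way the extra bit could be carried through a backward pass is
sketched below in Python. This is an illustration only, not the
verifier's mark_insn_zext() logic: the Insn fields, the regs() helper,
the example program and the assumption that exit reads r0 as a 64-bit
value are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Insn:
    use64: int = 0   # registers read as full 64-bit values
    def64: int = 0   # registers overwritten by a 64-bit write
    def32: int = 0   # registers overwritten by a 32-bit (sub-register) write
    succ: list = field(default_factory=list)  # indices of successors


def regs(*nums):
    """Build a bitmask from register numbers."""
    mask = 0
    for n in nums:
        mask |= 1 << n
    return mask


def zext_needed(insns):
    """Return one bool per instruction: True if it is a 32-bit write whose
    result may later be read as a 64-bit value before being overwritten."""
    n = len(insns)
    live64_in = [0] * n  # registers whose upper half may still be read

    def out_of(i):
        out = 0
        for s in insns[i].succ:
            out |= live64_in[s]
        return out

    changed = True
    while changed:
        changed = False
        for i in reversed(range(n)):
            insn = insns[i]
            # Any write (32- or 64-bit) ends the need for the old upper half;
            # 32-bit reads do not create such a need, so they are not tracked.
            new_in = insn.use64 | (out_of(i) & ~(insn.def64 | insn.def32))
            if new_in != live64_in[i]:
                live64_in[i] = new_in
                changed = True
    return [bool(insns[i].def32 & out_of(i)) for i in range(n)]


# Hypothetical program, written as BPF-like pseudo assembly.
prog = [
    Insn(def32=regs(1), succ=[1]),                 # 0: w1 = 5
    Insn(def32=regs(2), succ=[2]),                 # 1: w2 = 7
    Insn(use64=regs(1), def64=regs(0), succ=[3]),  # 2: r0 = r1
    Insn(def32=regs(0), succ=[4]),                 # 3: w0 += w2 (32-bit reads)
    Insn(use64=regs(0)),                           # 4: exit (assumed 64-bit read)
]
flags = zext_needed(prog)
for i, flag in enumerate(flags):
    print(f"insn {i}: zext_needed={flag}")
```

In this sketch insn 1 is not flagged because w2 is only ever read as a
32-bit value, while insns 0 and 3 are flagged because their results
reach a 64-bit read. Without the 32/64 distinction every 32-bit write
to a live register would have to be flagged.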
---
Repository [1] shared in the cover letter was used for the benchmarks
below. Abbreviations are as follows:
- Name:    dfa-opts
  Commit:  b73005452a4a
  Meaning: DFA as shared in this patch-set + a set of small
           improvements which I decided to exclude from the
           patch-set as described in the cover letter.
- Name:    dfa-opts-no-rm
  Commit:  e486757fdada
  Meaning: dfa-opts + read marks are disabled for registers.
- Name:    dfa-opts-no-rm-sl
  Commit:  a9930e8127a9
  Meaning: dfa-opts + read marks are disabled for registers
           and stack.

[1] https://github.com/eddyz87/bpf/tree/liveregs-dfa-std-liveregs-off
Veristat output is filtered using -f "states_pct>5" -f "!insns<200".
Veristat results are followed by a histogram that accounts for all
tests.
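
As a reading aid for the tables, the percentage in the "States (DIFF)"
column is consistent with the change relative to run A. A small Python
sketch that reproduces a few of the rows reported below, assuming that
formula (the function names are made up for illustration):

```python
def states_pct(a, b):
    """Relative change in verifier states from run A to run B, in percent."""
    return (b - a) / a * 100.0


def fmt_diff(a, b):
    """Format a row the way the States (DIFF) column reads."""
    return f"{b - a:+d} ({states_pct(a, b):+.2f}%)"


# (program, States (A), States (B)) taken from the tables in this email
rows = [
    ("balancer_ingress", 219, 231),
    ("rusty_init", 1944, 55004),
    ("rusty_init_task", 1732, 55049),
]
for name, a, b in rows:
    shown = states_pct(a, b) > 5  # mirrors the "states_pct>5" filter
    print(f"{name:16} {fmt_diff(a, b):>20} shown={shown}")
```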
Two comparisons are made:
- dfa-opts vs dfa-opts-no-rm (small negative impact, except for two
  sched_ext programs that hit the 1M instruction limit; a positive
  impact would have indicated a bug);
- dfa-opts vs dfa-opts-no-rm-sl (big negative impact).
========= selftests: dfa-opts vs dfa-opts-no-rm =========
File                      Program           States (A)  States (B)  States (DIFF)
------------------------  ----------------  ----------  ----------  -------------
test_l4lb_noinline.bpf.o  balancer_ingress         219         231   +12 (+5.48%)
Total progs: 3565
Old success: 2054
New success: 2054
States diff min: 0.00%
States diff max: 5.48%
0% .. 5%: 3564
5% .. 10%: 1
========= scx: dfa-opts vs dfa-opts-no-rm =========
File       Program          States (A)  States (B)  States (DIFF)
---------  ---------------  ----------  ----------  ------------------
bpf.bpf.o  rusty_init             1944       55004  +53060 (+2729.42%)
bpf.bpf.o  rusty_init_task        1732       55049  +53317 (+3078.35%)
Total progs: 216
Old success: 186
New success: 184
States diff min: 0.00%
States diff max: 3078.35%
0% .. 5%: 214
2725% .. 3080%: 2
========= selftests: dfa-opts vs dfa-opts-no-rm-sl =========
File                              Program                               States (A)  States (B)  States (DIFF)
--------------------------------  ------------------------------------  ----------  ----------  -----------------
arena_htab_asm.bpf.o              arena_htab_asm                                33          40       +7 (+21.21%)
bpf_cubic.bpf.o                   bpf_cubic_cong_avoid                          92          98        +6 (+6.52%)
bpf_flow.bpf.o                    flow_dissector_0                              66         125      +59 (+89.39%)
bpf_iter_ksym.bpf.o               dump_ksym                                     16          21       +5 (+31.25%)
profiler1.bpf.o                   kprobe__proc_sys_write                        84         140      +56 (+66.67%)
profiler1.bpf.o                   kprobe__vfs_link                             504         543       +39 (+7.74%)
profiler1.bpf.o                   kprobe__vfs_symlink                          238         466     +228 (+95.80%)
profiler1.bpf.o                   kprobe_ret__do_filp_open                     247         274      +27 (+10.93%)
profiler1.bpf.o                   raw_tracepoint__sched_process_exec           139         350    +211 (+151.80%)
profiler1.bpf.o                   raw_tracepoint__sched_process_exit            67          86      +19 (+28.36%)
profiler1.bpf.o                   tracepoint__syscalls__sys_enter_kill         649         758     +109 (+16.80%)
profiler2.bpf.o                   kprobe__vfs_link                             149         257     +108 (+72.48%)
profiler2.bpf.o                   kprobe_ret__do_filp_open                     106         120      +14 (+13.21%)
profiler2.bpf.o                   raw_tracepoint__sched_process_exec           126         140      +14 (+11.11%)
profiler3.bpf.o                   kprobe__vfs_link                             805        1182     +377 (+46.83%)
pyperf180.bpf.o                   on_event                                   10564       17659    +7095 (+67.16%)
pyperf50.bpf.o                    on_event                                    2489        3375     +886 (+35.60%)
pyperf600_iter.bpf.o              on_event                                     192         214      +22 (+11.46%)
pyperf_subprogs.bpf.o             on_event                                    2331        2514      +183 (+7.85%)
setget_sockopt.bpf.o              skops_sockopt                                429         458       +29 (+6.76%)
setget_sockopt.bpf.o              socket_post_create                            90          95        +5 (+5.56%)
sock_iter_batch.bpf.o             iter_tcp_soreuse                               3           5       +2 (+66.67%)
strobemeta_bpf_loop.bpf.o         on_event                                     209         331     +122 (+58.37%)
test_bpf_nf.bpf.o                 nf_skb_ct_test                                41          56      +15 (+36.59%)
test_bpf_nf.bpf.o                 nf_xdp_ct_test                                41          56      +15 (+36.59%)
test_cls_redirect.bpf.o           cls_redirect                                2175       14083  +11908 (+547.49%)
test_cls_redirect_dynptr.bpf.o    cls_redirect                                 220         327     +107 (+48.64%)
test_cls_redirect_subprogs.bpf.o  cls_redirect                                4390       17001  +12611 (+287.27%)
test_l4lb.bpf.o                   balancer_ingress                             137         256     +119 (+86.86%)
test_l4lb_noinline.bpf.o          balancer_ingress                             219         643    +424 (+193.61%)
test_l4lb_noinline_dynptr.bpf.o   balancer_ingress                              73         182    +109 (+149.32%)
test_misc_tcp_hdr_options.bpf.o   misc_estab                                    88          98      +10 (+11.36%)
test_pkt_access.bpf.o             test_pkt_access                               21          25       +4 (+19.05%)
test_sock_fields.bpf.o            egress_read_sock_fields                       20          29       +9 (+45.00%)
test_tc_neigh_fib.bpf.o           tc_dst                                        12          14       +2 (+16.67%)
test_tc_neigh_fib.bpf.o           tc_src                                        12          14       +2 (+16.67%)
test_tcp_custom_syncookie.bpf.o   tcp_custom_syncookie                         420         560     +140 (+33.33%)
test_tcp_hdr_options.bpf.o        estab                                        189         225      +36 (+19.05%)
test_xdp.bpf.o                    _xdp_tx_iptunnel                              17          18        +1 (+5.88%)
test_xdp_dynptr.bpf.o             _xdp_tx_iptunnel                              26          36      +10 (+38.46%)
test_xdp_loop.bpf.o               _xdp_tx_iptunnel                              19          20        +1 (+5.26%)
test_xdp_noinline.bpf.o           balancer_ingress_v4                          271        1080    +809 (+298.52%)
test_xdp_noinline.bpf.o           balancer_ingress_v6                          268        1030    +762 (+284.33%)
xdp_features.bpf.o                xdp_do_tx                                     10          13       +3 (+30.00%)
xdp_synproxy_kern.bpf.o           syncookie_tc                                 390         467      +77 (+19.74%)
xdp_synproxy_kern.bpf.o           syncookie_xdp                                384         450      +66 (+17.19%)
Total progs: 3565
Old success: 2054
New success: 2054
States diff min: -9.09%
States diff max: 547.49%
-10% .. 0%: 3
0% .. 5%: 3492
5% .. 10%: 10
10% .. 15%: 8
15% .. 20%: 10
20% .. 25%: 6
25% .. 35%: 8
35% .. 40%: 4
45% .. 50%: 3
50% .. 55%: 4
55% .. 70%: 4
70% .. 90%: 3
95% .. 105%: 3
145% .. 195%: 3
280% .. 300%: 3
545% .. 550%: 1
========= scx: dfa-opts vs dfa-opts-no-rm-sl =========
File            Program             States (A)  States (B)  States (DIFF)
--------------  ------------------  ----------  ----------  ------------------
bpf.bpf.o       bpfland_enqueue             18          20        +2 (+11.11%)
bpf.bpf.o       bpfland_select_cpu          83         103       +20 (+24.10%)
bpf.bpf.o       flash_select_cpu            30          49       +19 (+63.33%)
bpf.bpf.o       lavd_cpu_offline           303         360       +57 (+18.81%)
bpf.bpf.o       lavd_cpu_online            303         360       +57 (+18.81%)
bpf.bpf.o       lavd_dispatch             7065       10652     +3587 (+50.77%)
bpf.bpf.o       lavd_init                  480         554       +74 (+15.42%)
bpf.bpf.o       lavd_running                89          94         +5 (+5.62%)
bpf.bpf.o       lavd_select_cpu            451         483        +32 (+7.10%)
bpf.bpf.o       layered_dispatch           501         950      +449 (+89.62%)
bpf.bpf.o       layered_dump               237         258        +21 (+8.86%)
bpf.bpf.o       layered_enqueue           1290        1655      +365 (+28.29%)
bpf.bpf.o       layered_init               423         552      +129 (+30.50%)
bpf.bpf.o       layered_select_cpu         201         311      +110 (+54.73%)
bpf.bpf.o       p2dq_dispatch               53         116      +63 (+118.87%)
bpf.bpf.o       rusty_init                1944       55006  +53062 (+2729.53%)
bpf.bpf.o       rusty_init_task           1732       55052  +53320 (+3078.52%)
bpf.bpf.o       rusty_running               19          23        +4 (+21.05%)
bpf.bpf.o       rusty_select_cpu           108         227     +119 (+110.19%)
bpf.bpf.o       rusty_set_cpumask          313         479      +166 (+53.04%)
scx_nest.bpf.o  nest_select_cpu             49          53         +4 (+8.16%)
Total progs: 216
Old success: 186
New success: 184
States diff min: 0.00%
States diff max: 3078.52%
0% .. 5%: 186
5% .. 10%: 4
10% .. 15%: 5
15% .. 20%: 6
20% .. 25%: 3
25% .. 55%: 6
60% .. 115%: 3
115% .. 3080%: 3