From: Eduard Zingerman <eddyz87@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Kernel Team <kernel-team@fb.com>,
	Yonghong Song <yonghong.song@linux.dev>,
	Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH bpf-next v1 0/3] bpf: simple DFA-based live registers analysis
Date: Fri, 28 Feb 2025 20:40:44 -0800
Message-ID: <cc29975fbaf163d0c2ed904a9a4d6d9452177542.camel@gmail.com>
In-Reply-To: <CAADnVQ+BEW_yTsm-pMYcCsHhpZ4=FhAMmGvY7AhwyiUOZ+X1Gg@mail.gmail.com>

On Fri, 2025-02-28 at 18:10 -0800, Alexei Starovoitov wrote:

[...]

> I think the end goal is to get rid of mark_reg_read() and
> switch to proper live reg analysis.
> So please include the numbers to see how much work left.

Complete removal of mark_reg_read() means that analysis needs to be
done for stack slots as well. The algorithm to handle stack slots is
much more complicated:
- it needs to track register / stack slot type to handle cases like
  "r1 = r10" and spills of the stack pointer to stack;
- it needs to track register values, at least crudely, to handle cases
  like "r1 = r10; r1 += r2;" (array access).
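
To make the two points above concrete, here is a minimal sketch (Python,
not verifier code) of the kind of tracking a stack-slot analysis would
need. The instruction tuples and the stack_reads() helper are
hypothetical and exist only for this illustration. The sketch handles
straight-line code only and ignores spills of the stack pointer to the
stack, which a real analysis would also have to handle.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

FP = "r10"  # BPF frame pointer register


@dataclass(frozen=True)
class StackPtr:
    """Register is known to point into the current stack frame."""
    off: Optional[int]  # byte offset from r10, None if the offset is unknown


def stack_reads(insns: List[Tuple]) -> Dict[int, Optional[int]]:
    """For each load through a stack pointer, return the frame offset it
    reads, or None if the analysis cannot tell which slot is read."""
    regs: Dict[str, StackPtr] = {FP: StackPtr(0)}
    reads: Dict[int, Optional[int]] = {}
    for idx, insn in enumerate(insns):
        op = insn[0]
        if op == "mov":                      # ("mov", dst, src): dst = src
            _, dst, src = insn
            if src in regs:
                regs[dst] = regs[src]        # "r1 = r10" copies pointer type
            else:
                regs.pop(dst, None)          # dst becomes a plain scalar
        elif op == "add_imm":                # ("add_imm", dst, imm): dst += imm
            _, dst, imm = insn
            ptr = regs.get(dst)
            if ptr is not None:
                new_off = None if ptr.off is None else ptr.off + imm
                regs[dst] = StackPtr(new_off)
        elif op == "add_reg":                # ("add_reg", dst, src): dst += src
            _, dst, src = insn
            if dst in regs or src in regs:
                # "r1 += r2": without value tracking the offset is unknown
                regs[dst] = StackPtr(None)
        elif op == "load":                   # ("load", dst, base, off)
            _, dst, base, off = insn
            ptr = regs.get(base)
            if ptr is not None:
                reads[idx] = None if ptr.off is None else ptr.off + off
            regs.pop(dst, None)              # loaded value treated as scalar
    return reads


prog = [
    ("mov", "r1", "r10"),
    ("add_imm", "r1", -8),
    ("load", "r0", "r1", 0),    # reads the slot at fp-8
    ("mov", "r1", "r10"),
    ("add_reg", "r1", "r2"),
    ("load", "r0", "r1", 0),    # may read any slot
]
result = stack_reads(prog)
```

The second load is the problematic case: unless the value of r2 is
tracked at least crudely, every stack slot has to be treated as read.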

The worst case scenario, as you suggested, is to simply assume that all
stack slots are live, but that is a big verification performance hit.
Exact numbers are at the end of the email.

> Also note that mark_reg_read() tracks 32 vs 64 reads separately.
> iirc we did it to support fine grain mark_insn_zext
> to help architectures where zext has to be inserted by JIT.
> I'm not sure whether new liveness has to do it as well.

As far as I understand, this is important for one check in
propagate_liveness(). And that check means something like:
"if this register was read as 64-bit value, remember that
 it needs zero extension on 32-bit load".

This means that either the DFA would need to track this bit of
information (should be simple), or more zero extensions would be added.
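
A rough sketch of what "track this bit of information" could look like
is below (Python, not the code from the patch-set). It is a standard
backward liveness pass where each live register also carries the widest
read seen (32 or 64). The Insn tuple, analyze() and the rule used to
pick instructions needing zero extension are assumptions made for this
illustration; they are only meant to approximate the intent of the
check described above.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Insn(NamedTuple):
    dst: Optional[str]     # register written by the instruction, if any
    dst_width: int         # 32 or 64, width of the write
    uses: Dict[str, int]   # register -> width of the read (32 or 64)
    succ: List[int]        # indices of successor instructions


def join(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, int]:
    """Union of live sets, keeping the widest read for each register."""
    out = dict(a)
    for reg, width in b.items():
        out[reg] = max(out.get(reg, 0), width)
    return out


def analyze(insns: List[Insn]) -> Tuple[List[Dict[str, int]], List[int]]:
    live_in: List[Dict[str, int]] = [dict() for _ in insns]
    live_out: List[Dict[str, int]] = [dict() for _ in insns]
    changed = True
    while changed:                           # iterate until a fixed point
        changed = False
        for i in reversed(range(len(insns))):
            insn = insns[i]
            out: Dict[str, int] = {}
            for s in insn.succ:
                out = join(out, live_in[s])
            # in = (out - def) + use; a write is assumed to kill the
            # register completely, since 32-bit BPF writes zero the
            # upper half
            new_in = {r: w for r, w in out.items() if r != insn.dst}
            new_in = join(new_in, insn.uses)
            if out != live_out[i] or new_in != live_in[i]:
                live_out[i], live_in[i] = out, new_in
                changed = True
    # A 32-bit write whose result is later read as 64-bit needs zext.
    zext = [i for i, insn in enumerate(insns)
            if insn.dst is not None and insn.dst_width == 32
            and live_out[i].get(insn.dst) == 64]
    return live_in, zext


prog = [
    Insn("r1", 32, {}, [1]),                      # w1 = 1
    Insn("r2", 32, {}, [2]),                      # w2 = 2
    Insn("r0", 64, {"r1": 64}, [3]),              # r0 = r1
    Insn("r0", 32, {"r0": 32, "r2": 32}, [4]),    # w0 += w2
    Insn(None, 64, {"r0": 32}, []),               # exit, 32-bit read of r0
                                                  # assumed for illustration
]
live_in, zext = analyze(prog)
```

In this example only "w1 = 1" is flagged, because r1 is later read as a
64-bit value; "w2 = 2" is only ever read as 32-bit and is not flagged.
Without the width bit, both would have to be conservatively flagged.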

---

The repository [1] shared in the cover letter was used for the
benchmarks below. Abbreviations are as follows:
- Name: dfa-opts
  Commit: b73005452a4a
  Meaning: DFA as shared in this patch-set + a set of small
           improvements which I decided to exclude from the
           patch-set as described in the cover letter.
- Name: dfa-opts-no-rm
  Commit: e486757fdada
  Meaning: dfa-opts + read marks are disabled for registers.
- Name: dfa-opts-no-rm-sl
  Commit: a9930e8127a9
  Meaning: dfa-opts + read marks are disabled for registers
           and stack.

[1] https://github.com/eddyz87/bpf/tree/liveregs-dfa-std-liveregs-off

Veristat output is filtered using -f "states_pct>5" -f "!insns<200".
Veristat results are followed by a histogram that accounts for all
tests.

Two comparisons are made:
- dfa-opts vs dfa-opts-no-rm (small negative impact, except for two
  sched_ext programs that hit the 1M instruction limit; a positive
  impact would have indicated a bug);
- dfa-opts vs dfa-opts-no-rm-sl (big negative impact).
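
For reading the tables: the percentage in the "States (DIFF)" column is
consistent with a relative change computed against column A. A small
sketch of that arithmetic, assuming this formula and a non-zero A
(states_diff_pct() is a hypothetical helper, not part of veristat):

```python
def states_diff_pct(a: int, b: int) -> float:
    """Relative change of B against A, in percent (assumes a > 0)."""
    return (b - a) / a * 100.0


# Two rows taken from the tables in this email.
l4lb = round(states_diff_pct(219, 231), 2)      # balancer_ingress
rusty = round(states_diff_pct(1944, 55004), 2)  # rusty_init
```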

========= selftests: dfa-opts vs dfa-opts-no-rm =========

File                      Program           States (A)  States (B)  States (DIFF)
------------------------  ----------------  ----------  ----------  -------------
test_l4lb_noinline.bpf.o  balancer_ingress         219         231   +12 (+5.48%)

Total progs: 3565
Old success: 2054
New success: 2054
States diff min:    0.00%
States diff max:    5.48%
   0% ..    5%: 3564
   5% ..   10%: 1

========= scx: dfa-opts vs dfa-opts-no-rm =========

File       Program          States (A)  States (B)  States      (DIFF)
---------  ---------------  ----------  ----------  ------------------
bpf.bpf.o  rusty_init             1944       55004  +53060 (+2729.42%)
bpf.bpf.o  rusty_init_task        1732       55049  +53317 (+3078.35%)

Total progs: 216
Old success: 186
New success: 184
States diff min:    0.00%
States diff max: 3078.35%
   0% ..    5%: 214
2725% .. 3080%: 2



========= selftests: dfa-opts vs dfa-opts-no-rm-sl =========

File                              Program                               States (A)  States (B)  States     (DIFF)
--------------------------------  ------------------------------------  ----------  ----------  -----------------
arena_htab_asm.bpf.o              arena_htab_asm                                33          40       +7 (+21.21%)
bpf_cubic.bpf.o                   bpf_cubic_cong_avoid                          92          98        +6 (+6.52%)
bpf_flow.bpf.o                    flow_dissector_0                              66         125      +59 (+89.39%)
bpf_iter_ksym.bpf.o               dump_ksym                                     16          21       +5 (+31.25%)
profiler1.bpf.o                   kprobe__proc_sys_write                        84         140      +56 (+66.67%)
profiler1.bpf.o                   kprobe__vfs_link                             504         543       +39 (+7.74%)
profiler1.bpf.o                   kprobe__vfs_symlink                          238         466     +228 (+95.80%)
profiler1.bpf.o                   kprobe_ret__do_filp_open                     247         274      +27 (+10.93%)
profiler1.bpf.o                   raw_tracepoint__sched_process_exec           139         350    +211 (+151.80%)
profiler1.bpf.o                   raw_tracepoint__sched_process_exit            67          86      +19 (+28.36%)
profiler1.bpf.o                   tracepoint__syscalls__sys_enter_kill         649         758     +109 (+16.80%)
profiler2.bpf.o                   kprobe__vfs_link                             149         257     +108 (+72.48%)
profiler2.bpf.o                   kprobe_ret__do_filp_open                     106         120      +14 (+13.21%)
profiler2.bpf.o                   raw_tracepoint__sched_process_exec           126         140      +14 (+11.11%)
profiler3.bpf.o                   kprobe__vfs_link                             805        1182     +377 (+46.83%)
pyperf180.bpf.o                   on_event                                   10564       17659    +7095 (+67.16%)
pyperf50.bpf.o                    on_event                                    2489        3375     +886 (+35.60%)
pyperf600_iter.bpf.o              on_event                                     192         214      +22 (+11.46%)
pyperf_subprogs.bpf.o             on_event                                    2331        2514      +183 (+7.85%)
setget_sockopt.bpf.o              skops_sockopt                                429         458       +29 (+6.76%)
setget_sockopt.bpf.o              socket_post_create                            90          95        +5 (+5.56%)
sock_iter_batch.bpf.o             iter_tcp_soreuse                               3           5       +2 (+66.67%)
strobemeta_bpf_loop.bpf.o         on_event                                     209         331     +122 (+58.37%)
test_bpf_nf.bpf.o                 nf_skb_ct_test                                41          56      +15 (+36.59%)
test_bpf_nf.bpf.o                 nf_xdp_ct_test                                41          56      +15 (+36.59%)
test_cls_redirect.bpf.o           cls_redirect                                2175       14083  +11908 (+547.49%)
test_cls_redirect_dynptr.bpf.o    cls_redirect                                 220         327     +107 (+48.64%)
test_cls_redirect_subprogs.bpf.o  cls_redirect                                4390       17001  +12611 (+287.27%)
test_l4lb.bpf.o                   balancer_ingress                             137         256     +119 (+86.86%)
test_l4lb_noinline.bpf.o          balancer_ingress                             219         643    +424 (+193.61%)
test_l4lb_noinline_dynptr.bpf.o   balancer_ingress                              73         182    +109 (+149.32%)
test_misc_tcp_hdr_options.bpf.o   misc_estab                                    88          98      +10 (+11.36%)
test_pkt_access.bpf.o             test_pkt_access                               21          25       +4 (+19.05%)
test_sock_fields.bpf.o            egress_read_sock_fields                       20          29       +9 (+45.00%)
test_tc_neigh_fib.bpf.o           tc_dst                                        12          14       +2 (+16.67%)
test_tc_neigh_fib.bpf.o           tc_src                                        12          14       +2 (+16.67%)
test_tcp_custom_syncookie.bpf.o   tcp_custom_syncookie                         420         560     +140 (+33.33%)
test_tcp_hdr_options.bpf.o        estab                                        189         225      +36 (+19.05%)
test_xdp.bpf.o                    _xdp_tx_iptunnel                              17          18        +1 (+5.88%)
test_xdp_dynptr.bpf.o             _xdp_tx_iptunnel                              26          36      +10 (+38.46%)
test_xdp_loop.bpf.o               _xdp_tx_iptunnel                              19          20        +1 (+5.26%)
test_xdp_noinline.bpf.o           balancer_ingress_v4                          271        1080    +809 (+298.52%)
test_xdp_noinline.bpf.o           balancer_ingress_v6                          268        1030    +762 (+284.33%)
xdp_features.bpf.o                xdp_do_tx                                     10          13       +3 (+30.00%)
xdp_synproxy_kern.bpf.o           syncookie_tc                                 390         467      +77 (+19.74%)
xdp_synproxy_kern.bpf.o           syncookie_xdp                                384         450      +66 (+17.19%)

Total progs: 3565
Old success: 2054
New success: 2054
States diff min:   -9.09%
States diff max:  547.49%
 -10% ..    0%: 3
   0% ..    5%: 3492
   5% ..   10%: 10
  10% ..   15%: 8
  15% ..   20%: 10
  20% ..   25%: 6
  25% ..   35%: 8
  35% ..   40%: 4
  45% ..   50%: 3
  50% ..   55%: 4
  55% ..   70%: 4
  70% ..   90%: 3
  95% ..  105%: 3
 145% ..  195%: 3
 280% ..  300%: 3
 545% ..  550%: 1

========= scx: dfa-opts vs dfa-opts-no-rm-sl =========

File            Program             States (A)  States (B)  States      (DIFF)
--------------  ------------------  ----------  ----------  ------------------
bpf.bpf.o       bpfland_enqueue             18          20        +2 (+11.11%)
bpf.bpf.o       bpfland_select_cpu          83         103       +20 (+24.10%)
bpf.bpf.o       flash_select_cpu            30          49       +19 (+63.33%)
bpf.bpf.o       lavd_cpu_offline           303         360       +57 (+18.81%)
bpf.bpf.o       lavd_cpu_online            303         360       +57 (+18.81%)
bpf.bpf.o       lavd_dispatch             7065       10652     +3587 (+50.77%)
bpf.bpf.o       lavd_init                  480         554       +74 (+15.42%)
bpf.bpf.o       lavd_running                89          94         +5 (+5.62%)
bpf.bpf.o       lavd_select_cpu            451         483        +32 (+7.10%)
bpf.bpf.o       layered_dispatch           501         950      +449 (+89.62%)
bpf.bpf.o       layered_dump               237         258        +21 (+8.86%)
bpf.bpf.o       layered_enqueue           1290        1655      +365 (+28.29%)
bpf.bpf.o       layered_init               423         552      +129 (+30.50%)
bpf.bpf.o       layered_select_cpu         201         311      +110 (+54.73%)
bpf.bpf.o       p2dq_dispatch               53         116      +63 (+118.87%)
bpf.bpf.o       rusty_init                1944       55006  +53062 (+2729.53%)
bpf.bpf.o       rusty_init_task           1732       55052  +53320 (+3078.52%)
bpf.bpf.o       rusty_running               19          23        +4 (+21.05%)
bpf.bpf.o       rusty_select_cpu           108         227     +119 (+110.19%)
bpf.bpf.o       rusty_set_cpumask          313         479      +166 (+53.04%)
scx_nest.bpf.o  nest_select_cpu             49          53         +4 (+8.16%)

Total progs: 216
Old success: 186
New success: 184
States diff min:    0.00%
States diff max: 3078.52%
   0% ..    5%: 186
   5% ..   10%: 4
  10% ..   15%: 5
  15% ..   20%: 6
  20% ..   25%: 3
  25% ..   55%: 6
  60% ..  115%: 3
 115% .. 3080%: 3


