Re: [RFC PATCH bpf-next 0/2] bpf: keep track of prog verification stats

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Dave Marchevsky <davemarchevsky@fb.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: <bpf@vger.kernel.org>, <netdev@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>, Yonghong Song <yhs@fb.com>
Subject: Re: [RFC PATCH bpf-next 0/2] bpf: keep track of prog verification stats
Date: Thu, 23 Sep 2021 21:27:06 -0400	[thread overview]
Message-ID: <35e837fb-ac22-3ea1-4624-2a890f6d0db0@fb.com> (raw)
In-Reply-To: <20210923205105.zufadghli5772uma@ast-mbp>

On 9/23/21 4:51 PM, Alexei Starovoitov wrote:   
> On Mon, Sep 20, 2021 at 08:11:10AM -0700, Dave Marchevsky wrote:
>> The verifier currently logs some useful statistics in
>> print_verification_stats. Although the text log is an effective feedback
>> tool for an engineer iterating on a single application, it would also be
>> useful to enable tracking these stats in a more structured form for
>> fleetwide or historical analysis, which this patchset attempts to do.
>>
>> A concrete motivating usecase which came up in recent weeks:
>>
>> A team owns a complex BPF program, with various folks extending its
>> functionality over the years. An engineer tries to make a relatively
>> simple addition but encounters "BPF program is too large. Processed
>> 1000001 insn". 
>>
>> Their changes bumped the processed insns from 700k to over the limit and
>> there's no obvious way to simplify. They must now consider a large
>> refactor in order to incorporate the new feature. What if there was some
>> previous change which bumped processed insns from 200k->700k which
>> _could_ be modified to stress verifier less? Tracking historical
>> verifier stats for each version of the program over the years would
>> reduce manual work necessary to find such a change.
>>
>>
>> Although parsing the text log could work for this scenario, a solution
>> that's resilient to log format and other verifier changes would be
>> preferable.
>>
>> This patchset adds a bpf_prog_verif_stats struct - containing the same
>> data logged by print_verification_stats - which can be retrieved as part
>> of bpf_prog_info. Looking for general feedback on approach and a few
>> specific areas before fleshing it out further:
>>
>> * None of my usecases require storing verif_stats for the lifetime of a
>>   loaded prog, but adding to bpf_prog_aux felt more correct than trying
>>   to pass verif_stats back as part of BPF_PROG_LOAD
>> * The verif_stats are probably not generally useful enough to warrant
>>   inclusion in fdinfo, but hoping to get confirmation before removing
>>   that change in patch 1
>> * processed_insn, verification_time, and total_states are immediately
>>   useful for me, rest were added for parity with
>> 	print_verification_stats. Can remove.
>> * Perhaps a version field would be useful in verif_stats in case future
>>   verifier changes make some current stats meaningless
>> * Note: stack_depth stat was intentionally skipped to keep patch 1
>>   simple. Will add if approach looks good.
> 
> Sorry for the delay. LPC consumes a lot of mental energy :)
> 
> I see the value of exposing some of the verification stats as prog_info.
> Let's look at the list:
> struct bpf_prog_verif_stats {
>        __u64 verification_time;
>        __u32 insn_processed;
>        __u32 max_states_per_insn;
>        __u32 total_states;
>        __u32 peak_states;
>        __u32 longest_mark_read_walk;
> };
> verification_time is non deterministic. It varies with frequency
> and run-to-run. I don't see how alerting tools can use it.

Makes sense to me, will get rid of it.

> insn_processed is indeed the main verification metric.
> By now it's well known and understood.
> 
> max_states_per_insn, total_states, etc were the metrics I've studied
> carefully with pruning, back tracking and pretty much every significant
> change I did or reiviewed in the verifier. They're useful to humans
> and developers, but I don't see how alerting tools will use them.
> 
> So it feels to me that insn_processed alone will be enough to address the
> monitoring goal.

For the concrete usecase in my original message insn_processed would be 
enough. For the others - I thought there might be value in gathering
those "fleetwide" to inform verifier development, e.g.:

"Hmm, this team's libbpf program has been regressing total_states over
past few {kernel, llvm} rollouts, but they haven't been modifying it. 
Let's try to get a minimal repro, send to bpf@vger, and contribute to 
selftests if it is indeed hitting a weird verifier edge case"

So for those I'm not expecting them to be useful to alert on or be a
number that the average BPF program writer needs to care about.

Of course this is hypothetical as I haven't tried to gather such data
and look for interesting patterns. But these metrics being useful to
you when looking at significant verifier changes is a good sign. 

> It can be exposed to fd_info and printed by bpftool.
> If/when it changes with some future verifier algorithm we should be able
> to approximate it.
>

next prev parent reply	other threads:[~2021-09-24  1:27 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-20 15:11 [RFC PATCH bpf-next 0/2] bpf: keep track of prog verification stats Dave Marchevsky
2021-09-20 15:11 ` [RFC PATCH bpf-next 1/2] bpf: add verifier stats to bpf_prog_info and fdinfo Dave Marchevsky
2021-09-20 15:11 ` [RFC PATCH bpf-next 2/2] selftests/bpf: add verif_stats test Dave Marchevsky
2021-09-23 20:51 ` [RFC PATCH bpf-next 0/2] bpf: keep track of prog verification stats Alexei Starovoitov
2021-09-24  1:27   ` Dave Marchevsky [this message]
2021-09-24  2:02     ` Andrii Nakryiko
2021-09-24 18:24       ` Dave Marchevsky
2021-09-27 18:20         ` John Fastabend
2021-09-28  0:41           ` Andrii Nakryiko
2021-09-28  1:33             ` John Fastabend
2021-10-07  9:06               ` Dave Marchevsky
2021-10-08 15:50                 ` John Fastabend

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=35e837fb-ac22-3ea1-4624-2a890f6d0db0@fb.com \
    --to=davemarchevsky@fb.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=netdev@vger.kernel.org \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).