Re: [PATCH bpf-next 4/5] bpf: disambiguate SCALAR register state output in verifier logs

BPF List
 help / color / mirror / Atom feed

From: John Fastabend <john.fastabend@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	 John Fastabend <john.fastabend@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>,
	 bpf@vger.kernel.org,  ast@kernel.org,  daniel@iogearbox.net,
	 martin.lau@kernel.org,  kernel-team@meta.com
Subject: Re: [PATCH bpf-next 4/5] bpf: disambiguate SCALAR register state output in verifier logs
Date: Thu, 12 Oct 2023 09:59:34 -0700	[thread overview]
Message-ID: <652825f6df713_5644208f2@john.notmuch> (raw)
In-Reply-To: <CAEf4BzaZx3_wGXqXBFt_Y4z+2L9XF7krNan4ZN4DQ0uLLXOf_Q@mail.gmail.com>

Andrii Nakryiko wrote:
> On Wed, Oct 11, 2023 at 10:33 PM John Fastabend
> <john.fastabend@gmail.com> wrote:
> >
> > Andrii Nakryiko wrote:
> > > Currently the way that verifier prints SCALAR_VALUE register state (and
> > > PTR_TO_PACKET, which can have var_off and ranges info as well) is very
> > > ambiguous.
> > >
> > > In the name of brevity we are trying to eliminate "unnecessary" output
> > > of umin/umax, smin/smax, u32_min/u32_max, and s32_min/s32_max values, if
> > > possible. Current rules are that if any of those have their default
> > > value (which for mins is the minimal value of its respective types: 0,
> > > S32_MIN, or S64_MIN, while for maxs it's U32_MAX, S32_MAX, S64_MAX, or
> > > U64_MAX) *OR* if there is another min/max value that as matching value.
> > > E.g., if smin=100 and umin=100, we'll emit only umin=10, omitting smin
> > > altogether. This approach has a few problems, being both ambiguous and
> > > sort-of incorrect in some cases.
> > >
> > > Ambiguity is due to missing value could be either default value or value
> > > of umin/umax or smin/smax. This is especially confusing when we mix
> > > signed and unsigned ranges. Quite often, umin=0 and smin=0, and so we'll
> > > have only `umin=0` leaving anyone reading verifier log to guess whether
> > > smin is actually 0 or it's actually -9223372036854775808 (S64_MIN). And
> > > often times it's important to know, especially when debugging tricky
> > > issues.
> >
> > +1
> >
> > >
> > > "Sort-of incorrectness" comes from mixing negative and positive values.
> > > E.g., if umin is some large positive number, it can be equal to smin
> > > which is, interpreted as signed value, is actually some negative value.
> > > Currently, that smin will be omitted and only umin will be emitted with
> > > a large positive value, giving an impression that smin is also positive.
> > >
> > > Anyway, ambiguity is the biggest issue making it impossible to have an
> > > exact understanding of register state, preventing any sort of automated
> > > testing of verifier state based on verifier log. This patch is
> > > attempting to rectify the situation by removing ambiguity, while
> > > minimizing the verboseness of register state output.
> > >
> > > The rules are straightforward:
> > >   - if some of the values are missing, then it definitely has a default
> > >   value. I.e., `umin=0` means that umin is zero, but smin is actually
> > >   S64_MIN;
> > >   - all the various boundaries that happen to have the same value are
> > >   emitted in one equality separated sequence. E.g., if umin and smin are
> > >   both 100, we'll emit `smin=umin=100`, making this explicit;
> > >   - we do not mix negative and positive values together, and even if
> > >   they happen to have the same bit-level value, they will be emitted
> > >   separately with proper sign. I.e., if both umax and smax happen to be
> > >   0xffffffffffffffff, we'll emit them both separately as
> > >   `smax=-1,umax=18446744073709551615`;
> > >   - in the name of a bit more uniformity and consistency,
> > >   {u32,s32}_{min,max} are renamed to {s,u}{min,max}32, which seems to
> > >   improve readability.
> >
> > agree.
> >
> > >
> > > The above means that in case of all 4 ranges being, say, [50, 100] range,
> > > we'd previously see hugely ambiguous:
> > >
> > >     R1=scalar(umin=50,umax=100)
> > >
> > > Now, we'll be more explicit:
> > >
> > >     R1=scalar(smin=umin=smin32=umin32=50,smax=umax=smax32=umax32=100)
> > >
> > > This is slightly more verbose, but distinct from the case when we don't
> > > know anything about signed boundaries and 32-bit boundaries, which under
> > > new rules will match the old case:
> > >
> > >     R1=scalar(umin=50,umax=100)
> >
> > Did you consider perhaps just always printing the entire set? Its overly
> > verbose I guess but I find it easier to track state across multiple
> > steps this way.
> 
> I didn't consider that because it's way too distracting and verbose
> (IMO) in practice. For one, those default values represent the idea
> "we don't know anything", so whether we see umax=18446744073709551615
> or just don't see umax makes little difference in practice (though
> perhaps one has to come to realization that those two things are
> equivalent). But also think about seeing this:
> 
> smin=-9223372036854775807,smax=9223372036854775807,umin=0,umax=18446744073709551615,smin32=-2147483648,smax32=21474836487,umin32=0,umax32=4294967295

you could do,

smin=SMIN,smax=SMAX,umin=0,umax=UMAX,smin=SMIN,smax=SMAX,umin32=0,umax32=UMAX

but I see your point.

> 
> How verbose and distracting that is, and how much time would it take
> you to notice that this is not just "we don't know anything about this
> register", but that actually smin is not a default, it's S64_MIN+1.
> This is of course extreme example (I mostly wanted to show how verbose
> default output will be), but I think the point stands that omitting
> defaults brings out what extra information we have much better.
> 
> It's an option to do it for LOG_LEVEL_2, but I would still not do
> that, I'd find it too noisy even for level 2.

My $.02 just leave it as you have it here.

> 
> >
> > Otherwise patch LGTM.

next prev parent reply	other threads:[~2023-10-12 16:59 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-11 22:37 [PATCH bpf-next 0/5] BPF verifier log improvements Andrii Nakryiko
2023-10-11 22:37 ` [PATCH bpf-next 1/5] selftests/bpf: improve percpu_alloc test robustness Andrii Nakryiko
2023-10-12  6:04   ` Yafang Shao
2023-10-12 16:37     ` Andrii Nakryiko
2023-10-11 22:37 ` [PATCH bpf-next 2/5] selftests/bpf: improve missed_kprobe_recursion " Andrii Nakryiko
2023-10-12 13:22   ` Jiri Olsa
2023-10-11 22:37 ` [PATCH bpf-next 3/5] selftests/bpf: make align selftests more robust Andrii Nakryiko
2023-10-11 22:37 ` [PATCH bpf-next 4/5] bpf: disambiguate SCALAR register state output in verifier logs Andrii Nakryiko
2023-10-12  5:33   ` John Fastabend
2023-10-12 16:22     ` Andrii Nakryiko
2023-10-12 16:59       ` John Fastabend [this message]
2023-10-12 17:45         ` Andrii Nakryiko
2023-10-11 22:37 ` [PATCH bpf-next 5/5] bpf: ensure proper register state printing for cond jumps Andrii Nakryiko
2023-10-12  5:40 ` [PATCH bpf-next 0/5] BPF verifier log improvements John Fastabend
2023-10-12 15:00 ` Eduard Zingerman
2023-10-16 12:00 ` patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=652825f6df713_5644208f2@john.notmuch \
    --to=john.fastabend@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=kernel-team@meta.com \
    --cc=martin.lau@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox