BPF List
 help / color / mirror / Atom feed
From: Shung-Hsi Yu <shung-hsi.yu@suse.com>
To: Andrii Nakryiko <andrii@kernel.org>
Cc: Langston Barrett <langston.barrett@gmail.com>,
	Srinivas Narayana <srinivas.narayana@rutgers.edu>,
	Santosh Nagarakatte <santosh.nagarakatte@cs.rutgers.edu>,
	bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
	martin.lau@kernel.org, kernel-team@meta.com
Subject: Re: [PATCH v2 bpf-next 7/7] selftests/bpf: BPF register range bounds tester
Date: Thu, 19 Oct 2023 15:52:03 +0800	[thread overview]
Message-ID: <ZTDgIyzBX9oZNeFw@u94a> (raw)
In-Reply-To: <ZTDbGWHu4CnJYWAs@u94a>

On Thu, Oct 19, 2023 at 03:30:33PM +0800, Shung-Hsi Yu wrote:
> On Wed, Oct 18, 2023 at 09:24:05PM -0700, Andrii Nakryiko wrote:
> > Add tests that validate correctness and completeness of BPF verifier's
> > register range bounds.
> 
> Nitpick: in abstract-interpretation-speak, completeness seems to mean
> something different. I believe what we're trying to check here is
> soundness[1], again, in abstraction-interpretation-speak), so using
> completeness here may be misleading to some. (I'll leave explanation to
> other that understand this concept better than I do, rather than making an
> ill attempt that would probably just make things worst)
> 
> > The main bulk is a lot of auto-generated tests based on a small set of
> > seed values for lower and upper 32 bits of full 64-bit values.
> > Currently we validate only range vs const comparisons, but the idea is
> > to start validating range over range comparisons in subsequent patch set.
> 
> CC Langston Barrett who had previously send kunit-based tnum checks[2] a
> while back. If this patch is merged, perhaps we can consider adding
> validation for tnum as well in the future using similar framework.
> 
> More comments below
> 
> > When setting up initial register ranges we treat registers as one of
> > u64/s64/u32/s32 numeric types, and then independently perform conditional
> > comparisons based on a potentially different u64/s64/u32/s32 types. This
> > tests lots of tricky cases of deriving bounds information across
> > different numeric domains.
> > 
> > Given there are lots of auto-generated cases, we guard them behind
> > SLOW_TESTS=1 envvar requirement, and skip them altogether otherwise.
> > With current full set of upper/lower seed value, all supported
> > comparison operators and all the combinations of u64/s64/u32/s32 number
> > domains, we get about 7.7 million tests, which run in about 35 minutes
> > on my local qemu instance. So it's something that can be run manually
> > for exhaustive check in a reasonable time, and perhaps as a nightly CI
> > test, but certainly is too slow to run as part of a default test_progs run.
> 
> FWIW an alternative approach that speeds things up is to use model checkers
> like Z3 or CBMC. On my laptop, using Z3 to validate tnum_add() against *all*
> possible inputs takes less than 1.3 seconds[3] (based on code from [1]
> paper, but I somehow lost the link to their GitHub repository).

Found it. For reference, code used in "Sound, Precise, and Fast Abstract
Interpretation with Tristate Numbers"[1] can be found at
https://github.com/bpfverif/tnums-cgo22/blob/main/verification/tnum.py

Below is a truncated form of the above that only check tnum_add(), requires
a package called python3-z3 on most distros:

  #!/usr/bin/python3
  from uuid import uuid4
  from z3 import And, BitVec, BitVecRef, BitVecVal, Implies, prove
  
  SIZE = 64 # Working with 64-bit integers
  
  class Tnum:
      """A model of tristate number use in Linux kernel's BPF verifier.
      https://github.com/torvalds/linux/blob/v5.18/kernel/bpf/tnum.c
      """
      val: BitVecRef
      mask: BitVecRef
  
      def __init__(self, val=None, mask=None):
          uid = uuid4() # Ensure that the BitVec are uniq, required by the Z3 solver
          self.val = BitVec(f'Tnum-val-{uid}', bv=SIZE) if val is None else val
          self.mask = BitVec(f'Tnum-mask-{uid}', bv=SIZE) if mask is None else mask
      
      def contains(self, bitvec: BitVecRef):
          # Simplified version of tnum_in()
          # https://github.com/torvalds/linux/blob/v5.18/kernel/bpf/tnum.c#L167-L173
          return (~self.mask & bitvec) == self.val
      
      def wellformed(self):
          # Bit cannot be set in both val and mask, such tnum is not valid
          return self.val & self.mask == BitVecVal(0, bv=SIZE)
  
  # The function that we want to check
  def tnum_add(a: Tnum, b: Tnum):
      # Unmodified tnum_add()
      # https://github.com/torvalds/linux/blob/v5.18/kernel/bpf/tnum.c#L62-L72
      sm = a.mask + b.mask
      sv = a.val + b.val
      sigma = sm + sv
      chi = sigma ^ sv
      mu = chi | a.mask | b.mask
      return Tnum(sv & ~mu, mu)
  
  t1 = Tnum()
  t2 = Tnum()
  
  x = BitVec('x', bv=SIZE) # Any possible 64-bit value
  y = BitVec('y', bv=SIZE) # same as above
  
  # Condition that needs to hold before we move forward to check tnum_add()
  premises = And(
      t1.wellformed(), # t1 and t2 is wellformed
      t2.wellformed(),
      t1.contains(x), # x is within t1, and y is within t2
      t2.contains(y),
  )
  
  # This ask Z3 solver to prove that tnum_add() work as intended
  prove(
      Implies(
          # Assuming that t1 and t2 is wellformed, x is within t1, and y is
          # within t2
          premises,
          # Below is what we'd like to check. Namely, for any random x whos
          # value is within t1, and any random y whos value is within t2,
          # (x+y) is always within the tnum produced by tnum_add(t1, t2)
          tnum_add(t1, t2).contains(x+y),
      )
  )

> One of the potential issue with [3] is that Z3Py is written in Python. So
> there's the large over head of translating the C-implementation into Python
> using Z3Py APIs each time we changed relevant code. This overhead could
> potentially be removed with CBMC, which understand C, and we had a
> precedence of using CBMC[4] within the kernel source code, though it was
> later removed[5] due because SRCU changes are still happening too fast for
> the format tests to keep up, so it looks like CBMC is not a silver-bullet.
> 
> I really meant to look into the CMBC approach for verification of ranges and
> tnum, but fails to allocate time for it, so far.
> 
> Shung-Hsi
> 
> > ...
> 
> 1: https://people.cs.rutgers.edu/~sn349/papers/cgo-2022.pdf
> 2: https://lore.kernel.org/bpf/20220430215727.113472-1-langston.barrett@gmail.com/
> 3: https://gist.github.com/shunghsiyu/a63e08e6231553d1abdece4aef29f70e
> 4: https://lore.kernel.org/all/1485295229-14081-3-git-send-email-paulmck@linux.vnet.ibm.com/

Also forgot to add the link to the removal of SRCU formal-verification tests

5: https://lore.kernel.org/all/20230717182337.1098991-2-paulmck@kernel.org/

  reply	other threads:[~2023-10-19  7:52 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-19  4:23 [PATCH v2 bpf-next 0/7] BPF register bounds logic and testing improvements Andrii Nakryiko
2023-10-19  4:23 ` [PATCH v2 bpf-next 1/7] bpf: improve JEQ/JNE branch taken logic Andrii Nakryiko
2023-10-19  4:24 ` [PATCH v2 bpf-next 2/7] bpf: derive smin/smax from umin/max bounds Andrii Nakryiko
2023-10-19  4:24 ` [PATCH v2 bpf-next 3/7] bpf: enhance subregister bounds deduction logic Andrii Nakryiko
2023-10-19  4:24 ` [PATCH v2 bpf-next 4/7] bpf: improve deduction of 64-bit bounds from 32-bit bounds Andrii Nakryiko
2023-10-19  4:24 ` [PATCH v2 bpf-next 5/7] bpf: try harder to deduce register bounds from different numeric domains Andrii Nakryiko
2023-10-19  4:24 ` [PATCH v2 bpf-next 6/7] bpf: drop knowledge-losing __reg_combine_{32,64}_into_{64,32} logic Andrii Nakryiko
2023-10-19  4:24 ` [PATCH v2 bpf-next 7/7] selftests/bpf: BPF register range bounds tester Andrii Nakryiko
2023-10-19  7:08   ` kernel test robot
2023-10-19 18:27     ` Andrii Nakryiko
2023-10-19  7:30   ` Shung-Hsi Yu
2023-10-19  7:52     ` Shung-Hsi Yu [this message]
2023-10-19 18:34       ` Andrii Nakryiko
2023-10-20 17:37         ` Srinivas Narayana Ganapathy
2023-10-22  4:42           ` Andrii Nakryiko
2023-10-23 14:05             ` Shung-Hsi Yu
2023-10-23 15:52               ` Paul Chaignon
2023-10-23 22:50                 ` Andrii Nakryiko
2023-10-24  5:51                   ` Andrii Nakryiko
2023-10-24 21:26                     ` Paul Chaignon
2023-10-26 22:47                       ` Andrii Nakryiko
2023-10-19 18:31     ` Andrii Nakryiko
2023-10-20 12:27       ` Shung-Hsi Yu
2023-10-21  4:13 ` [PATCH v2 bpf-next 0/7] BPF register bounds logic and testing improvements John Fastabend
2023-10-22  4:32   ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZTDgIyzBX9oZNeFw@u94a \
    --to=shung-hsi.yu@suse.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=kernel-team@meta.com \
    --cc=langston.barrett@gmail.com \
    --cc=martin.lau@kernel.org \
    --cc=santosh.nagarakatte@cs.rutgers.edu \
    --cc=srinivas.narayana@rutgers.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox