Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)

BPF List
 help / color / mirror / Atom feed

From: Eduard Zingerman <eddyz87@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Andrew Werner <awerner32@gmail.com>, bpf <bpf@vger.kernel.org>,
	Andrei Matei <andreimatei1@gmail.com>,
	Tamir Duberstein <tamird@gmail.com>,
	Joanne Koong <joannelkoong@gmail.com>,
	kernel-team@dataexmachina.dev, Song Liu <song@kernel.org>,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>
Subject: Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)
Date: Mon, 25 Sep 2023 04:01:58 +0300	[thread overview]
Message-ID: <a777445dcb94c0029eb3bd3ddc96ddc493c85ad0.camel@gmail.com> (raw)
In-Reply-To: <CAEf4BzZ-NGiUVw+yCRCkrPQbJAS4wMBsT3e=eYVMuintqKDKqg@mail.gmail.com>

On Fri, 2023-09-22 at 13:52 -0700, Andrii Nakryiko wrote:
> > > I can't say I understand what exactly you are proposing and how you
> > > are going to determine these conservative precision marks. But I'd
> > > like to learn some new ideas, so please elaborate :)
> > 
> > I'll send a follow-up email, trying to figure out what to do with pointers.
> 
> Ok, sounds good.

(below explains why DFA was a bad idea, please proceed to the next
 section if that's not interesting).

My initial thought was that it would be possible to use a simple
live-variable DFA in order to compute conservative read and precise
marks. Like a textbook version:
- split code in basic blocks;
- for each instruction propagate liveness information backwards:
  - define USE(insn) to be a set of registers / stack slots read by
    this instruction;
  - define KILL(insn) to be a set of registers / stack slots written
    to by this instruction.
  - compute set of live variables before insn as:
    LIVE_in(insn) = USE(insn) U (LIVE_out(insn) / KILL(insn))
- compute basic block BB_out as a union of BB_in of each successor;
- etc... iterate in postorder until fixed point is reached.

However, this is a moot point because of presence of pointers.
For series of instructions like:
  "r1 = r10; r1 += -8; r1 = *(u64 *)(r1 + 0);"
Such algorithm would need to know possible range of values for r1.
Meaning that before executing liveness DFA some sort of value range
analysis would have to be run.
And these things might get complicated [1]
(I never implemented one, maybe it's not that bad).
So, basically that would be a mini-copy of the verifier but
implemented as a thing that computes fixed point instead of "tracing".
Probably a no-go.

[1] https://boxbase.org/entries/2017/oct/23/range_analysis_papers/

---

However, I think there are two more relatively simple options that
should give correct results.

Option A. "Exit or Loop"
------------------------

- Similar to Andrii's suggestion postpone exploration of
  bpf_iter_*_next() #N->#N+1 states for as long as there are other
  states to explore.
- When evaluating is_state_visited() for bpf_iter_*_next() #N->#N+1
  allow to use visited state with .branches > 0 to prune current
  state if:
  - states_equal() for visited and current states returns true;
  - all verfication paths starting from current instruction index with
    current iterator ref_obj_id either run to safe exit or run back to
    current instruction index with current iterator ref_obj_id.

The last rule is needed to:
- Ensure presence of read and precision marks in a state used for pruning.
- Give a meaning to the precise scalars comparisons in regsafe() which
  I think is currently missing for bpf_iter_*_next() .branches > 0 case.
  Precise scalars comparison for visited states with .branches == 0
  could be read as: "using any value from a specific range it is
  possible to reach safe exit".
  Precise scalars comparison for looping states from above
  could be read as: "using any value from a specific range it is
  possible to reach either safe exit or get back to this instruction".

This algorithm should be able to handle simple iteration patterns like:

    bpf_iter_*_new(&it1, ...)
    while (!bpf_iter_*_next(&it1)) { ... }
    bpf_iter_*_destroy(&it1)

And also handle nested iteration:

    bpf_iter_*_new(&it1, ...)
    while (!bpf_iter_*_next(&it1)) {
      bpf_iter_*_new(&it2, ...)
      while (!bpf_iter_*_next(&it2)) { ... }
      bpf_iter_*_destroy(&it2)
    }
    bpf_iter_*_destroy(&it1)

But it would fail to handle more complex patterns, e.g. interleaved
iteration:

    for (;;) {
      if (!bpf_iter_*_next(&it1))  // a
        break;
      if (!bpf_iter_*_next(&it2))  // b
        break;
      ...
    }

For any state originating in (a) there would always be a state in (b)
pending verification and vice versa. It would actually bail out even
if it1 == it2.

Option B. "Widening"
--------------------

The trivial fix for current .branches > 0 states comparison is to
force "exact" states comparisons for is_iter_next_insn() case:
1. Ignore liveness marks, as those are not finalized yet;
2. Ignore precision marks, as those are not finalized yet;
3. Use regs_exact() for scalars comparison.

This, however, is very restrictive as it fails to verify trivial
programs like (iters_looping/simplest_loop):

    sum = 0;
    bpf_iter_num_new(&it, 0, 10);
    while (!(r0 = bpf_iter_num_next()))
      sum += *(u32 *)r0;
    bpf_iter_num_destroy(&it);

Here ever increasing bounds for "sum" prevent regs_exact() from ever
returning true.

One way to lift this limitation is to borrow something like the
"widening" trick from the abstract interpretation papers mentioned
earlier:
- suppose there is current state C and a visited is_iter_next_insn()
  state V with .branches > 0;
- suppose states_equal(V, C) returns true but exact states comparison
  returns false because there are scalar values not marked as precise
  that differ between V and C.
- copy C as C' and mark these scalars as __mark_reg_unbounded() in C';
- run verification for C':
  - if verification succeeds -- problem solved;
  - if verification fails -- discard C' and proceed from C.

Such algorithm should succeed for programs that don't use widened
values in "precise" context.

Looking at testcases failing with trivial fix I think that such
limitations would be similar to those already present (e.g. the big
comment in progs/iters.c:iter_obfuscate_counter would still apply).

---

Option B appears to be simpler to implement so I'm proceeding with it.
Will share progress on Monday. Please let me know if there are any
obvious flaws with what I shared above.

Thanks,
Eduard

next prev parent reply	other threads:[~2023-09-25  1:02 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-07 14:04 [BUG] verifier escape with iteration helpers (bpf_loop, ...) Andrew Werner
2023-07-07 16:44 ` Alexei Starovoitov
2023-07-07 18:08   ` Andrii Nakryiko
2023-07-07 18:21     ` Andrew Werner
2023-09-17 21:37     ` Eduard Zingerman
2023-09-17 22:09       ` Kumar Kartikeya Dwivedi
2023-09-18 13:06         ` Eduard Zingerman
2023-09-19 16:28           ` Eduard Zingerman
2023-09-19 23:02             ` Andrii Nakryiko
2023-09-20  0:19               ` Eduard Zingerman
2023-09-20 16:20                 ` Eduard Zingerman
2023-09-20 16:57                   ` Andrii Nakryiko
2023-09-21  9:14                 ` Alexei Starovoitov
2023-09-21 11:03                   ` Eduard Zingerman
2023-09-21 12:56                     ` Alexei Starovoitov
2023-09-21 16:23                       ` Eduard Zingerman
2023-09-21 16:35                         ` Andrii Nakryiko
2023-09-21 18:16                           ` Eduard Zingerman
2023-09-22  1:01                             ` Eduard Zingerman
2023-09-22  2:48                               ` Andrii Nakryiko
2023-09-22 18:36                                 ` Eduard Zingerman
2023-09-22 20:52                                   ` Andrii Nakryiko
2023-09-25  1:01                                     ` Eduard Zingerman [this message]
2023-09-26  0:33                                       ` Andrii Nakryiko
2023-09-26 15:55                                         ` Eduard Zingerman
2023-09-26 16:25                                           ` Andrii Nakryiko
2023-09-28  1:09                                             ` Eduard Zingerman
2023-09-28 18:30                                               ` Andrii Nakryiko
2023-10-02  3:26                                                 ` Eduard Zingerman
2023-09-30  0:41                                               ` Alexei Starovoitov
2023-10-02  1:40                                                 ` Eduard Zingerman
2023-10-02 16:29                                                   ` Alexei Starovoitov
2023-10-02 17:18                                                     ` Eduard Zingerman
2023-10-03  0:05                                                       ` Alexei Starovoitov
2023-10-03  2:00                                                         ` Alexei Starovoitov
2023-10-03 15:33                                                         ` Eduard Zingerman
2023-10-03 16:07                                                           ` Alexei Starovoitov
2023-10-03 18:50                                                           ` Alexei Starovoitov
2023-10-03 21:52                                                             ` Eduard Zingerman
2023-10-03 22:03                                                               ` Eduard Zingerman
2023-10-03 23:08                                                               ` Alexei Starovoitov
2023-10-03 23:14                                                                 ` Eduard Zingerman
2023-10-04  0:22                                                                   ` Andrii Nakryiko
2023-10-04  1:05                                                                 ` Eduard Zingerman
2023-10-04  2:57                                                                   ` Alexei Starovoitov
2023-10-04  5:50                                                                     ` Alexei Starovoitov
2023-10-04  9:49                                                                       ` Eduard Zingerman
2023-10-04 11:52                                                                     ` Eduard Zingerman
2023-09-19 23:14       ` Andrii Nakryiko
2023-09-20  0:06         ` Eduard Zingerman
2023-09-20 16:37           ` Andrii Nakryiko
2023-09-20 17:13             ` Eduard Zingerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a777445dcb94c0029eb3bd3ddc96ddc493c85ad0.camel@gmail.com \
    --to=eddyz87@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andreimatei1@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=awerner32@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=joannelkoong@gmail.com \
    --cc=kernel-team@dataexmachina.dev \
    --cc=memxor@gmail.com \
    --cc=song@kernel.org \
    --cc=tamird@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox