Re: [RFC bpf-next] bpf, verifier: improve signed ranges inference for BPF_AND

netdev.vger.kernel.org archive mirror

From: Eduard Zingerman <eddyz87@gmail.com>
To: Shung-Hsi Yu <shung-hsi.yu@suse.com>,
	Xu Kuohai <xukuohai@huaweicloud.com>
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Song Liu <song@kernel.org>,
	Yonghong Song <yonghong.song@linux.dev>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>,
	 Roberto Sassu <roberto.sassu@huawei.com>,
	Edward Cree <ecree.xilinx@gmail.com>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>,
	Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu>,
	 Srinivas Narayana <srinivas.narayana@rutgers.edu>,
	Matan Shachnai <m.shachnai@rutgers.edu>
Subject: Re: [RFC bpf-next] bpf, verifier: improve signed ranges inference for BPF_AND
Date: Wed, 17 Jul 2024 14:10:35 -0700
Message-ID: <be239a5581e5b7d5c6f310c2a4c11282aa5896b5.camel@gmail.com>
In-Reply-To: <ykuhustu7vt2ilwhl32kj655xfdgdlm2xkl5rff6tw2ycksovp@ss2n4gpjysnw>

On Tue, 2024-07-16 at 22:52 +0800, Shung-Hsi Yu wrote:

[...]

> To allow verification of such an instruction pattern, update
> scalar*_min_max_and() to infer signed ranges directly from the signed
> ranges of the operands. BPF_AND can only clear bits, never set them, so
> the result only moves towards 0x0000000000000000. The difficulty lies in
> how to deal with signs: while a non-negative (positive or zero) value
> simply grows smaller, a negative value can grow smaller, but it may also
> lose its sign bit and become a larger, non-negative value.
> 
> To better address this situation, we split the signed ranges into
> negative-range and non-negative-range cases, ignoring ranges that span
> both signs for now, and first consider only how to calculate smax_value.
> 
> Since negative range & negative range preserves the sign bit, the result
> is still negative; it only moves towards S64_MIN and never loses its sign
> bit. A safe bet is therefore the value in the ranges that is closest to
> 0, i.e. max(dst_reg->smax_value, src_reg->smax_value). For negative range
> & non-negative range, the sign bit is always cleared, so the result is
> non-negative and only moves towards 0; a safe bet is the smax_value of
> the non-negative range. Last but not least, non-negative range &
> non-negative range is still non-negative and only moves towards 0;
> however, as in the unsigned range case, the maximum is capped by the
> lesser of the two, i.e. min(dst_reg->smax_value, src_reg->smax_value).
> 
> Listing the above reasoning as a table (dst_reg abbreviated as dst,
> src_reg as src, and smax_value as smax), we get:
> 
>                         |                         src_reg
>        smax = ?         +---------------------------+---------------------------
>                         |        negative           |       non-negative
> ---------+--------------+---------------------------+---------------------------
>          | negative     | max(dst->smax, src->smax) |         src->smax
> dst_reg  +--------------+---------------------------+---------------------------
>          | non-negative |         dst->smax         | min(dst->smax, src->smax)
> 
> However, this is quite complicated. Luckily, it can be simplified given
> the following observations:
> 
>     max(dst_reg->smax_value, src_reg->smax_value) >= src_reg->smax_value
>     max(dst_reg->smax_value, src_reg->smax_value) >= dst_reg->smax_value
>     max(dst_reg->smax_value, src_reg->smax_value) >= min(dst_reg->smax_value, src_reg->smax_value)
> 
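> Since every cell is an upper bound and max(...) is at least as large as
> each of them, we can replace all cells in the table above with max(...),
> trading some precision for simplicity, and arrive at: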
> 
>                         |                         src_reg
>       smax' = ?         +---------------------------+---------------------------
>                         |        negative           |       non-negative
> ---------+--------------+---------------------------+---------------------------
>          | negative     | max(dst->smax, src->smax) | max(dst->smax, src->smax)
> dst_reg  +--------------+---------------------------+---------------------------
>          | non-negative | max(dst->smax, src->smax) | max(dst->smax, src->smax)
> 
> Meaning that simply using
> 
>   max(dst_reg->smax_value, src_reg->smax_value)
> 
> to calculate the resulting smax_value would work across all sign combinations.
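
A minimal C sketch of the smax table and of its simplification, assuming each operand range lies entirely on one side of zero (so a range is negative exactly when its smax is negative). The helper names smax_by_case() and smax_simplified() are hypothetical and not taken from the patch:

```c
#include <stdint.h>

/* Hypothetical helper: the per-case smax bound from the table, assuming each
 * operand range lies entirely on one side of zero, so a range is negative
 * exactly when its smax is negative. */
static int64_t smax_by_case(int64_t dst_smax, int64_t src_smax)
{
	int dst_neg = dst_smax < 0;
	int src_neg = src_smax < 0;

	if (dst_neg && src_neg)         /* sign bit kept: closest to 0 wins */
		return dst_smax > src_smax ? dst_smax : src_smax;
	if (dst_neg)                    /* sign bit cleared, bounded by src */
		return src_smax;
	if (src_neg)                    /* sign bit cleared, bounded by dst */
		return dst_smax;
	/* both non-negative: capped by the lesser, as in the unsigned case */
	return dst_smax < src_smax ? dst_smax : src_smax;
}

/* Hypothetical helper: the simplified rule, one max() for every cell. */
static int64_t smax_simplified(int64_t dst_smax, int64_t src_smax)
{
	return dst_smax > src_smax ? dst_smax : src_smax;
}
```

For two non-negative ranges with smax values 5 and 7, the per-case bound is 5 while the simplified one is 7: the single max() gives up some precision in that cell in exchange for one formula that needs no sign checks.
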
> 
> 
> For smin_value, we know that non-negative range & non-negative range and
> negative range & non-negative range both result in a non-negative value,
> so an easy choice is the minimum non-negative value, i.e. 0.
> 
>                         |                         src_reg
>        smin = ?         +----------------------------+---------------------------
>                         |          negative          |       non-negative
> ---------+--------------+----------------------------+---------------------------
>          | negative     |             ?              |             0
> dst_reg  +--------------+----------------------------+---------------------------
>          | non-negative |             0              |             0
> 
> This leaves the negative range & negative range case to be considered. We
> know that negative range & negative range always yields a negative value,
> so a preliminary guess would be S64_MIN. However, that guess is too
> imprecise to help with the r0 <<= 62, r0 s>>= 63, r0 &= -13 pattern
> we're trying to deal with here.
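
To see what an S64_MIN lower bound throws away, the following sketch (run_pattern() is a hypothetical name) emulates the three instructions on a 64-bit register value. After r0 <<= 62 and r0 s>>= 63 the register holds either 0 or -1, depending on bit 1 of the original value, so r0 &= -13 can only produce 0 or -13:

```c
#include <stdint.h>

/* Emulate: r0 <<= 62; r0 s>>= 63; r0 &= -13.
 * An arithmetic right shift by 63 leaves -1 if bit 63 was set and 0
 * otherwise; it is spelled out here because >> on a negative signed
 * integer is implementation-defined in C. */
static int64_t run_pattern(uint64_t r0)
{
	uint64_t shifted = r0 << 62;                 /* r0 <<= 62 (bit 1 -> bit 63) */
	int64_t sign = (shifted >> 63) ? -1 : 0;     /* r0 s>>= 63 */

	return sign & -13;                           /* r0 &= -13 */
}
```
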
> 
> This can be improved further with the observation that, for negative
> range & negative range, the result keeps at least the leading run of
> '1' bits that both operands have in common, i.e. the shorter of the two
> operands' leading '1' runs. Among negative values a smaller value never
> has a longer leading '1' run, so min(dst_reg->smin_value,
> src_reg->smin_value) has the shortest run of any possible operand and
> can serve as the starting point. But that alone is not enough, as we do
> not know whether the rest of the bits will be set, so the safest guess
> is one that clears all bits after the leading run of '1' bits,
> something akin to bit_floor(), but rounding down to a negative power of
> 2 instead.
> 
>     negative_bit_floor(0xffff000000000003) == 0xffff000000000000
>     negative_bit_floor(0xf0ff0000ffff0000) == 0xf000000000000000
>     negative_bit_floor(0xfffffb0000000000) == 0xfffff80000000000
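
One possible way to implement negative_bit_floor(), sketched here rather than taken from the patch: the first '0' bit after the leading run of '1's is the most significant set bit of ~v, so the result is -(2^k) with k = fls(~v). fls64_portable() is a hypothetical stand-in for the kernel's fls64(), and returning 0 for a non-negative input is an assumption of this sketch:

```c
#include <stdint.h>

/* Hypothetical stand-in for fls64(): 1-based position of the most
 * significant set bit, 0 for x == 0. */
static int fls64_portable(uint64_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Keep the leading run of '1' bits of a negative value and clear every bit
 * after it, i.e. round down to -(2^k). Returns 0 for non-negative input
 * (an assumption of this sketch). */
static int64_t negative_bit_floor(int64_t v)
{
	/* The first '0' bit of v is the most significant '1' bit of ~v. */
	int bits = fls64_portable(~(uint64_t)v);

	if (bits == 64)         /* sign bit of v is clear: v >= 0 */
		return 0;
	if (bits == 63)         /* only the sign bit survives */
		return INT64_MIN;
	return -((int64_t)1 << bits);
}
```

For instance, negative_bit_floor(-13) is -16 (0xfffffffffffffff0), which is the value relevant to the r0 &= -13 pattern.
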
> 
> With the negative range & negative range case solved, we now have:
> 
>                         |                         src_reg
>        smin = ?         +----------------------------+---------------------------
>                         |        negative            |       non-negative
> ---------+--------------+----------------------------+---------------------------
>          |   negative   |negative_bit_floor(         |             0
>          |              |  min(dst->smin, src->smin))|
> dst_reg  +--------------+----------------------------+---------------------------
>          | non-negative |           0                |             0
> 
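> This can be simplified further. Whenever at least one operand has a
> negative range, min(dst->smin, src->smin) < 0, and negative_bit_floor()
> of a negative value is itself negative, so it is also a safe lower bound
> for the cells containing 0. When both ranges are non-negative, the
> formula remains safe as long as negative_bit_floor() returns 0 for a
> non-negative input. This means that using
> 
>     negative_bit_floor(min(dst_reg->smin_value, src_reg->smin_value))
> 
> to calculate the resulting smin_value works across all sign combinations.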
> 
> Together, these allow us to infer the signed range of the result of a
> BPF_AND operation from the signed ranges of its operands.


Hi Shung-Hsi,

This seems quite elegant.
As an additional check, I ran a simple brute-force test over all possible
ranges of 6-bit integers, and the bounds are computed safely.
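
The brute-force program itself is not included in the message. A minimal sketch of that kind of exhaustive check, assuming small `bits`-bit signed values sign-extended into int64_t, could enumerate every pair of signed ranges, compute the true minimum and maximum of dst & src, and count the pairs that escape smin' = negative_bit_floor(min(smin)) and smax' = max(smax). All function names here are hypothetical:

```c
#include <stdint.h>

/* 1-based position of the most significant set bit, 0 for x == 0. */
static int fls64_portable(uint64_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Round a negative value down to -(2^k) by keeping its leading '1' bits;
 * 0 for non-negative input (assumption of this sketch). */
static int64_t negative_bit_floor(int64_t v)
{
	int bits = fls64_portable(~(uint64_t)v);

	if (bits == 64)
		return 0;
	if (bits == 63)
		return INT64_MIN;
	return -((int64_t)1 << bits);
}

/* Proposed signed bounds for dst & src. */
static void and_srange(int64_t dmin, int64_t dmax, int64_t smin, int64_t smax,
		       int64_t *rmin, int64_t *rmax)
{
	*rmin = negative_bit_floor(dmin < smin ? dmin : smin);
	*rmax = dmax > smax ? dmax : smax;
}

/* Count (dst range, src range) pairs of `bits`-bit signed integers whose
 * actual AND results fall outside the computed bounds. */
static long count_violations(int bits)
{
	const int vmin = -(1 << (bits - 1)), vmax = (1 << (bits - 1)) - 1;
	long bad = 0;

	for (int dl = vmin; dl <= vmax; dl++)
		for (int dh = dl; dh <= vmax; dh++)
			for (int sl = vmin; sl <= vmax; sl++) {
				int64_t lo = INT64_MAX, hi = INT64_MIN;

				/* Grow [sl, sh] one value at a time and fold the new
				 * src value into the running min/max of the results. */
				for (int sh = sl; sh <= vmax; sh++) {
					int64_t rmin, rmax;

					for (int a = dl; a <= dh; a++) {
						int64_t r = (int64_t)a & sh;

						if (r < lo)
							lo = r;
						if (r > hi)
							hi = r;
					}
					and_srange(dl, dh, sl, sh, &rmin, &rmax);
					if (lo < rmin || hi > rmax)
						bad++;
				}
			}
	return bad;
}
```

count_violations(6) corresponds to the 6-bit setting above; the cost grows steeply with the bit width, so much wider types are out of reach for this naive loop. Applied to the motivating pattern, and_srange(-1, 0, -13, -13, ...) yields [-16, 0].
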

[...]

Thread overview: 18+ messages
2024-07-11 11:38 [PATCH bpf-next v4 13/20] bpf, lsm: Add check for BPF LSM return value Xu Kuohai
2024-07-11 11:38 ` [PATCH bpf-next v4 14/20] bpf: Prevent tail call between progs attached to different hooks Xu Kuohai
2024-07-11 11:38 ` [PATCH bpf-next v4 15/20] bpf: Fix compare error in function retval_range_within Xu Kuohai
2024-07-11 11:38 ` [PATCH bpf-next v4 16/20] bpf: Add a special case for bitwise AND on range [-1, 0] Xu Kuohai
2024-07-15 15:29   ` Shung-Hsi Yu
2024-07-16  7:05     ` Xu Kuohai
2024-07-16 14:52       ` [RFC bpf-next] bpf, verifier: improve signed ranges inference for BPF_AND Shung-Hsi Yu
2024-07-16 15:10         ` Shung-Hsi Yu
2024-07-17 21:10         ` Eduard Zingerman [this message]
2024-07-19  8:32           ` Shung-Hsi Yu
2024-07-28 22:38         ` Harishankar Vishwanathan
2024-07-30  4:25           ` Shung-Hsi Yu
2024-08-02 21:30             ` Harishankar Vishwanathan
2024-07-16 15:19       ` [PATCH bpf-next v4 16/20] bpf: Add a special case for bitwise AND on range [-1, 0] Shung-Hsi Yu
2024-07-11 11:38 ` [PATCH bpf-next v4 17/20] selftests/bpf: Avoid load failure for token_lsm.c Xu Kuohai
2024-07-11 11:38 ` [PATCH bpf-next v4 18/20] selftests/bpf: Add return value checks for failed tests Xu Kuohai
2024-07-11 11:38 ` [PATCH bpf-next v4 19/20] selftests/bpf: Add test for lsm tail call Xu Kuohai
2024-07-11 11:38 ` [PATCH bpf-next v4 20/20] selftests/bpf: Add verifier tests for bpf lsm Xu Kuohai
