From: Jakub Sitnicki <jakub@cloudflare.com>
To: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii.nakryiko@gmail.com>,
bpf@vger.kernel.org, Heiko Carstens <hca@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>
Subject: Re: [PATCH RFC bpf-next 1/3] bpf: Fix certain narrow loads with offsets
Date: Wed, 09 Mar 2022 09:36:33 +0100 [thread overview]
Message-ID: <871qzbz5sa.fsf@cloudflare.com> (raw)
In-Reply-To: <8d8b464f6c2820989d67f00d24b6003b8b758274.camel@linux.ibm.com>
On Wed, Mar 09, 2022 at 12:58 AM +01, Ilya Leoshkevich wrote:
> On Tue, 2022-03-08 at 16:01 +0100, Jakub Sitnicki wrote:
>> On Tue, Feb 22, 2022 at 07:25 PM +01, Ilya Leoshkevich wrote:
>> > Verifier treats bpf_sk_lookup.remote_port as a 32-bit field for
>> > backward compatibility, regardless of what the uapi headers say.
>> > This field is mapped onto the 16-bit bpf_sk_lookup_kern.sport
>> > field.
>> > Therefore, accessing the most significant 16 bits of
>> > bpf_sk_lookup.remote_port must produce 0, which is currently not
>> > the case.
>> >
>> > The problem is that narrow loads with offset - commit 46f53a65d2de
>> > ("bpf: Allow narrow loads with offset > 0"), don't play nicely with
>> > the masking optimization - commit 239946314e57 ("bpf: possibly
>> > avoid
>> > extra masking for narrower load in verifier"). In particular, when
>> > we
>> > suppress extra masking, we suppress shifting as well, which is not
>> > correct.
>> >
>> > Fix by moving the masking suppression check to BPF_AND generation.
>> >
>> > Fixes: 46f53a65d2de ("bpf: Allow narrow loads with offset > 0")
>> > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
>> > ---
>> > kernel/bpf/verifier.c | 14 +++++++++-----
>> > 1 file changed, 9 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> > index d7473fee247c..195f2e9b5a47 100644
>> > --- a/kernel/bpf/verifier.c
>> > +++ b/kernel/bpf/verifier.c
>> > @@ -12848,7 +12848,7 @@ static int convert_ctx_accesses(struct
>> > bpf_verifier_env *env)
>> > return -EINVAL;
>> > }
>> >
>> > - if (is_narrower_load && size < target_size) {
>> > + if (is_narrower_load) {
>> > u8 shift = bpf_ctx_narrow_access_offset(
>> > off, size, size_default) * 8;
>> > if (shift && cnt + 1 >=
>> > ARRAY_SIZE(insn_buf)) {
>> > @@ -12860,15 +12860,19 @@ static int convert_ctx_accesses(struct
>> > bpf_verifier_env *env)
>> > insn_buf[cnt++] =
>> > BPF_ALU32_IMM(BPF_RSH,
>> >
>> > insn->dst_reg,
>> >
>> > shift);
>> > - insn_buf[cnt++] =
>> > BPF_ALU32_IMM(BPF_AND, insn->dst_reg,
>> > - (1
>> > << size * 8) - 1);
>> > + if (size < target_size)
>> > + insn_buf[cnt++] =
>> > BPF_ALU32_IMM(
>> > + BPF_AND, insn-
>> > >dst_reg,
>> > + (1 << size * 8) -
>> > 1);
>> > } else {
>> > if (shift)
>> > insn_buf[cnt++] =
>> > BPF_ALU64_IMM(BPF_RSH,
>> >
>> > insn->dst_reg,
>> >
>> > shift);
>> > - insn_buf[cnt++] =
>> > BPF_ALU64_IMM(BPF_AND, insn->dst_reg,
>> > -
>> > (1ULL
>> > << size * 8) - 1);
>> > + if (size < target_size)
>> > + insn_buf[cnt++] =
>> > BPF_ALU64_IMM(
>> > + BPF_AND, insn-
>> > >dst_reg,
>> > + (1ULL << size * 8)
>> > - 1);
>> > }
>> > }
>>
>> Thanks for patience. I'm coming back to this.
>>
>> This fix affects the 2-byte load from bpf_sk_lookup.remote_port.
>> Dumping the xlated BPF code confirms it.
>>
>> On LE (x86-64) things look well.
>>
>> Before this patch:
>>
>> * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
>> 0: (69) r2 = *(u16 *)(r1 +4)
>> 1: (b7) r0 = 0
>> 2: (95) exit
>>
>> * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
>> 0: (69) r2 = *(u16 *)(r1 +4)
>> 1: (b7) r0 = 0
>> 2: (95) exit
>>
>> After this patch:
>>
>> * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
>> 0: (69) r2 = *(u16 *)(r1 +4)
>> 1: (b7) r0 = 0
>> 2: (95) exit
>>
>> * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
>> 0: (69) r2 = *(u16 *)(r1 +4)
>> 1: (74) w2 >>= 16
>> 2: (b7) r0 = 0
>> 3: (95) exit
>>
>> Which works great because the JIT generates a zero-extended load
>> movzwq:
>>
>> * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
>> bpf_prog_5e4fe3dbdcb18fd3:
>> 0: nopl 0x0(%rax,%rax,1)
>> 5: xchg %ax,%ax
>> 7: push %rbp
>> 8: mov %rsp,%rbp
>> b: movzwq 0x4(%rdi),%rsi
>> 10: xor %eax,%eax
>> 12: leave
>> 13: ret
>>
>>
>> * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
>> bpf_prog_4a6336c64a340b96:
>> 0: nopl 0x0(%rax,%rax,1)
>> 5: xchg %ax,%ax
>> 7: push %rbp
>> 8: mov %rsp,%rbp
>> b: movzwq 0x4(%rdi),%rsi
>> 10: shr $0x10,%esi
>> 13: xor %eax,%eax
>> 15: leave
>> 16: ret
>>
>> Runtime checks for bpf_sk_lookup.remote_port load and the 2-bytes of
>> zero padding following it, like below, pass with flying colors:
>>
>> ok = ctx->remote_port == bpf_htons(8008);
>> if (!ok)
>> return SK_DROP;
>> ok = *((__u16 *)&ctx->remote_port + 1) == 0;
>> if (!ok)
>> return SK_DROP;
>>
>> (The above checks compile to half-word (2-byte) loads.)
>>
>>
>> On BE (s390x) things look different:
>>
>> Before the patch:
>>
>> * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
>> 0: (69) r2 = *(u16 *)(r1 +4)
>> 1: (bc) w2 = w2
>> 2: (b7) r0 = 0
>> 3: (95) exit
>>
>> * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
>> 0: (69) r2 = *(u16 *)(r1 +4)
>> 1: (bc) w2 = w2
>> 2: (b7) r0 = 0
>> 3: (95) exit
>>
>> After the patch:
>>
>> * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
>> 0: (69) r2 = *(u16 *)(r1 +4)
>> 1: (bc) w2 = w2
>> 2: (74) w2 >>= 16
>> 3: (bc) w2 = w2
>> 4: (b7) r0 = 0
>> 5: (95) exit
>>
>> * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
>> 0: (69) r2 = *(u16 *)(r1 +4)
>> 1: (bc) w2 = w2
>> 2: (b7) r0 = 0
>> 3: (95) exit
>>
>> These compile to:
>>
>> * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
>> bpf_prog_fdd58b8caca29f00:
>> 0: j 0x0000000000000006
>> 4: nopr
>> 6: stmg %r11,%r15,112(%r15)
>> c: la %r13,64(%r15)
>> 10: aghi %r15,-96
>> 14: llgh %r3,4(%r2,%r0)
>> 1a: srl %r3,16
>> 1e: llgfr %r3,%r3
>> 22: lgfi %r14,0
>> 28: lgr %r2,%r14
>> 2c: lmg %r11,%r15,208(%r15)
>> 32: br %r14
>>
>>
>> * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
>> bpf_prog_5e3d8e92223c6841:
>> 0: j 0x0000000000000006
>> 4: nopr
>> 6: stmg %r11,%r15,112(%r15)
>> c: la %r13,64(%r15)
>> 10: aghi %r15,-96
>> 14: llgh %r3,4(%r2,%r0)
>> 1a: lgfi %r14,0
>> 20: lgr %r2,%r14
>> 24: lmg %r11,%r15,208(%r15)
>> 2a: br %r14
>>
>> Now, we right shift the value when loading
>>
>> *(u16 *)(r1 +36)
>>
>> which in C BPF is equivalent to
>>
>> *((__u16 *)&ctx->remote_port + 0)
>>
>> due to how the shift is calculated by bpf_ctx_narrow_access_offset().
>
> Right, that's exactly the intention here.
> The way I see the situation is: the ABI forces us to treat remote_port
> as a 32-bit field, even though the updated header now says otherwise.
> And this:
>
> unsigned int remote_port;
> unsigned short result = *(unsigned short *)remote_port;
>
> should be the same as:
>
> unsigned short result = remote_port >> 16;
>
> on big-endian. Note that this is inherently non-portable.
>
>> This makes the expected typical use-case
>>
>> ctx->remote_port == bpf_htons(8008)
>>
>> fail on s390x because llgh (Load Logical Halfword (64<-16)) seems to
>> lay
>> out the data in the destination register so that it holds
>> 0x0000_0000_0000_1f48.
>>
>> I don't know that was the intention here, as it makes the BPF C code
>> non-portable.
>>
>> WDYT?
>
> This depends on how we define the remote_port field. I would argue that
> the definition from patch 2 - even though ugly - is the correct one.
> It is consistent with both the little-endian (1f 48 00 00) and
> big-endian (00 00 1f 48) ABIs.
>
> I don't think the current definition is correct, because it expects
> 1f 48 00 00 on big-endian, and this is not the case. We can verify this
> by taking 9a69e2^ and applying
>
> --- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> +++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> @@ -417,6 +417,8 @@ int ctx_narrow_access(struct bpf_sk_lookup *ctx)
> return SK_DROP;
> if (LSW(ctx->remote_port, 0) != SRC_PORT)
> return SK_DROP;
> + if (ctx->remote_port != SRC_PORT)
> + return SK_DROP;
>
> /* Narrow loads from local_port field. Expect DST_PORT. */
> if (LSB(ctx->local_port, 0) != ((DST_PORT >> 0) & 0xff) ||
>
> Therefore that
>
> ctx->remote_port == bpf_htons(8008)
>
> fails without patch 2 is as expected.
>
Consider this - today the below is true on both LE and BE, right?
*(u32 *)&ctx->remote_port == *(u16 *)&ctx->remote_port
because the loads get converted to:
*(u16 *)&ctx_kern->sport == *(u16 *)&ctx_kern->sport
IOW, today, because of the bug that you are fixing here, the data layout
changes from the PoV of the BPF program depending on the load size.
With 2-byte loads, without this patch, the data layout appears as:
struct bpf_sk_lookup {
...
__be16 remote_port;
__be16 remote_port;
...
}
While for 4-byte loads, it appears as in your 2nd patch:
struct bpf_sk_lookup {
...
#if little-endian
__be16 remote_port;
__u16 :16; /* zero padding */
#elif big-endian
__u16 :16; /* zero padding */
__be16 remote_port;
#endif
...
}
Because of that I don't see how we could keep complete ABI compatiblity,
and have just one definition of struct bpf_sk_lookup that reflects
it. These are conflicting requirements.
I'd bite the bullet for 4-byte loads, for the sake of having an
endian-agnostic struct bpf_sk_lookup and struct bpf_sock definition in
the uAPI header.
The sacrifice here is that the access converter will have to keep
rewriting 4-byte access to bpf_sk_lookup.remote_port and
bpf_sock.dst_port in this unexpected, quirky manner.
The expectation is that with time users will recompile their BPF progs
against the updated bpf.h, and switch to 2-byte loads. That will make
the quirk in the access converter dead code in time.
I don't have any better ideas. Sorry.
[...]
next prev parent reply other threads:[~2022-03-09 12:09 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-22 18:25 [PATCH RFC bpf-next 0/3] bpf_sk_lookup.remote_port fixes Ilya Leoshkevich
2022-02-22 18:25 ` [PATCH RFC bpf-next 1/3] bpf: Fix certain narrow loads with offsets Ilya Leoshkevich
2022-03-08 15:01 ` Jakub Sitnicki
2022-03-08 23:58 ` Ilya Leoshkevich
2022-03-09 8:36 ` Jakub Sitnicki [this message]
2022-03-09 12:34 ` Ilya Leoshkevich
2022-03-10 22:57 ` Jakub Sitnicki
2022-03-14 17:35 ` Jakub Sitnicki
2022-03-14 18:25 ` Ilya Leoshkevich
2022-03-14 20:57 ` Jakub Sitnicki
2022-02-22 18:25 ` [PATCH RFC bpf-next 2/3] bpf: Fix bpf_sk_lookup.remote_port on big-endian Ilya Leoshkevich
2022-02-27 2:44 ` Alexei Starovoitov
2022-02-27 20:30 ` Jakub Sitnicki
2022-02-28 10:19 ` Ilya Leoshkevich
2022-02-28 13:26 ` Jakub Sitnicki
2022-03-01 0:39 ` Ilya Leoshkevich
2022-03-01 0:40 ` Ilya Leoshkevich
2022-02-22 18:25 ` [PATCH RFC bpf-next 3/3] selftests/bpf: Adapt bpf_sk_lookup.remote_port loads Ilya Leoshkevich
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=871qzbz5sa.fsf@cloudflare.com \
--to=jakub@cloudflare.com \
--cc=andrii.nakryiko@gmail.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=gor@linux.ibm.com \
--cc=hca@linux.ibm.com \
--cc=iii@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.