From: Jakub Sitnicki <jakub@cloudflare.com>
To: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii.nakryiko@gmail.com>,
bpf@vger.kernel.org, Heiko Carstens <hca@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>
Subject: Re: [PATCH RFC bpf-next 1/3] bpf: Fix certain narrow loads with offsets
Date: Tue, 08 Mar 2022 16:01:05 +0100 [thread overview]
Message-ID: <87bkygzbg5.fsf@cloudflare.com> (raw)
In-Reply-To: <20220222182559.2865596-2-iii@linux.ibm.com>
On Tue, Feb 22, 2022 at 07:25 PM +01, Ilya Leoshkevich wrote:
> Verifier treats bpf_sk_lookup.remote_port as a 32-bit field for
> backward compatibility, regardless of what the uapi headers say.
> This field is mapped onto the 16-bit bpf_sk_lookup_kern.sport field.
> Therefore, accessing the most significant 16 bits of
> bpf_sk_lookup.remote_port must produce 0, which is currently not
> the case.
>
> The problem is that narrow loads with offset - commit 46f53a65d2de
> ("bpf: Allow narrow loads with offset > 0"), don't play nicely with
> the masking optimization - commit 239946314e57 ("bpf: possibly avoid
> extra masking for narrower load in verifier"). In particular, when we
> suppress extra masking, we suppress shifting as well, which is not
> correct.
>
> Fix by moving the masking suppression check to BPF_AND generation.
>
> Fixes: 46f53a65d2de ("bpf: Allow narrow loads with offset > 0")
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> ---
> kernel/bpf/verifier.c | 14 +++++++++-----
> 1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index d7473fee247c..195f2e9b5a47 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -12848,7 +12848,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
> return -EINVAL;
> }
>
> - if (is_narrower_load && size < target_size) {
> + if (is_narrower_load) {
> u8 shift = bpf_ctx_narrow_access_offset(
> off, size, size_default) * 8;
> if (shift && cnt + 1 >= ARRAY_SIZE(insn_buf)) {
> @@ -12860,15 +12860,19 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
> insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH,
> insn->dst_reg,
> shift);
> - insn_buf[cnt++] = BPF_ALU32_IMM(BPF_AND, insn->dst_reg,
> - (1 << size * 8) - 1);
> + if (size < target_size)
> + insn_buf[cnt++] = BPF_ALU32_IMM(
> + BPF_AND, insn->dst_reg,
> + (1 << size * 8) - 1);
> } else {
> if (shift)
> insn_buf[cnt++] = BPF_ALU64_IMM(BPF_RSH,
> insn->dst_reg,
> shift);
> - insn_buf[cnt++] = BPF_ALU64_IMM(BPF_AND, insn->dst_reg,
> - (1ULL << size * 8) - 1);
> + if (size < target_size)
> + insn_buf[cnt++] = BPF_ALU64_IMM(
> + BPF_AND, insn->dst_reg,
> + (1ULL << size * 8) - 1);
> }
> }
Thanks for patience. I'm coming back to this.
This fix affects the 2-byte load from bpf_sk_lookup.remote_port.
Dumping the xlated BPF code confirms it.
On LE (x86-64) things look well.
Before this patch:
* size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
0: (69) r2 = *(u16 *)(r1 +4)
1: (b7) r0 = 0
2: (95) exit
* size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
0: (69) r2 = *(u16 *)(r1 +4)
1: (b7) r0 = 0
2: (95) exit
After this patch:
* size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
0: (69) r2 = *(u16 *)(r1 +4)
1: (b7) r0 = 0
2: (95) exit
* size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
0: (69) r2 = *(u16 *)(r1 +4)
1: (74) w2 >>= 16
2: (b7) r0 = 0
3: (95) exit
Which works great because the JIT generates a zero-extended load movzwq:
* size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
bpf_prog_5e4fe3dbdcb18fd3:
0: nopl 0x0(%rax,%rax,1)
5: xchg %ax,%ax
7: push %rbp
8: mov %rsp,%rbp
b: movzwq 0x4(%rdi),%rsi
10: xor %eax,%eax
12: leave
13: ret
* size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
bpf_prog_4a6336c64a340b96:
0: nopl 0x0(%rax,%rax,1)
5: xchg %ax,%ax
7: push %rbp
8: mov %rsp,%rbp
b: movzwq 0x4(%rdi),%rsi
10: shr $0x10,%esi
13: xor %eax,%eax
15: leave
16: ret
Runtime checks for bpf_sk_lookup.remote_port load and the 2-bytes of
zero padding following it, like below, pass with flying colors:
ok = ctx->remote_port == bpf_htons(8008);
if (!ok)
return SK_DROP;
ok = *((__u16 *)&ctx->remote_port + 1) == 0;
if (!ok)
return SK_DROP;
(The above checks compile to half-word (2-byte) loads.)
On BE (s390x) things look different:
Before the patch:
* size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
0: (69) r2 = *(u16 *)(r1 +4)
1: (bc) w2 = w2
2: (b7) r0 = 0
3: (95) exit
* size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
0: (69) r2 = *(u16 *)(r1 +4)
1: (bc) w2 = w2
2: (b7) r0 = 0
3: (95) exit
After the patch:
* size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
0: (69) r2 = *(u16 *)(r1 +4)
1: (bc) w2 = w2
2: (74) w2 >>= 16
3: (bc) w2 = w2
4: (b7) r0 = 0
5: (95) exit
* size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
0: (69) r2 = *(u16 *)(r1 +4)
1: (bc) w2 = w2
2: (b7) r0 = 0
3: (95) exit
These compile to:
* size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
bpf_prog_fdd58b8caca29f00:
0: j 0x0000000000000006
4: nopr
6: stmg %r11,%r15,112(%r15)
c: la %r13,64(%r15)
10: aghi %r15,-96
14: llgh %r3,4(%r2,%r0)
1a: srl %r3,16
1e: llgfr %r3,%r3
22: lgfi %r14,0
28: lgr %r2,%r14
2c: lmg %r11,%r15,208(%r15)
32: br %r14
* size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
bpf_prog_5e3d8e92223c6841:
0: j 0x0000000000000006
4: nopr
6: stmg %r11,%r15,112(%r15)
c: la %r13,64(%r15)
10: aghi %r15,-96
14: llgh %r3,4(%r2,%r0)
1a: lgfi %r14,0
20: lgr %r2,%r14
24: lmg %r11,%r15,208(%r15)
2a: br %r14
Now, we right shift the value when loading
*(u16 *)(r1 +36)
which in C BPF is equivalent to
*((__u16 *)&ctx->remote_port + 0)
due to how the shift is calculated by bpf_ctx_narrow_access_offset().
This makes the expected typical use-case
ctx->remote_port == bpf_htons(8008)
fail on s390x because llgh (Load Logical Halfword (64<-16)) seems to lay
out the data in the destination register so that it holds
0x0000_0000_0000_1f48.
I don't know that was the intention here, as it makes the BPF C code
non-portable.
WDYT?
BTW. Out of curiosity, how does a Logical Load Halfword (llgh) differ
differ from a non-logical Load Halfword (lgh) on s390x? Compiler
Explorer generates a non-logical load for similar C code.
next prev parent reply other threads:[~2022-03-08 15:54 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-22 18:25 [PATCH RFC bpf-next 0/3] bpf_sk_lookup.remote_port fixes Ilya Leoshkevich
2022-02-22 18:25 ` [PATCH RFC bpf-next 1/3] bpf: Fix certain narrow loads with offsets Ilya Leoshkevich
2022-03-08 15:01 ` Jakub Sitnicki [this message]
2022-03-08 23:58 ` Ilya Leoshkevich
2022-03-09 8:36 ` Jakub Sitnicki
2022-03-09 12:34 ` Ilya Leoshkevich
2022-03-10 22:57 ` Jakub Sitnicki
2022-03-14 17:35 ` Jakub Sitnicki
2022-03-14 18:25 ` Ilya Leoshkevich
2022-03-14 20:57 ` Jakub Sitnicki
2022-02-22 18:25 ` [PATCH RFC bpf-next 2/3] bpf: Fix bpf_sk_lookup.remote_port on big-endian Ilya Leoshkevich
2022-02-27 2:44 ` Alexei Starovoitov
2022-02-27 20:30 ` Jakub Sitnicki
2022-02-28 10:19 ` Ilya Leoshkevich
2022-02-28 13:26 ` Jakub Sitnicki
2022-03-01 0:39 ` Ilya Leoshkevich
2022-03-01 0:40 ` Ilya Leoshkevich
2022-02-22 18:25 ` [PATCH RFC bpf-next 3/3] selftests/bpf: Adapt bpf_sk_lookup.remote_port loads Ilya Leoshkevich
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87bkygzbg5.fsf@cloudflare.com \
--to=jakub@cloudflare.com \
--cc=andrii.nakryiko@gmail.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=gor@linux.ibm.com \
--cc=hca@linux.ibm.com \
--cc=iii@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox