All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Sitnicki <jakub@cloudflare.com>
To: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	kernel-team@cloudflare.com,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>
Subject: Re: [PATCH bpf-next] selftests/bpf: Fix implementation-defined behavior in sk_lookup test
Date: Tue, 22 Feb 2022 15:53:00 +0100	[thread overview]
Message-ID: <87fsoakjj6.fsf@cloudflare.com> (raw)
In-Reply-To: <0eeac90306f03b4fdb2b028ffb509e4d20121aec.camel@linux.ibm.com>

On Tue, Feb 22, 2022 at 03:22 AM +01, Ilya Leoshkevich wrote:
> On Tue, 2022-02-22 at 01:43 +0100, Ilya Leoshkevich wrote:
>> On Mon, 2022-02-21 at 22:39 +0100, Ilya Leoshkevich wrote:
>> > On Mon, 2022-02-21 at 19:03 +0100, Jakub Sitnicki wrote:
>> > > Shifting 16-bit type by 16 bits is implementation-defined for BPF
>> > > programs.
>> > > Don't rely on it in case it is causing the test failures we are
>> > > seeing on
>> > > s390x z15 target.
>> > > 
>> > > Fixes: 2ed0dc5937d3 ("selftests/bpf: Cover 4-byte load from
>> > > remote_port in bpf_sk_lookup")
>> > > Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
>> > > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> > > ---
>> > > 
>> > > I don't have a dev env for s390x/z15 set up yet, so can't
>> > > definitely
>> > > confirm the fix.
>> > > That said, it seems worth fixing either way.
>> > > 
>> > >  tools/testing/selftests/bpf/progs/test_sk_lookup.c | 3 ++-
>> > >  1 file changed, 2 insertions(+), 1 deletion(-)
>> > > 
>> > > diff --git a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
>> > > b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
>> > > index bf5b7caefdd0..7d47276a8964 100644
>> > > --- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
>> > > +++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
>> > > @@ -65,6 +65,7 @@ static const __u32 KEY_SERVER_A = SERVER_A;
>> > >  static const __u32 KEY_SERVER_B = SERVER_B;
>> > >  
>> > >  static const __u16 SRC_PORT = bpf_htons(8008);
>> > > +static const __u32 SRC_PORT_U32 = bpf_htonl(8008U << 16);
>> > >  static const __u32 SRC_IP4 = IP4(127, 0, 0, 2);
>> > >  static const __u32 SRC_IP6[] = IP6(0xfd000000, 0x0, 0x0,
>> > > 0x00000002);
>> > >  
>> > > @@ -421,7 +422,7 @@ int ctx_narrow_access(struct bpf_sk_lookup
>> > > *ctx)
>> > >  
>> > >         /* Load from remote_port field with zero padding
>> > > (backward
>> > > compatibility) */
>> > >         val_u32 = *(__u32 *)&ctx->remote_port;
>> > > -       if (val_u32 != bpf_htonl(bpf_ntohs(SRC_PORT) << 16))
>> > > +       if (val_u32 != SRC_PORT_U32)
>> > >                 return SK_DROP;
>> > >  
>> > >         /* Narrow loads from local_port field. Expect DST_PORT.
>> > > */
>> > 
>> > Unfortunately this doesn't help with the s390 problem.
>> > I'll try to debug this.
>> 
>> I have to admit I have a hard time wrapping my head around the
>> requirements here.
>> 
>> Based on the pre-9a69e2b385f4 code, do I understand correctly that
>> for the following input
>> 
>> Port:     0x1f48
>> SRC_PORT: 0x481f
>> 
>> we expect the following results for different kinds of loads:
>> 
>> Size   Offset  LE      BE
>> BPF_B  0       0x1f    0
>> BPF_B  1       0x48    0
>> BPF_B  2       0       0x48
>> BPF_B  3       0       0x1f
>> BPF_H  0       0x481f  0
>> BPF_H  1       0       0x481f
>> BPF_W  0       0x481f  0x481f
>> 
>> and this is guaranteed by the struct bpf_sk_lookup ABI? Because then
>> it
>> looks as if 9a69e2b385f4 breaks it on big-endian as follows:
>> 
>> Size   Offset  BE-9a69e2b385f4
>> BPF_B  0       0x48
>> BPF_B  1       0x1f
>> BPF_B  2       0
>> BPF_B  3       0
>> BPF_H  0       0x481f
>> BPF_H  1       0
>> BPF_W  0       0x481f0000
>
> Sorry, I worded this incorrectly: 9a69e2b385f4 did not change the
> kernel behavior, the ABI is not broken and the old compiled code should
> continue to work.
> What the second table really shows are what the results should be
> according to the 9a69e2b385f4 struct bpf_sk_lookup definition, which I
> still think is broken on big-endian and needs to be adjusted to match
> the ABI.
>
> I noticed one other strange thing in the meantime: loads from
> *(__u32 *)&ctx->remote_port, *(__u16 *)&ctx->remote_port and
> *((__u16 *)&ctx->remote_port + 1) all produce 8008 on s390, which is
> clearly inconsistent. It looks as if convert_ctx_accesses() needs to be
> adjusted to handle combinations like ctx_field_size == 4 && size == 2
> && target_size == 2. I will continue with this tomorrow.
>
>> Or is the old behavior a bug and this new one is desirable?
>> 9a69e2b385f4 has no Fixes: tag, so I assume that's the former :-(
>> 
>> In which case, would it make sense to fix it by swapping remote_port
>> and :16 in bpf_sk_lookup on big-endian?

Thanks for looking into it.

When it comes to requirements, my intention was to keep the same
behavior as before the split up of the remote_port field in 9a69e2b385f4
("bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide").

9a69e2b385f4 was supposed to be a formality, after a similar change in
4421a582718a ("bpf: Make dst_port field in struct bpf_sock 16-bit
wide"), which went in earlier.

In 4421a582718a I've provided a bit more context. I understand that the
remote_port value, even before the type changed from u32 to u16,
appeared to the BPF program as if laid out in memory like so:

      offsetof(struct bpf_sk_lookup, remote_port) +0  <port MSB>
                                                  +1  <port LSB>
                                                  +2  0x00
                                                  +3  0x00

Translating it to your handy table format, I expect should result in
loads as so if port is 8008 = 0x1f48:

      Size   Offset  LE      BE
      BPF_B  0       0x1f    0x1f
      BPF_B  1       0x48    0x48
      BPF_B  2       0       0
      BPF_B  3       0       0
      BPF_H  0       0x481f  0x1f48
      BPF_H  1       0       0
      BPF_W  0       0x481f  0x1f480000

But since the fix does not work, there must be a mistake somewhere in my
reasoning.

I expect I should be able to get virtme for s390 working sometime this
week to check my math. I've seen your collegue had some luck with it
[1].

Looking forward to your findings.

[1] https://github.com/cilium/ebpf/issues/86#issuecomment-623945549

  reply	other threads:[~2022-02-22 15:28 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-21 18:03 [PATCH bpf-next] selftests/bpf: Fix implementation-defined behavior in sk_lookup test Jakub Sitnicki
2022-02-21 21:39 ` Ilya Leoshkevich
2022-02-22  0:43   ` Ilya Leoshkevich
2022-02-22  2:22     ` Ilya Leoshkevich
2022-02-22 14:53       ` Jakub Sitnicki [this message]
2022-02-22 17:42         ` Ilya Leoshkevich
2022-02-22 18:24           ` Jakub Sitnicki
2022-02-22 21:51             ` Ilya Leoshkevich
2022-02-25 10:01               ` Jakub Sitnicki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87fsoakjj6.fsf@cloudflare.com \
    --to=jakub@cloudflare.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=iii@linux.ibm.com \
    --cc=kernel-team@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.