public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
From: Ilya Leoshkevich <iii@linux.ibm.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	bpf@vger.kernel.org, Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>
Subject: Re: [PATCH bpf-next 00/24] Support bpf trampoline for s390x
Date: Fri, 27 Jan 2023 23:50:55 +0100	[thread overview]
Message-ID: <bed8d554582c171765083c849750af37656aa253.camel@linux.ibm.com> (raw)
In-Reply-To: <CAEf4Bzb3uiSHtUbgVWmkWuJ5Sw1UZd4c_iuS4QXtUkXmTTtXuQ@mail.gmail.com>

On Fri, 2023-01-27 at 09:24 -0800, Andrii Nakryiko wrote:
> On Fri, Jan 27, 2023 at 8:51 AM Ilya Leoshkevich <iii@linux.ibm.com>
> wrote:
> > 
> > On Wed, 2023-01-25 at 16:45 -0800, Andrii Nakryiko wrote:
> > > On Wed, Jan 25, 2023 at 1:39 PM Ilya Leoshkevich
> > > <iii@linux.ibm.com>
> > > wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > This series implements poke, trampoline, kfunc, mixing subprogs
> > > > and
> > > > tailcalls, and fixes a number of tests on s390x.
> > > > 
> > > > The following failures still remain:
> > > > 
> > > > #52      core_read_macros:FAIL
> > > > Uses BPF_PROBE_READ(), shouldn't there be
> > > > BPF_PROBE_READ_KERNEL()?
> > > 
> > > BPF_PROBE_READ(), similarly to BPF_CORE_READ() both use
> > > bpf_probe_read_kernel() internally, as it's most common use case.
> > > We
> > > have separate BPF_PROBE_READ_USER() and BPF_CORE_READ_USER() for
> > > when
> > > bpf_probe_read_user() has to be used.
> > 
> > At least purely from the code perspective, BPF_PROBE_READ() seems
> > to
> > delegate to bpf_probe_read(). The following therefore helps with
> > this
> > test:
> > 
> > --- a/tools/lib/bpf/bpf_core_read.h
> > +++ b/tools/lib/bpf/bpf_core_read.h
> > @@ -364,7 +364,7 @@ enum bpf_enum_value_kind {
> > 
> >  /* Non-CO-RE variant of BPF_CORE_READ_INTO() */
> >  #define BPF_PROBE_READ_INTO(dst, src, a, ...) ({
> > \
> > -       ___core_read(bpf_probe_read, bpf_probe_read,
> > \
> > +       ___core_read(bpf_probe_read_kernel, bpf_probe_read_kernel,
> > \
> >                      dst, (src), a, ##__VA_ARGS__)
> > \
> >  })
> > 
> > @@ -400,7 +400,7 @@ enum bpf_enum_value_kind {
> > 
> >  /* Non-CO-RE variant of BPF_CORE_READ_STR_INTO() */
> >  #define BPF_PROBE_READ_STR_INTO(dst, src, a, ...) ({
> > \
> > -       ___core_read(bpf_probe_read_str, bpf_probe_read,
> > \
> > +       ___core_read(bpf_probe_read_kernel_str,
> > bpf_probe_read_kernel,
> > \
> >                      dst, (src), a, ##__VA_ARGS__)
> > \
> >  })
> > 
> > but I'm not sure if there are backward compatibility concerns, or
> > if
> > libbpf is supposed to rewrite this when
> > !ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.
> 
> Oh, this is just a bug, it was an omission when we converted
> BPF_CORE_READ to bpf_probe_read_kernel. If you look at bpf_core_read
> definition, it is using bpf_probe_read_kernel, which is also the
> intent for BPF_PROBE_READ. Let's fix this.
> 
> And there is no backwards compat concerns because libbpf will
> automatically convert calls to bpf_probe_read_{kernel,user}[_str] to
> bpf_probe_read[_str] if host kernel doesn't yet support kernel or
> user
> specific variants.

Thanks for confirming! I will include the fix in v2.

> > > > #82      get_stack_raw_tp:FAIL
> > > > get_stack_print_output:FAIL:user_stack corrupted user stack
> > > > Known issue:
> > > > We cannot reliably unwind userspace on s390x without DWARF.
> > > 
> > > like in principle, or frame pointers (or some equivalent) needs
> > > to be
> > > configured for this to work?
> > > 
> > > Asking also in the context of [0], where s390x was excluded. If
> > > there
> > > is actually a way to enable frame pointer-based stack unwinding
> > > on
> > > s390x, would be nice to hear from an expert.
> > > 
> > >   [0] https://pagure.io/fesco/issue/2923
> > 
> > For DWARFless unwinding we have -mbackchain (not to be confused
> > with
> > -fno-omit-frame-pointer, which we also have, but which just hurts
> > performance without providing tangible benefits).
> > -mbackchain has a few problems though:
> > 
> > - It's not atomic. Here is a typical prologue with -mbackchain:
> > 
> >         1: stmg    %r11,%r15,88(%r15)  # save non-volatile
> > registers
> >         2: lgr     %r14,%r15           # %r14 = sp
> >         3: lay     %r15,-160(%r15)     # sp -= 160
> >         4: stg     %r14,0(%r15)        # *(void **)sp = %r14
> > 
> >   The invariant here is that *(void **)%r15 is always a pointer to
> > the
> >   next frame. This means that if we unwind from (4), we are totally
> >   broken. This does not happen if we unwind from any other
> > instruction,
> >   but still.
> > 
> > - Unwinding from (1)-(3) is not particularly good either. PSW
> > points to
> >   the callee, but R15 points to the caller's frame.
> > 
> > - Unwinding leaf functions is like the previous problem, but worse:
> >   they often do not establish a stack frame at all, so PSW and R15
> > are
> >   out of sync for the entire duration of the call.
> > 
> > Therefore .eh_frame-based unwinding is preferred, since it covers
> > all
> > these corner cases and is therefore reliable. From what I
> > understand,
> > adding .eh_frame unwinding to the kernel is not desirable. In an
> > internal discussion we had another idea though: would it be
> > possible to
> > have an eBPF program that does .eh_frame parsing and unwinding? I
> > understand that it can be technically challenging at the moment,
> > but
> > the end result would not be exploitable by crafting malicious
> > .eh_frame sections, won't loop endlessly, will have good
> > performance,
> > etc.
> 
> Thanks for details. This was all discussed at length in Fedora
> -fno-omit-frame-pointer discussion I linked above, so no real need to
> go over this again. .eh_frame-based unwinding on BPF side is
> possible,
> but only for processes that you knew about (and preprocessed) before
> you started profiling session. Pre-processing is memory and
> cpu-intensive operation on busy systems, and they will miss any
> processes started during profiling. So as a general approach for
> system-wide profiling it leaves a lot to be desired.

Thanks for the explanation, I'll read the thread and come back if I
get some new ideas not listed here.

> Should we enable -mbackchain in our CI for s390x to be able to
> capture
> stack traces (even if on some instructions they might be incomplete
> or
> outright broken)?

Let's do it, I don't have anything against this.

[...]
> > > 
> 

> > Here is the full log:
> > 
> > https://gist.github.com/iii-i/8e20100c33ab6f0dffb5e6e51d1330e8
> > 
> > Apparently we do indeed lose a constraint established by
> > if (hdr->tcp_len < sizeof(*hdr->tcp)). But the naive
> 
> The test is too big and unfamiliar for me to figure this out. And the
> problem is not upper bound, but lower bound. hdr->tcp_len is not
> proven to be strictly greater than zero, which is what verifier
> complains about. Not sure how it works on other arches right now.
> 
> 
> But I see that bpf_csum_diff defines size arguments as
> ARG_CONST_SIZE_OR_ZERO while bpf_tcp_raw_gen_syncookie_ipv4 has
> ARG_CONST_SIZE. I generally found ARG_CONST_SIZE way too problematic
> in practice, I'd say we should change it to ARG_CONST_SIZE_OR_ZERO.

Yes, this helps, and doesn't seem to introduce issues, since the 
minimum length is enforced inside this function anyway. I will include
the change for bpf_tcp_raw_gen_syncookie_ipv{4,6} in v2; I guess some
other functions may benefit from this as well, but this is outside the
scope of this series.

[...]

      reply	other threads:[~2023-01-27 22:51 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-25 21:37 [PATCH bpf-next 00/24] Support bpf trampoline for s390x Ilya Leoshkevich
2023-01-25 21:37 ` [PATCH bpf-next 01/24] selftests/bpf: Fix liburandom_read.so linker error Ilya Leoshkevich
2023-01-26  1:07   ` Andrii Nakryiko
2023-01-26 13:30     ` Ilya Leoshkevich
2023-01-25 21:37 ` [PATCH bpf-next 02/24] selftests/bpf: Fix symlink creation error Ilya Leoshkevich
2023-01-25 21:37 ` [PATCH bpf-next 03/24] selftests/bpf: Fix fexit_stress on s390x Ilya Leoshkevich
2023-01-25 21:37 ` [PATCH bpf-next 04/24] selftests/bpf: Fix trampoline_count " Ilya Leoshkevich
2023-01-25 21:37 ` [PATCH bpf-next 05/24] selftests/bpf: Fix kfree_skb " Ilya Leoshkevich
2023-01-25 21:37 ` [PATCH bpf-next 06/24] selftests/bpf: Set errno when urand_spawn() fails Ilya Leoshkevich
2023-01-25 21:38 ` [PATCH bpf-next 07/24] selftests/bpf: Fix decap_sanity_ns cleanup Ilya Leoshkevich
2023-01-25 21:38 ` [PATCH bpf-next 08/24] selftests/bpf: Fix verify_pkcs7_sig on s390x Ilya Leoshkevich
2023-01-26  1:06   ` Andrii Nakryiko
2023-01-27 12:36     ` Ilya Leoshkevich
2023-01-27 17:26       ` Andrii Nakryiko
2023-01-25 21:38 ` [PATCH bpf-next 09/24] selftests/bpf: Fix xdp_do_redirect " Ilya Leoshkevich
2023-01-25 21:38 ` [PATCH bpf-next 10/24] selftests/bpf: Fix cgrp_local_storage " Ilya Leoshkevich
2023-01-25 21:38 ` [PATCH bpf-next 11/24] selftests/bpf: Check stack_mprotect() return value Ilya Leoshkevich
2023-01-25 21:38 ` [PATCH bpf-next 12/24] selftests/bpf: Increase SIZEOF_BPF_LOCAL_STORAGE_ELEM on s390x Ilya Leoshkevich
2023-01-25 21:38 ` [PATCH bpf-next 13/24] selftests/bpf: Add a sign-extension test for kfuncs Ilya Leoshkevich
2023-01-25 21:38 ` [PATCH bpf-next 14/24] selftests/bpf: Fix test_lsm on s390x Ilya Leoshkevich
2023-01-25 21:38 ` [PATCH bpf-next 15/24] selftests/bpf: Fix test_xdp_adjust_tail_grow2 " Ilya Leoshkevich
2023-01-25 21:38 ` [PATCH bpf-next 16/24] selftests/bpf: Fix vmlinux test " Ilya Leoshkevich
2023-01-25 21:38 ` [PATCH bpf-next 17/24] libbpf: Read usdt arg spec with bpf_probe_read_kernel() Ilya Leoshkevich
2023-01-26  0:26   ` Andrii Nakryiko
2023-01-26 11:41     ` Ilya Leoshkevich
2023-01-26 19:03       ` Andrii Nakryiko
2023-01-27 11:01         ` Ilya Leoshkevich
2023-01-25 21:38 ` [PATCH bpf-next 18/24] s390/bpf: Fix a typo in a comment Ilya Leoshkevich
2023-01-25 21:38 ` [PATCH bpf-next 19/24] s390/bpf: Add expoline to tail calls Ilya Leoshkevich
2023-01-25 21:38 ` [PATCH bpf-next 20/24] s390/bpf: Implement bpf_arch_text_poke() Ilya Leoshkevich
2023-01-25 21:38 ` [PATCH bpf-next 21/24] bpf: btf: Add BTF_FMODEL_SIGNED_ARG flag Ilya Leoshkevich
2023-01-25 21:38 ` [PATCH bpf-next 22/24] s390/bpf: Implement arch_prepare_bpf_trampoline() Ilya Leoshkevich
2023-01-26  1:15   ` Andrii Nakryiko
2023-01-26 14:30     ` Ilya Leoshkevich
2023-01-26 19:06       ` Andrii Nakryiko
2023-01-27 11:15         ` Ilya Leoshkevich
2023-01-27 17:30           ` Andrii Nakryiko
2023-01-25 21:38 ` [PATCH bpf-next 23/24] s390/bpf: Implement bpf_jit_supports_subprog_tailcalls() Ilya Leoshkevich
2023-01-25 21:38 ` [PATCH bpf-next 24/24] s390/bpf: Implement bpf_jit_supports_kfunc_call() Ilya Leoshkevich
2023-01-26  1:28   ` Alexei Starovoitov
2023-01-27 11:36     ` Ilya Leoshkevich
2023-01-27 16:04       ` Alexei Starovoitov
2023-01-26  0:45 ` [PATCH bpf-next 00/24] Support bpf trampoline for s390x Andrii Nakryiko
2023-01-27 16:51   ` Ilya Leoshkevich
2023-01-27 17:24     ` Andrii Nakryiko
2023-01-27 22:50       ` Ilya Leoshkevich [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bed8d554582c171765083c849750af37656aa253.camel@linux.ibm.com \
    --to=iii@linux.ibm.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox