From: Mark Rutland <mark.rutland@arm.com>
To: Xu Kuohai <xukuohai@huawei.com>
Cc: bpf@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
linux-kselftest@vger.kernel.org,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
Ingo Molnar <mingo@redhat.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Alexei Starovoitov <ast@kernel.org>,
Zi Shen Lim <zlim.lnx@gmail.com>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
Yonghong Song <yhs@fb.com>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
"David S . Miller" <davem@davemloft.net>,
Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
David Ahern <dsahern@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, hpa@zytor.com, Shuah Khan <shuah@kernel.org>,
Jakub Kicinski <kuba@kernel.org>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
Ard Biesheuvel <ardb@kernel.org>,
Daniel Kiss <daniel.kiss@arm.com>,
Steven Price <steven.price@arm.com>,
Sudeep Holla <sudeep.holla@arm.com>,
Marc Zyngier <maz@kernel.org>,
Peter Collingbourne <pcc@google.com>,
Mark Brown <broonie@kernel.org>, Delyan Kratunov <delyank@fb.com>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
Wang ShaoBo <bobo.shaobowang@huawei.com>,
cj.chengjian@huawei.com, huawei.libin@huawei.com,
xiexiuqi@huawei.com, liwei391@huawei.com
Subject: Re: [PATCH bpf-next v5 1/6] arm64: ftrace: Add ftrace direct call support
Date: Mon, 6 Jun 2022 17:35:57 +0100 [thread overview]
Message-ID: <Yp4s7eNGvb2CNtPp@FVFF77S0Q05N.cambridge.arm.com> (raw)
In-Reply-To: <40fda0b0-0efc-ea1b-96d5-e51a4d1593dd@huawei.com>
On Thu, May 26, 2022 at 10:48:05PM +0800, Xu Kuohai wrote:
> On 5/26/2022 6:06 PM, Mark Rutland wrote:
> > On Thu, May 26, 2022 at 05:45:03PM +0800, Xu Kuohai wrote:
> >> On 5/25/2022 9:38 PM, Mark Rutland wrote:
> >>> On Wed, May 18, 2022 at 09:16:33AM -0400, Xu Kuohai wrote:
> >>>> Add ftrace direct support for arm64.
> >>>>
> >>>> 1. When there is custom trampoline only, replace the fentry nop to a
> >>>> jump instruction that jumps directly to the custom trampoline.
> >>>>
> >>>> 2. When ftrace trampoline and custom trampoline coexist, jump from
> >>>> fentry to ftrace trampoline first, then jump to custom trampoline
> >>>> when ftrace trampoline exits. The current unused register
> >>>> pt_regs->orig_x0 is used as an intermediary for jumping from ftrace
> >>>> trampoline to custom trampoline.
> >>>
> >>> For those of us not all that familiar with BPF, can you explain *why* you want
> >>> this? The above explains what the patch implements, but not why that's useful.
> >>>
> >>> e.g. is this just to avoid the overhead of the ops list processing in the
> >>> regular ftrace code, or is the custom trampoline there to allow you to do
> >>> something special?
> >>
> >> IIUC, ftrace direct call was designed to *remove* the unnecessary
> >> overhead of saving regs completely [1][2].
> >
> > Ok. My plan is to get rid of most of the register saving generally, so I think
> > that aspect can be solved without direct calls.
> Looking forward to your new solution.
For the register saving rework, I have a WIP branch on my kernel.org repo:
https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/ftrace/minimal-regs
git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/ftrace/minimal-regs
I'm working on that at the moment along with a per-callsite ops implementaiton
that would avoid most of the need for custom trampolines (and work with branch
range limitations):
https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/ftrace/per-callsite-ops
git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/ftrace/per-callsite-ops
> >> [1]
> >> https://lore.kernel.org/all/20191022175052.frjzlnjjfwwfov64@ast-mbp.dhcp.thefacebook.com/
> >> [2] https://lore.kernel.org/all/20191108212834.594904349@goodmis.org/
> >>
> >> This patch itself is just a variant of [3].
> >>
> >> [3] https://lore.kernel.org/all/20191108213450.891579507@goodmis.org/
> >>
> >>>
> >>> There is another patch series on the list from some of your colleagues which
> >>> uses dynamic trampolines to try to avoid that ops list overhead, and it's not
> >>> clear to me whether these are trying to solve the largely same problem or
> >>> something different. That other thread is at:
> >>>
> >>> https://lore.kernel.org/linux-arm-kernel/20220316100132.244849-1-bobo.shaobowang@huawei.com/
> >>>
> >>> ... and I've added the relevant parties to CC here, since there doesn't seem to
> >>> be any overlap in the CC lists of the two threads.
> >>
> >> We're not working to solve the same problem. The trampoline introduced
> >> in this series helps us to monitor kernel function or another bpf prog
> >> with bpf, and also helps us to use bpf prog like a normal kernel
> >> function pointer.
> >
> > Ok, but why is it necessary to have a special trampoline?
> >
> > Is that *just* to avoid overhead, or do you need to do something special that
> > the regular trampoline won't do?
> >
>
> Sorry for not explaining the problem. The main bpf prog accepts only a
> single argument 'ctx' in r1, so to allow kernel code to call bpf prog
> transparently, we need a trampoline to convert native calling convention
> into BPF calling convention [1].
>
> [1] https://lore.kernel.org/bpf/20191114185720.1641606-5-ast@kernel.org/
Thanks for the pointer; I'll go page that in.
> For example,
>
> SEC("struct_ops/dctcp_state")
> void BPF_PROG(dctcp_state, struct sock *sk, __u8 new_state)
> {
> // do something
> }
>
> The above bpf prog will be compiled to something like this:
>
> dctcp_state:
> r2 = *(u64 *)(r1 + 8) // new_state
> r1 = *(u64 *)(r1 + 0) // sk
> ...
>
> It accepts only one argument 'ctx' in r1, and loads the actual arugment
> 'sk' and 'new_state' from r1 + 0 and r1 + 8, resepectively. So before
> calling this prog, we need to construct 'ctx' and store its address to r1.
>
> >>>
> >>> In that other thread I've suggested a general approach we could follow at:
> >>>
> >>> https://lore.kernel.org/linux-arm-kernel/YmGF%2FOpIhAF8YeVq@lakrids/
> >>
> >> Is it possible for a kernel function to take a long jump to common
> >> trampoline when we get a huge kernel image?
> >
> > It is possible, but only where the kernel Image itself is massive and the .text
> > section exceeeds 128MiB, at which point other things break anyway. Practically
> > speaking, this doesn't happen for production kernels, or reasonable test
> > kernels.
>
> So even for normal kernel functions, we need some way to construct and
> destruct long jumps atomically and safely.
My point was that case is unrealistic for production kernels, and utterly
broken anyway (and as below I intend to make ftrace detect this and mark itself
as broken).
FWIW, an allmodconfig kernel built with GCC 12.1.0 has a ~30MB .text segment,
so for realistic kernels we have plenty of headroom for normal functions to
reach the in-kernel trampoline.
> > I've been meaning to add some logic to detect this at boot time and idsable
> > ftrace (or at build time), since live patching would also be broken in that
> > case.
> >>>> As noted in that thread, I have a few concerns which equally apply here:
> >>>
> >>> * Due to the limited range of BL instructions, it's not always possible to
> >>> patch an ftrace call-site to branch to an arbitrary trampoline. The way this
> >>> works for ftrace today relies upon knowingthe set of trampolines at
> >>> compile-time, and allocating module PLTs for those, and that approach cannot
> >>> work reliably for dynanically allocated trampolines.
> >>
> >> Currently patch 5 returns -ENOTSUPP when long jump is detected, so no
> >> bpf trampoline is constructed for out of range patch-site:
> >>
> >> if (is_long_jump(orig_call, image))
> >> return -ENOTSUPP;
> >
> > Sure, my point is that in practice that means that (from the user's PoV) this
> > may randomly fail to work, and I'd like something that we can ensure works
> > consistently.
> >
>
> OK, should I suspend this work until you finish refactoring ftrace?
Yes; I'd appreciate if we could hold on this for a bit.
I think with some ground work we can avoid most of the painful edge cases and
might be able to avoid the need for custom trampolines.
Thanks,
Mark.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2022-06-06 16:37 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-18 13:16 [PATCH bpf-next v5 0/6] bpf trampoline for arm64 Xu Kuohai
2022-05-18 13:16 ` [PATCH bpf-next v5 1/6] arm64: ftrace: Add ftrace direct call support Xu Kuohai
2022-05-23 1:39 ` KP Singh
2022-05-25 13:38 ` Mark Rutland
2022-05-26 9:45 ` Xu Kuohai
2022-05-26 10:06 ` Mark Rutland
2022-05-26 14:48 ` Xu Kuohai
2022-06-06 16:35 ` Mark Rutland [this message]
2022-06-09 4:27 ` Xu Kuohai
2022-08-09 17:03 ` Florent Revest
2022-08-10 8:10 ` Xu Kuohai
2022-05-18 13:16 ` [PATCH bpf-next v5 2/6] ftrace: Fix deadloop caused by direct call in ftrace selftest Xu Kuohai
2022-05-25 13:43 ` Mark Rutland
2022-05-26 9:45 ` Xu Kuohai
2022-05-18 13:16 ` [PATCH bpf-next v5 3/6] bpf: Remove is_valid_bpf_tramp_flags() Xu Kuohai
2022-05-25 13:45 ` Mark Rutland
2022-05-26 9:45 ` Xu Kuohai
2022-05-26 10:12 ` Mark Rutland
2022-05-26 14:46 ` Xu Kuohai
2022-05-18 13:16 ` [PATCH bpf-next v5 4/6] bpf, arm64: Impelment bpf_arch_text_poke() for arm64 Xu Kuohai
2022-05-23 1:41 ` KP Singh
2022-05-25 14:10 ` Mark Rutland
2022-05-26 9:45 ` Xu Kuohai
2022-05-26 10:34 ` Mark Rutland
2022-05-26 14:47 ` Xu Kuohai
2022-05-18 13:16 ` [PATCH bpf-next v5 5/6] bpf, arm64: bpf trampoline " Xu Kuohai
2022-05-20 21:18 ` Alexei Starovoitov
2022-05-23 16:09 ` Mark Rutland
2022-05-23 1:36 ` KP Singh
2022-05-18 13:16 ` [PATCH bpf-next v5 6/6] selftests/bpf: Fix trivial typo in fentry_fexit.c Xu Kuohai
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Yp4s7eNGvb2CNtPp@FVFF77S0Q05N.cambridge.arm.com \
--to=mark.rutland@arm.com \
--cc=andrii@kernel.org \
--cc=ardb@kernel.org \
--cc=ast@kernel.org \
--cc=bobo.shaobowang@huawei.com \
--cc=bp@alien8.de \
--cc=bpf@vger.kernel.org \
--cc=broonie@kernel.org \
--cc=catalin.marinas@arm.com \
--cc=cj.chengjian@huawei.com \
--cc=daniel.kiss@arm.com \
--cc=daniel@iogearbox.net \
--cc=dave.hansen@linux.intel.com \
--cc=davem@davemloft.net \
--cc=delyank@fb.com \
--cc=dsahern@kernel.org \
--cc=hawk@kernel.org \
--cc=hpa@zytor.com \
--cc=huawei.libin@huawei.com \
--cc=john.fastabend@gmail.com \
--cc=kafai@fb.com \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=liwei391@huawei.com \
--cc=maz@kernel.org \
--cc=memxor@gmail.com \
--cc=mingo@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=pasha.tatashin@soleen.com \
--cc=pcc@google.com \
--cc=rostedt@goodmis.org \
--cc=shuah@kernel.org \
--cc=songliubraving@fb.com \
--cc=steven.price@arm.com \
--cc=sudeep.holla@arm.com \
--cc=tglx@linutronix.de \
--cc=will@kernel.org \
--cc=x86@kernel.org \
--cc=xiexiuqi@huawei.com \
--cc=xukuohai@huawei.com \
--cc=yhs@fb.com \
--cc=yoshfuji@linux-ipv6.org \
--cc=zlim.lnx@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox