netdev.vger.kernel.org archive mirror
From: Mark Rutland <mark.rutland@arm.com>
To: Xu Kuohai <xukuohai@huawei.com>
Cc: bpf@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-kselftest@vger.kernel.org,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ingo Molnar <mingo@redhat.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Alexei Starovoitov <ast@kernel.org>,
	Zi Shen Lim <zlim.lnx@gmail.com>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	"David S . Miller" <davem@davemloft.net>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	David Ahern <dsahern@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, hpa@zytor.com, Shuah Khan <shuah@kernel.org>,
	Jakub Kicinski <kuba@kernel.org>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Ard Biesheuvel <ardb@kernel.org>,
	Daniel Kiss <daniel.kiss@arm.com>,
	Steven Price <steven.price@arm.com>,
	Sudeep Holla <sudeep.holla@arm.com>,
	Marc Zyngier <maz@kernel.org>,
	Peter Collingbourne <pcc@google.com>,
	Mark Brown <broonie@kernel.org>, Delyan Kratunov <delyank@fb.com>,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>
Subject: Re: [PATCH bpf-next v3 4/7] bpf, arm64: Impelment bpf_arch_text_poke() for arm64
Date: Mon, 16 May 2022 08:18:32 +0100
Message-ID: <YoH6yAtmzPQtWiFM@FVFF77S0Q05N>
In-Reply-To: <264ecbe1-4514-d6c8-182b-3af4babb457e@huawei.com>

On Mon, May 16, 2022 at 02:55:46PM +0800, Xu Kuohai wrote:
> On 5/13/2022 10:59 PM, Mark Rutland wrote:
> > On Sun, Apr 24, 2022 at 11:40:25AM -0400, Xu Kuohai wrote:
> >> Implement bpf_arch_text_poke() for arm64, so bpf trampoline code can use
> >> it to replace nop with jump, or replace jump with nop.
> >>
> >> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
> >> Acked-by: Song Liu <songliubraving@fb.com>
> >> ---
> >>  arch/arm64/net/bpf_jit_comp.c | 63 +++++++++++++++++++++++++++++++++++
> >>  1 file changed, 63 insertions(+)
> >>
> >> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> >> index 8ab4035dea27..3f9bdfec54c4 100644
> >> --- a/arch/arm64/net/bpf_jit_comp.c
> >> +++ b/arch/arm64/net/bpf_jit_comp.c
> >> @@ -9,6 +9,7 @@
> >>  
> >>  #include <linux/bitfield.h>
> >>  #include <linux/bpf.h>
> >> +#include <linux/memory.h>
> >>  #include <linux/filter.h>
> >>  #include <linux/printk.h>
> >>  #include <linux/slab.h>
> >> @@ -18,6 +19,7 @@
> >>  #include <asm/cacheflush.h>
> >>  #include <asm/debug-monitors.h>
> >>  #include <asm/insn.h>
> >> +#include <asm/patching.h>
> >>  #include <asm/set_memory.h>
> >>  
> >>  #include "bpf_jit.h"
> >> @@ -1529,3 +1531,64 @@ void bpf_jit_free_exec(void *addr)
> >>  {
> >>  	return vfree(addr);
> >>  }
> >> +
> >> +static int gen_branch_or_nop(enum aarch64_insn_branch_type type, void *ip,
> >> +			     void *addr, u32 *insn)
> >> +{
> >> +	if (!addr)
> >> +		*insn = aarch64_insn_gen_nop();
> >> +	else
> >> +		*insn = aarch64_insn_gen_branch_imm((unsigned long)ip,
> >> +						    (unsigned long)addr,
> >> +						    type);
> >> +
> >> +	return *insn != AARCH64_BREAK_FAULT ? 0 : -EFAULT;
> >> +}
> >> +
> >> +int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type,
> >> +		       void *old_addr, void *new_addr)
> >> +{
> >> +	int ret;
> >> +	u32 old_insn;
> >> +	u32 new_insn;
> >> +	u32 replaced;
> >> +	enum aarch64_insn_branch_type branch_type;
> >> +
> >> +	if (!is_bpf_text_address((long)ip))
> >> +		/* Only poking bpf text is supported. Since kernel function
> >> +		 * entry is set up by ftrace, we rely on ftrace to poke kernel
> >> +		 * functions. For kernel functions, bpf_arch_text_poke() is only
> >> +		 * called after a failed poke with ftrace. In this case, there
> >> +		 * is probably something wrong with fentry, so there is nothing
> >> +		 * we can do here. See register_fentry, unregister_fentry and
> >> +		 * modify_fentry for details.
> >> +		 */
> >> +		return -EINVAL;
> > 
> > If you rely on ftrace to poke functions, why do you need to patch text
> > at all? Why does the rest of this function exist?
> > 
> > I really don't like having another piece of code outside of ftrace
> > patching the ftrace patch-site; this needs a much better explanation.
> > 
> 
> Sorry for the incorrect explanation in the comment. I don't think it's
> reasonable to patch ftrace patch-site without ftrace code either.
> 
> The patching logic in register_fentry, unregister_fentry and
> modify_fentry is as follows:
> 
> if (tr->func.ftrace_managed)
>         ret = register_ftrace_direct((long)ip, (long)new_addr);
> else
>         ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, NULL, new_addr,
>                                  true);
> 
> The ftrace patch-site is patched by ftrace code. bpf_arch_text_poke()
> is only used to patch bpf progs and bpf trampolines, which are not
> managed by ftrace.

Sorry, I had misunderstood. Thanks for the correction!

I'll have another look with that in mind.

> >> +
> >> +	if (poke_type == BPF_MOD_CALL)
> >> +		branch_type = AARCH64_INSN_BRANCH_LINK;
> >> +	else
> >> +		branch_type = AARCH64_INSN_BRANCH_NOLINK;
> >> +
> >> +	if (gen_branch_or_nop(branch_type, ip, old_addr, &old_insn) < 0)
> >> +		return -EFAULT;
> >> +
> >> +	if (gen_branch_or_nop(branch_type, ip, new_addr, &new_insn) < 0)
> >> +		return -EFAULT;
> >> +
> >> +	mutex_lock(&text_mutex);
> >> +	if (aarch64_insn_read(ip, &replaced)) {
> >> +		ret = -EFAULT;
> >> +		goto out;
> >> +	}
> >> +
> >> +	if (replaced != old_insn) {
> >> +		ret = -EFAULT;
> >> +		goto out;
> >> +	}
> >> +
> >> +	ret = aarch64_insn_patch_text_nosync((void *)ip, new_insn);
> > 
> > ... and where does the actual synchronization come from in this case?
> 
> aarch64_insn_patch_text_nosync() replaces an instruction atomically, so
> no other CPUs will fetch a half-new and half-old instruction.
> 
> The scenario here is that there is a chance that another CPU fetches the
> old instruction after bpf_arch_text_poke() finishes, that is, different
> CPUs may execute different versions of instructions at the same time.
> 
> 1. When a new trampoline is attached, it doesn't seem to be an issue for
> different CPUs to jump to different trampolines temporarily.
>
> 2. When an old trampoline is freed, we should wait for all other CPUs to
> exit the trampoline and make sure the trampoline is no longer reachable.
> IIUC, bpf_tramp_image_put() already uses percpu_ref and RCU tasks to do
> this.

It would be good to have a comment for these points.

Thanks,
Mark.
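
For readers following the patch, the check-then-replace flow discussed
above can be sketched as a small user-space simulation. This is only an
illustration under stated assumptions, not the kernel code: the names
gen_branch_imm(), text_poke_sim(), poke_demo() and A64_INVALID are
hypothetical stand-ins, the "text" is an ordinary atomic variable, and a
C11 compare-exchange replaces the text_mutex plus aarch64_insn_read() /
aarch64_insn_patch_text_nosync() sequence from the quoted diff. The NOP,
B and BL encodings are the architectural AArch64 ones.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

#define A64_NOP     0xd503201fu /* AArch64 NOP */
#define A64_B       0x14000000u /* B  <imm26> opcode bits */
#define A64_BL      0x94000000u /* BL <imm26> opcode bits */
#define A64_INVALID 0u          /* hypothetical error marker for this sketch */

enum branch_type { BRANCH_NOLINK, BRANCH_LINK };

/* Encode B/BL from pc to target; A64_INVALID if the target is unreachable. */
static uint32_t gen_branch_imm(uint64_t pc, uint64_t target,
			       enum branch_type type)
{
	int64_t off = (int64_t)(target - pc);

	/* imm26 counts 4-byte words, so the reach is +/-128 MiB, word aligned. */
	if ((off & 3) || off < -(INT64_C(1) << 27) || off >= (INT64_C(1) << 27))
		return A64_INVALID;

	return (type == BRANCH_LINK ? A64_BL : A64_B) |
	       (uint32_t)(((uint64_t)off >> 2) & 0x03ffffffu);
}

/* A zero target means "nop", mirroring the NULL address in the patch. */
static int gen_branch_or_nop(enum branch_type type, uint64_t pc,
			     uint64_t target, uint32_t *insn)
{
	*insn = target ? gen_branch_imm(pc, target, type) : A64_NOP;
	return *insn != A64_INVALID ? 0 : -EFAULT;
}

/*
 * Replace the instruction in *slot only if it still matches the expected
 * old instruction. The single 32-bit atomic update models the point that
 * no observer sees a half-old, half-new instruction. It does not model
 * when other CPUs start fetching the new instruction.
 */
static int text_poke_sim(_Atomic uint32_t *slot, uint64_t pc,
			 enum branch_type type,
			 uint64_t old_target, uint64_t new_target)
{
	uint32_t old_insn, new_insn;

	if (gen_branch_or_nop(type, pc, old_target, &old_insn) < 0 ||
	    gen_branch_or_nop(type, pc, new_target, &new_insn) < 0)
		return -EFAULT;

	if (!atomic_compare_exchange_strong(slot, &old_insn, new_insn))
		return -EFAULT; /* text was not what the caller expected */

	return 0;
}

/* Attach, reject a stale attach, then detach. Returns 0 on success. */
static int poke_demo(void)
{
	_Atomic uint32_t slot = A64_NOP;
	const uint64_t pc = 0x1000, tramp = 0x2000;

	/* nop -> bl tramp */
	assert(text_poke_sim(&slot, pc, BRANCH_LINK, 0, tramp) == 0);
	/* A second attach expecting a nop must fail: a bl is there now. */
	assert(text_poke_sim(&slot, pc, BRANCH_LINK, 0, tramp) == -EFAULT);
	/* bl tramp -> nop */
	assert(text_poke_sim(&slot, pc, BRANCH_LINK, tramp, 0) == 0);

	return atomic_load(&slot) == A64_NOP ? 0 : 1;
}
```

Calling poke_demo() from a test harness walks through an attach and a
detach. The sketch deliberately leaves out the second point in the
discussion, namely waiting for all CPUs to leave an old trampoline
before it is freed. According to the thread, that is handled separately
by bpf_tramp_image_put().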
