From: Xu Kuohai
To: Mark Rutland
Cc: Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
 Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim, Andrii Nakryiko,
 Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh,
 David S. Miller, Hideaki YOSHIFUJI, David Ahern, Thomas Gleixner,
 Borislav Petkov, Dave Hansen, Shuah Khan, Jakub Kicinski,
 Jesper Dangaard Brouer, Pasha Tatashin, Ard Biesheuvel, Daniel Kiss,
 Steven Price, Sudeep Holla, Marc Zyngier, Peter Collingbourne,
 Mark Brown, Delyan Kratunov, Kumar Kartikeya Dwivedi
Subject: Re: [PATCH bpf-next v3 4/7] bpf, arm64: Implement bpf_arch_text_poke() for arm64
Date: Mon, 16 May 2022 15:58:37 +0800
Message-ID: <06b33393-8af5-9faa-6faa-acb5111865f6@huawei.com>
References: <20220424154028.1698685-1-xukuohai@huawei.com>
 <20220424154028.1698685-5-xukuohai@huawei.com>
 <264ecbe1-4514-d6c8-182b-3af4babb457e@huawei.com>
X-Mailing-List: netdev@vger.kernel.org

On 5/16/2022 3:18 PM, Mark Rutland wrote:
> On Mon, May 16, 2022 at 02:55:46PM +0800, Xu Kuohai wrote:
>> On 5/13/2022 10:59 PM, Mark Rutland wrote:
>>> On Sun, Apr 24, 2022 at 11:40:25AM -0400, Xu Kuohai wrote:
>>>> Implement bpf_arch_text_poke() for arm64, so bpf trampoline code can use
>>>> it to replace nop with jump, or replace jump with nop.
>>>>
>>>> Signed-off-by: Xu Kuohai
>>>> Acked-by: Song Liu
>>>> ---
>>>>  arch/arm64/net/bpf_jit_comp.c | 63 +++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 63 insertions(+)
>>>>
>>>> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
>>>> index 8ab4035dea27..3f9bdfec54c4 100644
>>>> --- a/arch/arm64/net/bpf_jit_comp.c
>>>> +++ b/arch/arm64/net/bpf_jit_comp.c
>>>> @@ -9,6 +9,7 @@
>>>>
>>>>  #include
>>>>  #include
>>>> +#include
>>>>  #include
>>>>  #include
>>>>  #include
>>>> @@ -18,6 +19,7 @@
>>>>  #include
>>>>  #include
>>>>  #include
>>>> +#include
>>>>  #include
>>>>
>>>>  #include "bpf_jit.h"
>>>> @@ -1529,3 +1531,64 @@ void bpf_jit_free_exec(void *addr)
>>>>  {
>>>>  	return vfree(addr);
>>>>  }
>>>> +
>>>> +static int gen_branch_or_nop(enum aarch64_insn_branch_type type, void *ip,
>>>> +			     void *addr, u32 *insn)
>>>> +{
>>>> +	if (!addr)
>>>> +		*insn = aarch64_insn_gen_nop();
>>>> +	else
>>>> +		*insn = aarch64_insn_gen_branch_imm((unsigned long)ip,
>>>> +						    (unsigned long)addr,
>>>> +						    type);
>>>> +
>>>> +	return *insn != AARCH64_BREAK_FAULT ? 0 : -EFAULT;
>>>> +}
>>>> +
>>>> +int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type,
>>>> +		       void *old_addr, void *new_addr)
>>>> +{
>>>> +	int ret;
>>>> +	u32 old_insn;
>>>> +	u32 new_insn;
>>>> +	u32 replaced;
>>>> +	enum aarch64_insn_branch_type branch_type;
>>>> +
>>>> +	if (!is_bpf_text_address((long)ip))
>>>> +		/* Only poking bpf text is supported. Since kernel function
>>>> +		 * entry is set up by ftrace, we reply on ftrace to poke kernel
>>>> +		 * functions. For kernel funcitons, bpf_arch_text_poke() is only
>>>> +		 * called after a failed poke with ftrace. In this case, there
>>>> +		 * is probably something wrong with fentry, so there is nothing
>>>> +		 * we can do here. See register_fentry, unregister_fentry and
>>>> +		 * modify_fentry for details.
>>>> +		 */
>>>> +		return -EINVAL;
>>>
>>> If you rely on ftrace to poke functions, why do you need to patch text
>>> at all? Why does the rest of this function exist?
>>>
>>> I really don't like having another piece of code outside of ftrace
>>> patching the ftrace patch-site; this needs a much better explanation.
>>>
>>
>> Sorry for the incorrect explanation in the comment. I don't think it's
>> reasonable to patch the ftrace patch-site without ftrace code either.
>>
>> The patching logic in register_fentry, unregister_fentry and
>> modify_fentry is as follows:
>>
>> 	if (tr->func.ftrace_managed)
>> 		ret = register_ftrace_direct((long)ip, (long)new_addr);
>> 	else
>> 		ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, NULL, new_addr,
>> 					 true);
>>
>> The ftrace patch-site is patched by ftrace code. bpf_arch_text_poke() is
>> only used to patch bpf prog and bpf trampoline, which are not managed by
>> ftrace.
>
> Sorry, I had misunderstood. Thanks for the correction!
>
> I'll have another look with that in mind.
>
>>>> +
>>>> +	if (poke_type == BPF_MOD_CALL)
>>>> +		branch_type = AARCH64_INSN_BRANCH_LINK;
>>>> +	else
>>>> +		branch_type = AARCH64_INSN_BRANCH_NOLINK;
>>>> +
>>>> +	if (gen_branch_or_nop(branch_type, ip, old_addr, &old_insn) < 0)
>>>> +		return -EFAULT;
>>>> +
>>>> +	if (gen_branch_or_nop(branch_type, ip, new_addr, &new_insn) < 0)
>>>> +		return -EFAULT;
>>>> +
>>>> +	mutex_lock(&text_mutex);
>>>> +	if (aarch64_insn_read(ip, &replaced)) {
>>>> +		ret = -EFAULT;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	if (replaced != old_insn) {
>>>> +		ret = -EFAULT;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	ret = aarch64_insn_patch_text_nosync((void *)ip, new_insn);
>>> ... and where does the actual synchronization come from in this case?
>>
>> aarch64_insn_patch_text_nosync() replaces an instruction atomically, so
>> no other CPUs will fetch a half-new and half-old instruction.
>>
>> The scenario here is that there is a chance that another CPU fetches the
>> old instruction after bpf_arch_text_poke() finishes, that is, different
>> CPUs may execute different versions of instructions at the same time.
>>
>> 1. When a new trampoline is attached, it doesn't seem to be an issue for
>> different CPUs to jump to different trampolines temporarily.
>>
>> 2. When an old trampoline is freed, we should wait for all other CPUs to
>> exit the trampoline and make sure the trampoline is no longer reachable.
>> IIUC, bpf_tramp_image_put() already uses percpu_ref and RCU tasks to do
>> this.
>
> It would be good to have a comment for these points.

Will add a comment for this in v4, thanks!

> Thanks,
> Mark.
> .
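
For reference, the comment I have in mind is roughly the following. This is
only a sketch of the wording, based on the two points above; the exact text
and its placement (shown here above the aarch64_insn_patch_text_nosync()
call) are still to be settled in v4:

	/*
	 * aarch64_insn_patch_text_nosync() replaces one 32-bit instruction
	 * atomically, so concurrent CPUs never fetch a half-new, half-old
	 * instruction; they see either the complete old or the complete new
	 * instruction. They may, however, keep executing the old instruction
	 * for a short window after the poke:
	 *
	 * 1. When a new trampoline is attached, it is not a problem for
	 *    different CPUs to temporarily jump to different trampolines.
	 *
	 * 2. Before an old trampoline is freed, bpf_tramp_image_put() uses
	 *    percpu_ref and RCU tasks to make sure the trampoline is no
	 *    longer reachable and all CPUs have exited it.
	 */
	ret = aarch64_insn_patch_text_nosync((void *)ip, new_insn);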