Date: Fri, 24 Jan 2025 10:23:14 -0800
From: Charlie Jenkins
To: Brian Gerst
Cc: Paul Walmsley, Palmer Dabbelt, Huacai Chen, WANG Xuerui, Thomas Gleixner,
    Peter Zijlstra, Andy Lutomirski, Alexandre Ghiti,
    linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
    loongarch@lists.linux.dev
Subject: Re: [PATCH v2 1/4] riscv: entry: Convert ret_from_fork() to C
References: <20250123-riscv_optimize_entry-v2-0-7c259492d508@rivosinc.com>
    <20250123-riscv_optimize_entry-v2-1-7c259492d508@rivosinc.com>

On Fri, Jan 24, 2025 at 08:14:08AM -0500, Brian Gerst wrote:
> On Thu, Jan 23, 2025 at 2:15 PM Charlie Jenkins wrote:
> >
> > Move the main section of ret_from_fork() to C to allow inlining of
> > syscall_exit_to_user_mode().
> >
> > Signed-off-by: Charlie Jenkins
> > ---
> >  arch/riscv/include/asm/asm-prototypes.h |  1 +
> >  arch/riscv/kernel/entry.S               | 15 ++++++---------
> >  arch/riscv/kernel/process.c             | 14 ++++++++++++--
> >  3 files changed, 19 insertions(+), 11 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
> > index cd627ec289f163a630b73dd03dd52a6b28692997..733ff609778797001006c33bba9e3cc5b1f15387 100644
> > --- a/arch/riscv/include/asm/asm-prototypes.h
> > +++ b/arch/riscv/include/asm/asm-prototypes.h
> > @@ -52,6 +52,7 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
> >  DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
> >  DECLARE_DO_ERROR_INFO(do_trap_break);
> >
> > +asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs);
> >  asmlinkage void handle_bad_stack(struct pt_regs *regs);
> >  asmlinkage void do_page_fault(struct pt_regs *regs);
> >  asmlinkage void do_irq(struct pt_regs *regs);
> > diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> > index 33a5a9f2a0d4e1eeccfb3621b9e518b88e1b0704..9225c322279aa90e737b1d7144db084319cf8103 100644
> > --- a/arch/riscv/kernel/entry.S
> > +++ b/arch/riscv/kernel/entry.S
> > @@ -319,17 +319,14 @@ SYM_CODE_END(handle_kernel_stack_overflow)
> >  ASM_NOKPROBE(handle_kernel_stack_overflow)
> >  #endif
> >
> > -SYM_CODE_START(ret_from_fork)
> > +SYM_CODE_START(ret_from_fork_asm)
> >  	call schedule_tail
> > -	beqz s0, 1f	/* not from kernel thread */
> > -	/* Call fn(arg) */
> > -	move a0, s1
> > -	jalr s0
> > -1:
> > -	move a0, sp /* pt_regs */
> > -	call syscall_exit_to_user_mode
> > +	move a0, s1 /* fn */
> > +	move a1, s0 /* fn_arg */
> > +	move a2, sp /* pt_regs */
> > +	call ret_from_fork
> >  	j ret_from_exception
> > -SYM_CODE_END(ret_from_fork)
> > +SYM_CODE_END(ret_from_fork_asm)
> >
> >  #ifdef CONFIG_IRQ_STACKS
> >  /*
> > diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> > index 58b6482c2bf662bf5224ca50c8e21a68760a6b41..0d07e6d8f6b57beba438dbba5e8c74a014582bee 100644
> > --- a/arch/riscv/kernel/process.c
> > +++ b/arch/riscv/kernel/process.c
> > @@ -17,7 +17,9 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >
> > +#include
> >  #include
> >  #include
> >  #include
> > @@ -36,7 +38,7 @@ unsigned long __stack_chk_guard __read_mostly;
> >  EXPORT_SYMBOL(__stack_chk_guard);
> >  #endif
> >
> > -extern asmlinkage void ret_from_fork(void);
> > +extern asmlinkage void ret_from_fork_asm(void);
> >
> >  void noinstr arch_cpu_idle(void)
> >  {
> > @@ -206,6 +208,14 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
> >  	return 0;
> >  }
> >
> > +asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
> > +{
> > +	if (unlikely(fn))
> > +		fn(fn_arg);
> > +
> > +	syscall_exit_to_user_mode(regs);
> > +}
> > +
> >  int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> >  {
> >  	unsigned long clone_flags = args->flags;
> > @@ -242,7 +252,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> >  	p->thread.riscv_v_flags = 0;
> >  	if (has_vector())
> >  		riscv_v_thread_alloc(p);
> > -	p->thread.ra = (unsigned long)ret_from_fork;
> > +	p->thread.ra = (unsigned long)ret_from_fork_asm;
> >  	p->thread.sp = (unsigned long)childregs; /* kernel sp */
> >  	return 0;
> >  }
> >
> > --
> > 2.43.0
> >
> >
>
> Is there a specific reason you didn't move the call to schedule_tail()
> to the C function, like on x86?

Yes, the generated code ends up dramatically worse if schedule_tail() is
moved into C. The argument for schedule_tail() is already in a0 when we
get here, so the asm stub can call it directly. Calling it from C instead
means ret_from_fork() has to keep its other arguments alive across the
call, and the extra register saves and stack manipulation add up to a lot
of instructions.
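
To make the comparison concrete, here is a minimal standalone sketch of the
two call shapes, inferred from the patch above and the disassembly below. It
is not kernel code: schedule_tail() and syscall_exit_to_user_mode() are
stubbed, struct pt_regs and struct task_struct are empty stand-ins, the
unlikely() hint is dropped, and ret_from_fork_with_tail(), record(),
trace_reset() and kthread_fn() are hypothetical names used only for
illustration.

```c
#include <string.h>

/* Empty stand-ins for the kernel types; only their addresses are used. */
struct pt_regs { unsigned long unused; };
struct task_struct { unsigned long unused; };

/* Call-order log: 's' = schedule_tail, 'f' = fn, 'x' = exit to user mode. */
char trace[8];

void trace_reset(void)
{
	memset(trace, 0, sizeof(trace));
}

void record(char c)
{
	size_t len = strlen(trace);

	/* Leave the last byte untouched so the log stays NUL-terminated. */
	if (len < sizeof(trace) - 1)
		trace[len] = c;
}

/* Stubs standing in for the kernel functions of the same name. */
void schedule_tail(struct task_struct *prev)
{
	(void)prev;
	record('s');
}

void syscall_exit_to_user_mode(struct pt_regs *regs)
{
	(void)regs;
	record('x');
}

/* Hypothetical kernel-thread entry function. */
int kthread_fn(void *arg)
{
	(void)arg;
	record('f');
	return 0;
}

/* Shape used by the patch: the asm stub has already called schedule_tail(). */
void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs)
{
	if (fn)
		fn(fn_arg);

	syscall_exit_to_user_mode(regs);
}

/*
 * Hypothetical alternative: schedule_tail() is called from C, so prev is
 * passed in and the other three arguments are live across that call.
 */
void ret_from_fork_with_tail(struct task_struct *prev, void *fn_arg,
			     int (*fn)(void *), struct pt_regs *regs)
{
	schedule_tail(prev);

	if (fn)
		fn(fn_arg);

	syscall_exit_to_user_mode(regs);
}
```

In the second shape, fn_arg, fn and regs all have to survive the
schedule_tail() call, so the compiler parks them in callee-saved registers
and spills those registers in the prologue. That is where the extra
instructions come from.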

With this change:

<ret_from_fork_asm>:
ff65b097    auipc   ra,0xff65b
1ee080e7    jalr    494(ra) # ffffffff8005038a
8526        mv      a0,s1
85a2        mv      a1,s0
860a        mv      a2,sp
ff61b097    auipc   ra,0xff61b
606080e7    jalr    1542(ra) # ffffffff800107b0
b5f5        j       ffffffff809f509e

<ret_from_fork>:
1101        addi    sp,sp,-32
e822        sd      s0,16(sp)
ec06        sd      ra,24(sp)
1000        addi    s0,sp,32
e991        bnez    a1,ffffffff800107cc
8532        mv      a0,a2
009db097    auipc   ra,0x9db
a32080e7    jalr    -1486(ra) # ffffffff809eb1ee
60e2        ld      ra,24(sp)
6442        ld      s0,16(sp)
6105        addi    sp,sp,32
8082        ret
fec43423    sd      a2,-24(s0)
9582        jalr    a1
fe843603    ld      a2,-24(s0)
8532        mv      a0,a2
009db097    auipc   ra,0x9db
a16080e7    jalr    -1514(ra) # ffffffff809eb1ee
60e2        ld      ra,24(sp)
6442        ld      s0,16(sp)
6105        addi    sp,sp,32
8082        ret

Contrast that with what this looks like if schedule_tail() is called from C:

<ret_from_fork_asm>:
85a6        mv      a1,s1
8622        mv      a2,s0
868a        mv      a3,sp
ff61b097    auipc   ra,0xff61b
60e080e7    jalr    1550(ra) # ffffffff800107b0
bdd5        j       ffffffff809f509e

<ret_from_fork>:
7179        addi    sp,sp,-48
f022        sd      s0,32(sp)
ec26        sd      s1,24(sp)
e84a        sd      s2,16(sp)
e44e        sd      s3,8(sp)
f406        sd      ra,40(sp)
1800        addi    s0,sp,48
84b2        mv      s1,a2
89ae        mv      s3,a1
8936        mv      s2,a3
3c73f0ef    jal     ffffffff8005038a
ec89        bnez    s1,ffffffff800107e2
854a        mv      a0,s2
009db097    auipc   ra,0x9db
a22080e7    jalr    -1502(ra) # ffffffff809eb1ee
70a2        ld      ra,40(sp)
7402        ld      s0,32(sp)
64e2        ld      s1,24(sp)
6942        ld      s2,16(sp)
69a2        ld      s3,8(sp)
6145        addi    sp,sp,48
8082        ret
854e        mv      a0,s3
9482        jalr    s1
b7d5        j       ffffffff800107ca

ret_from_fork_asm ends up 2 instructions longer when schedule_tail() is
called from asm, but the user fork path through ret_from_fork is only 12
instructions, rather than the 22 it takes when schedule_tail() is called
from C.

If we were able to mix asm and C code in a naked function, we would be able
to get rid of the stack manipulation and still inline the C code, but we
don't live in that world...

- Charlie

>
>
> Brian Gerst