From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pl1-f182.google.com (mail-pl1-f182.google.com [209.85.214.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1509F1DB121 for ; Fri, 24 Jan 2025 07:53:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.182 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737705205; cv=none; b=u7HK1o0HQgav3lW4BNXsd/JS7S3f2a1zpPN7ER41oMzJ9TQEkfIJ/Y0IuORWSuTnrj3wTBq7RSzDxCkQ8w2maBappJBLjh56t0YNO7HfDnxh2a3KRqudKCByDffb69YTmyuBhrRNE9YbNh6fpRepqFoaaj8DRsBIelZ3q/2ZQlY= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737705205; c=relaxed/simple; bh=msYJij1VXYle5WNOAAnf0IBmpKOoFO//xfnWo3osKhY=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=doXhNpVgEok1FtZex21JUnR8AQND2kap5EusqwHydKDgZMO4ecrINdYf6jZQWNlAlYmUPbFniuiM43I9QbK9iAAnwxKceMHBymO1487WXnr8B7bE801HCkZpmyDGPcQ2QolEw3b4do/IzywG85MPAN2+FFuXoQl0SJLNzknv/hE= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=rivosinc.com; spf=pass smtp.mailfrom=rivosinc.com; dkim=pass (2048-bit key) header.d=rivosinc-com.20230601.gappssmtp.com header.i=@rivosinc-com.20230601.gappssmtp.com header.b=XSmwfHq/; arc=none smtp.client-ip=209.85.214.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=rivosinc.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=rivosinc.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=rivosinc-com.20230601.gappssmtp.com header.i=@rivosinc-com.20230601.gappssmtp.com header.b="XSmwfHq/" Received: by mail-pl1-f182.google.com with SMTP id d9443c01a7336-2167141dfa1so33415635ad.1 for ; Thu, 23 Jan 2025 23:53:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1737705202; x=1738310002; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=1D6xSUMx6yY9EExV7q645xrMDEE4y5ZutPMhix8rPCw=; b=XSmwfHq/Bn+u1s42JBQpHZHaKsrTCOp4+YfNLCeCgMC8o+WLwHdAUwGx6lwttrdJUR MPjrdQpIXVmAJUw77P/wyN3wFMwHQo6eb5aWLeAmibco03v61SFRrrR9HhSa99IcQuii 5pl/Sf7A8/YhsXU61OFFqGAKcdYuKzr9hqdF8AHD5CDdhhCHP3RAwla6bZtlzR0DECuS tBhP2NDEV7znGHWjdFgNWJXjqabMOKVBJTYKTJXqfeNE/cbFvLu+KYzRRYXzk+RJhkTf FUlmlj0u8H73HHhcM3WcF447vCJ3SFTrZ7eEfARaVISix/g1ou6wYjbZJtVfaneqXz2w CHSg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1737705202; x=1738310002; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=1D6xSUMx6yY9EExV7q645xrMDEE4y5ZutPMhix8rPCw=; b=aJx+lWgklCbzT1gN8R2Q7lO7Qh8310s3b85Q29vRwl6IVwNHQa8iXrg0wh78Ta/zcQ ioiDhigYhbbWKZLyi5JL5YSgplx8LEV5xijMo1tOzq3wqoVVBPVTyzGiZYtKIv1pH0OB dHrM3e02Xr6ICiePeWj+aB2eEjmXQHw4WStstJlGzBaJiLyj+w+ocsr25CNTR53o9psN kdQ3uIDfm0GnmTG4c62sw8qOssIEItvZa7EpKArRdCQp3MqTaXNd2ZbWoR7P6xy1pLIv 0CAdp6wfBOp24Yw5HahBtWPIQYgzA/gXv8S3ik+qO6gb25s9ymN2khlJ5+7RNRQw2JSC B0RQ== X-Forwarded-Encrypted: i=1; AJvYcCXQhT8yDWQf/RZGyG8wVQfbTE7dtj7it2oyS/MYuMAwU9ZQoOT/Y5dnKclXBRwkIXt37NkxIB/sPbL1Hrs=@vger.kernel.org X-Gm-Message-State: AOJu0YzgtiwUEkPSpp6e7abMwbzZIxZEgvPyE2gwLiEHNiLcwamfrzNt Ap+NLy8KQTIPyYsPK4RAJlvZHMm0YG2HCwGm3bGmmoRP9SOROgeDPM9PD07eX0Y= X-Gm-Gg: ASbGncs1SJHh94uvMYqp7ZHhtm1AJqsa9Pw8Qchk4AV31rAvFL2Fc/ojHxHzvxYBkhw f/Xmk45oDrKFkPphNNhUSuP3id5hiyRmH1DfzPJQzPxd6WXNzfyhYahYMjaMbuZT6QOM7BqjK1u pjQV9gZscsRc5n7ElzqpEf54WcfodcvApESG9S1kV8qoXNnO90hmkwXWMI1mhP1JjRDOFFo64kh YlrJ5ZKT9L13LG0OQoyffXKhYTh19eVaqOe/wWePh43ZEsv3zgWKDTTWJrn1+i09OQMnUlKeBpw X-Google-Smtp-Source: AGHT+IHHGxH/HpkilKLbQogSJE/TNcVINRyD5SPXZk/okYZGkzNq3qPlVSt7eLNhXfhsaqVlbYLkJg== X-Received: by 2002:a17:902:fc4e:b0:215:8847:4377 with SMTP id d9443c01a7336-21d9936e91bmr100372865ad.15.1737705202331; Thu, 23 Jan 2025 23:53:22 -0800 (PST) Received: from ghost ([2601:647:6700:64d0:db66:c77e:13aa:17a4]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-21da424e7a6sm10521215ad.231.2025.01.23.23.53.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 23 Jan 2025 23:53:21 -0800 (PST) Date: Thu, 23 Jan 2025 23:53:19 -0800 From: Charlie Jenkins To: Alexandre Ghiti Cc: Paul Walmsley , Palmer Dabbelt , Huacai Chen , WANG Xuerui , Thomas Gleixner , Peter Zijlstra , Andy Lutomirski , Alexandre Ghiti , linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, loongarch@lists.linux.dev Subject: Re: [PATCH v2 2/4] riscv: entry: Split ret_from_fork() into user and kernel Message-ID: References: <20250123-riscv_optimize_entry-v2-0-7c259492d508@rivosinc.com> <20250123-riscv_optimize_entry-v2-2-7c259492d508@rivosinc.com> <6ebe8b7c-7df3-4bea-9175-35512e9960a7@ghiti.fr> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <6ebe8b7c-7df3-4bea-9175-35512e9960a7@ghiti.fr> On Fri, Jan 24, 2025 at 08:19:18AM +0100, Alexandre Ghiti wrote: > Hi Charlie, > > On 23/01/2025 20:14, Charlie Jenkins wrote: > > This function was unified into a single function in commit ab9164dae273 > > ("riscv: entry: Consolidate ret_from_kernel_thread into ret_from_fork"). > > However that imposed a performance degradation. Partially reverting this > > commit to have ret_from_fork() split again results in a 1% increase on > > the number of times fork is able to be called per second. > > > > Signed-off-by: Charlie Jenkins > > --- > > arch/riscv/include/asm/asm-prototypes.h | 3 ++- > > arch/riscv/kernel/entry.S | 13 ++++++++++--- > > arch/riscv/kernel/process.c | 17 +++++++++++------ > > 3 files changed, 23 insertions(+), 10 deletions(-) > > > > diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h > > index 733ff609778797001006c33bba9e3cc5b1f15387..bfc8ea5f9319b19449ec59493b45b926df888832 100644 > > --- a/arch/riscv/include/asm/asm-prototypes.h > > +++ b/arch/riscv/include/asm/asm-prototypes.h > > @@ -52,7 +52,8 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_s); > > DECLARE_DO_ERROR_INFO(do_trap_ecall_m); > > DECLARE_DO_ERROR_INFO(do_trap_break); > > -asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs); > > +asmlinkage void ret_from_fork_kernel(void *fn_arg, int (*fn)(void *), struct pt_regs *regs); > > +asmlinkage void ret_from_fork_user(struct pt_regs *regs); > > asmlinkage void handle_bad_stack(struct pt_regs *regs); > > asmlinkage void do_page_fault(struct pt_regs *regs); > > asmlinkage void do_irq(struct pt_regs *regs); > > diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S > > index 9225c322279aa90e737b1d7144db084319cf8103..9386ef7444267f0b9bf8a0550f4e31deaeb85881 100644 > > --- a/arch/riscv/kernel/entry.S > > +++ b/arch/riscv/kernel/entry.S > > @@ -319,14 +319,21 @@ SYM_CODE_END(handle_kernel_stack_overflow) > > ASM_NOKPROBE(handle_kernel_stack_overflow) > > #endif > > -SYM_CODE_START(ret_from_fork_asm) > > +SYM_CODE_START(ret_from_fork_kernel_asm) > > call schedule_tail > > move a0, s1 /* fn */ > > move a1, s0 /* fn_arg */ > > move a2, sp /* pt_regs */ > > - call ret_from_fork > > + call ret_from_fork_kernel > > j ret_from_exception > > -SYM_CODE_END(ret_from_fork_asm) > > +SYM_CODE_END(ret_from_fork_kernel_asm) > > + > > +SYM_CODE_START(ret_from_fork_user_asm) > > + call schedule_tail > > + move a0, sp /* pt_regs */ > > + call ret_from_fork_user > > + j ret_from_exception > > +SYM_CODE_END(ret_from_fork_user_asm) > > #ifdef CONFIG_IRQ_STACKS > > /* > > diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c > > index 0d07e6d8f6b57beba438dbba5e8c74a014582bee..5f15236cb526bd9fe61636ed372b4b76c94df946 100644 > > --- a/arch/riscv/kernel/process.c > > +++ b/arch/riscv/kernel/process.c > > @@ -38,7 +38,8 @@ unsigned long __stack_chk_guard __read_mostly; > > EXPORT_SYMBOL(__stack_chk_guard); > > #endif > > -extern asmlinkage void ret_from_fork_asm(void); > > +extern asmlinkage void ret_from_fork_kernel_asm(void); > > +extern asmlinkage void ret_from_fork_user_asm(void); > > void noinstr arch_cpu_idle(void) > > { > > @@ -208,14 +209,18 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src) > > return 0; > > } > > -asmlinkage void ret_from_fork(void *fn_arg, int (*fn)(void *), struct pt_regs *regs) > > +asmlinkage void ret_from_fork_kernel(void *fn_arg, int (*fn)(void *), struct pt_regs *regs) > > { > > - if (unlikely(fn)) > > - fn(fn_arg); > > + fn(fn_arg); > > syscall_exit_to_user_mode(regs); > > } > > +asmlinkage void ret_from_fork_user(struct pt_regs *regs) > > +{ > > + syscall_exit_to_user_mode(regs); > > +} > > + > > int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) > > { > > unsigned long clone_flags = args->flags; > > @@ -238,6 +243,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) > > p->thread.s[0] = (unsigned long)args->fn; > > p->thread.s[1] = (unsigned long)args->fn_arg; > > + p->thread.ra = (unsigned long)ret_from_fork_kernel_asm; > > } else { > > *childregs = *(current_pt_regs()); > > /* Turn off status.VS */ > > @@ -247,12 +253,11 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) > > if (clone_flags & CLONE_SETTLS) > > childregs->tp = tls; > > childregs->a0 = 0; /* Return value of fork() */ > > - p->thread.s[0] = 0; > > + p->thread.ra = (unsigned long)ret_from_fork_user_asm; > > } > > p->thread.riscv_v_flags = 0; > > if (has_vector()) > > riscv_v_thread_alloc(p); > > - p->thread.ra = (unsigned long)ret_from_fork_asm; > > p->thread.sp = (unsigned long)childregs; /* kernel sp */ > > return 0; > > } > > > > Can you benchmark this change on some HW? I'm not sure we would indeed gain > this 1%. It reduces the syscall path by 3 instructions, two for not needing to move the fn and fn_args from: move a0, s1 /* fn */ move a1, s0 /* fn_arg */ And one for not needing to do the conditional. This one is also saved on kernel threads. It's a very small improvement, but there is only something like 100 instructions along the direct syscall path so it ends up being a large percentage. On hardware moving registers is very cheap and this branch will be almost always be correctly predicted so the cost is close to zero. I just figured that since I am making changes around here it would be nice if it was optimal instead of being close to optimal. - Charlie > > Thanks, > > Alex >