From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-oo1-f47.google.com (mail-oo1-f47.google.com [209.85.161.47]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 598473398B for ; Fri, 18 Jul 2025 14:34:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.161.47 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752849249; cv=none; b=L7KMbJpP2qOsjFAoTVc6zsw116OhiJueU5WqjitkM39UOA5YeLYM5wlkxeMxN7wm2ZNeSrY0lHlL6/wY9gUGkkm0PkOUzqFsbKoAb1g6UJXleHL7hOCZ17wvvPIBYmOEa2rwe0s6qHaNXGwUKkpZtWi1ATRXNxU1F9YAZcYIIEg= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752849249; c=relaxed/simple; bh=iz1GmugjNYaAjPevS3vCG9GNc21f7NJwOlozcRQg+gg=; h=MIME-Version:References:In-Reply-To:From:Date:Message-ID:Subject: To:Cc:Content-Type; b=orx6ETakyJ9EIZ0SZtzpKHP4xi/n8IrV67ITchjX+QiIZ+C0HS4PmFrYz9vzLmrz4hJtJZLe7V55vaE0jNh8I3j9+bzTq3t/2qU59SSRwxanwo9AkINoVWIcsKxiQIGPBFrx8X72HQzaZ/pIqMR1Bw3a0UppcYSlLFJk3YNprxc= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=LhvIlzGO; arc=none smtp.client-ip=209.85.161.47 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="LhvIlzGO" Received: by mail-oo1-f47.google.com with SMTP id 006d021491bc7-615913ff67fso1365971eaf.0 for ; Fri, 18 Jul 2025 07:34:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1752849246; x=1753454046; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=bWzhDhdiFr0536BRqzyoEm8becCS9ZtrzmuFNx2GoNA=; b=LhvIlzGOzOi2aKKaxbxkILOyBYmFHvY4fLRBli6Wrz+iFpCKwBaXW43dRrAJjddudb hstzHT76T71geDFg4yldLg75aYxnimDl3FwpNs1tRR9kG+uuULr6F7479kbM3v7yNLD0 vwGIX21QghcGKvJMuTR/aVL59RQzg3XypdjlnjwKbvvEYdikCsXvYROPVXEgbN34xHFx Iouzaq5xWTnELOI9WtMUCFIMVFU4T1Z/4c+tXY4PGA/fQM8W4T90mBS6CWkreGb+H5dk ZmKlPxkzdnsdtagRNcYuTphT9HIyLSk1J+wEW3KUlWXv8S5rrM3ZMsuat7yanpbb4sqY ZSsA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752849246; x=1753454046; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=bWzhDhdiFr0536BRqzyoEm8becCS9ZtrzmuFNx2GoNA=; b=d1PUZuWL0TpIl0VyeA6xgPqRwyRdeOLa4x3AuBEoHT1Y0AwDn0oqqhu45iQuylulvL lkCa6Mbjx38VbBhvFGWhnEcjL8XeSp7kHRT+xeZhP+2ZPxbjr6VZIRnDilTrO0zrMFhe KCaWCyAbl1bj8OQSxE6vVTpvMMBDdTbq2+FEEsUMDxwIrDidxnupQ6vfWPZHbHCl9jR4 eZcrxpN4dpnMmjRlTWjuG2NRHB/3F/EEXi9cSwyLI0wBr2ExkHQRHE6fcs//GmGFaUXX F9a0t5QxDWj9TKenbwjCD9sPQZ8oZAzvjLKTviFh+WBakP1Z/HWNGAtJauiImrrqCSVz EbbQ== X-Forwarded-Encrypted: i=1; AJvYcCViNGVxhjIP0iAHjDy0+9Cs6vC2VIMKRVz/hFL4CF8F5Cm+JtsR+vEROqmADEE3Wg/lfizfOLBZKLL2Lqw=@vger.kernel.org X-Gm-Message-State: AOJu0YzX5iemPCZCk+pZ8IWD2+fwwen312LwbVLck32LcWY6R8Xh+fep +B9otrYidMBoj98unSC5hJPtxUoGnIuZmpSZXFnoLkYV9xcr2Rr+hE1E55hiLjkADJox15Q+IR7 6I0MO0uD06aiJ/wEm4PzDD4RcLlE1d0GUA7dbRvoazQ== X-Gm-Gg: ASbGnctM+/2msDo9XlQNnnOEuZ1XcSHMcGRHtla0JWr9G85ffp1TCcJvgIVldP/YyOv JOsSWmjOjsBYU9n73p7fxMl4jvy8LbStp+G/T4Xk853dWJuvLPx5Upvt6wueHwQ7dCpnlW7n7HS xsoRFLOrw6s5n9LOmnbSde0MyhFEnGm4MPS7tRXT2PHFq/nI/z9D/kmteiKndmuU2vHYz5R6LcL JEZhgl/SA== X-Google-Smtp-Source: AGHT+IEHkurpOPtUp6OR3kn00/uY/HZ2UhpECPN3W/Tg9e2NmMIkABeCYHxhO9+UZXpYExhWb/47mYBBDH6C49i9MLg= X-Received: by 2002:a05:6820:4de7:b0:611:adce:2cb4 with SMTP id 006d021491bc7-615accb4d53mr4933477eaf.3.1752849246312; Fri, 18 Jul 2025 07:34:06 -0700 (PDT) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 References: <20250618034328.21904-1-cuiyunhui@bytedance.com> <20250618034328.21904-2-cuiyunhui@bytedance.com> <404d38d7-f21b-4c97-b851-8b331deb3f8a@ghiti.fr> In-Reply-To: <404d38d7-f21b-4c97-b851-8b331deb3f8a@ghiti.fr> From: yunhui cui Date: Fri, 18 Jul 2025 22:33:55 +0800 X-Gm-Features: Ac12FXxpxLAOCgYhedquCSjid6kKOf5qoRGghZ8OElxrVL9g9z3nooKBwWcqkL4 Message-ID: Subject: Re: [External] Re: [PATCH RFC 2/2] riscv: introduce percpu.h into include/asm To: Alexandre Ghiti Cc: yury.norov@gmail.com, linux@rasmusvillemoes.dk, paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, dennis@kernel.org, tj@kernel.org, cl@gentwo.org, linux-mm@kvack.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi Alex=EF=BC=8C On Fri, Jul 18, 2025 at 10:23=E2=80=AFPM Alexandre Ghiti wr= ote: > > Hi Yunhui, > > On 7/18/25 08:40, yunhui cui wrote: > > Hi Alex, > > > > On Thu, Jul 17, 2025 at 9:06=E2=80=AFPM Alexandre Ghiti = wrote: > >> On 7/17/25 15:04, Alexandre Ghiti wrote: > >>> Hi Yunhui, > >>> > >>> On 6/18/25 05:43, Yunhui Cui wrote: > >>>> Current percpu operations rely on generic implementations, where > >>>> raw_local_irq_save() introduces substantial overhead. Optimization > >>>> is achieved through atomic operations and preemption disabling. > >>>> > >>>> Signed-off-by: Yunhui Cui > >>>> --- > >>>> arch/riscv/include/asm/percpu.h | 138 +++++++++++++++++++++++++++= +++++ > >>>> 1 file changed, 138 insertions(+) > >>>> create mode 100644 arch/riscv/include/asm/percpu.h > >>>> > >>>> diff --git a/arch/riscv/include/asm/percpu.h > >>>> b/arch/riscv/include/asm/percpu.h > >>>> new file mode 100644 > >>>> index 0000000000000..423c0d01f874c > >>>> --- /dev/null > >>>> +++ b/arch/riscv/include/asm/percpu.h > >>>> @@ -0,0 +1,138 @@ > >>>> +/* SPDX-License-Identifier: GPL-2.0-only */ > >>>> + > >>>> +#ifndef __ASM_PERCPU_H > >>>> +#define __ASM_PERCPU_H > >>>> + > >>>> +#include > >>>> + > >>>> +#define PERCPU_RW_OPS(sz) \ > >>>> +static inline unsigned long __percpu_read_##sz(void *ptr) \ > >>>> +{ \ > >>>> + return READ_ONCE(*(u##sz *)ptr); \ > >>>> +} \ > >>>> + \ > >>>> +static inline void __percpu_write_##sz(void *ptr, unsigned long > >>>> val) \ > >>>> +{ \ > >>>> + WRITE_ONCE(*(u##sz *)ptr, (u##sz)val); \ > >>>> +} > >>>> + > >>>> +#define __PERCPU_AMO_OP_CASE(sfx, name, sz, amo_insn) \ > >>>> +static inline void \ > >>>> +__percpu_##name##_amo_case_##sz(void *ptr, unsigned long val) = \ > >>>> +{ \ > >>>> + asm volatile ( \ > >>>> + "amo" #amo_insn #sfx " zero, %[val], %[ptr]" \ > >>>> + : [ptr] "+A" (*(u##sz *)ptr) \ > >>>> + : [val] "r" ((u##sz)(val)) \ > >>>> + : "memory"); \ > >>>> +} > >>>> + > >>>> +#define __PERCPU_AMO_RET_OP_CASE(sfx, name, sz, amo_insn) \ > >>>> +static inline u##sz \ > >>>> +__percpu_##name##_return_amo_case_##sz(void *ptr, unsigned long > >>>> val) \ > >>>> +{ \ > >>>> + register u##sz ret; \ > >>>> + \ > >>>> + asm volatile ( \ > >>>> + "amo" #amo_insn #sfx " %[ret], %[val], %[ptr]" \ > >>>> + : [ptr] "+A" (*(u##sz *)ptr), [ret] "=3Dr" (ret) \ > >>>> + : [val] "r" ((u##sz)(val)) \ > >>>> + : "memory"); \ > >>>> + \ > >>>> + return ret + val; \ > >>>> +} > >>>> + > >>>> +#define PERCPU_OP(name, amo_insn) \ > >>>> + __PERCPU_AMO_OP_CASE(.b, name, 8, amo_insn) \ > >>>> + __PERCPU_AMO_OP_CASE(.h, name, 16, amo_insn) \ > >>>> + __PERCPU_AMO_OP_CASE(.w, name, 32, amo_insn) \ > >>>> + __PERCPU_AMO_OP_CASE(.d, name, 64, amo_insn) \ > >>>> + > >>>> +#define PERCPU_RET_OP(name, amo_insn) \ > >>>> + __PERCPU_AMO_RET_OP_CASE(.b, name, 8, amo_insn) \ > >>>> + __PERCPU_AMO_RET_OP_CASE(.h, name, 16, amo_insn) \ > >>>> + __PERCPU_AMO_RET_OP_CASE(.w, name, 32, amo_insn) \ > >>>> + __PERCPU_AMO_RET_OP_CASE(.d, name, 64, amo_insn) > >>>> + > >>>> +PERCPU_RW_OPS(8) > >>>> +PERCPU_RW_OPS(16) > >>>> +PERCPU_RW_OPS(32) > >>>> +PERCPU_RW_OPS(64) > >>>> + > >>>> +PERCPU_OP(add, add) > >>>> +PERCPU_OP(andnot, and) > >>>> +PERCPU_OP(or, or) > >>>> +PERCPU_RET_OP(add, add) > >>>> + > >>>> +#undef PERCPU_RW_OPS > >>>> +#undef __PERCPU_AMO_OP_CASE > >>>> +#undef __PERCPU_AMO_RET_OP_CASE > >>>> +#undef PERCPU_OP > >>>> +#undef PERCPU_RET_OP > >>>> + > >>>> +#define _pcp_protect(op, pcp, ...) \ > >>>> +({ \ > >>>> + preempt_disable_notrace(); \ > >>>> + op(raw_cpu_ptr(&(pcp)), __VA_ARGS__); \ > >>>> + preempt_enable_notrace(); \ > >>>> +}) > >>>> + > >>>> +#define _pcp_protect_return(op, pcp, args...) \ > >>>> +({ \ > >>>> + typeof(pcp) __retval; \ > >>>> + preempt_disable_notrace(); \ > >>>> + __retval =3D (typeof(pcp))op(raw_cpu_ptr(&(pcp)), ##args); \ > >>>> + preempt_enable_notrace(); \ > >>>> + __retval; \ > >>>> +}) > >>>> + > >>>> +#define this_cpu_read_1(pcp) _pcp_protect_return(__percpu_read_8, p= cp) > >>>> +#define this_cpu_read_2(pcp) _pcp_protect_return(__percpu_read_16, = pcp) > >>>> +#define this_cpu_read_4(pcp) _pcp_protect_return(__percpu_read_32, = pcp) > >>>> +#define this_cpu_read_8(pcp) _pcp_protect_return(__percpu_read_64, = pcp) > >>>> + > >>>> +#define this_cpu_write_1(pcp, val) _pcp_protect(__percpu_write_8, > >>>> pcp, (unsigned long)val) > >>>> +#define this_cpu_write_2(pcp, val) _pcp_protect(__percpu_write_16, > >>>> pcp, (unsigned long)val) > >>>> +#define this_cpu_write_4(pcp, val) _pcp_protect(__percpu_write_32, > >>>> pcp, (unsigned long)val) > >>>> +#define this_cpu_write_8(pcp, val) _pcp_protect(__percpu_write_64, > >>>> pcp, (unsigned long)val) > >>>> + > >>>> +#define this_cpu_add_1(pcp, val) > >>>> _pcp_protect(__percpu_add_amo_case_8, pcp, val) > >>>> +#define this_cpu_add_2(pcp, val) > >>>> _pcp_protect(__percpu_add_amo_case_16, pcp, val) > >>>> +#define this_cpu_add_4(pcp, val) > >>>> _pcp_protect(__percpu_add_amo_case_32, pcp, val) > >>>> +#define this_cpu_add_8(pcp, val) > >>>> _pcp_protect(__percpu_add_amo_case_64, pcp, val) > >>>> + > >>>> +#define this_cpu_add_return_1(pcp, val) \ > >>>> +_pcp_protect_return(__percpu_add_return_amo_case_8, pcp, val) > >>>> + > >>>> +#define this_cpu_add_return_2(pcp, val) \ > >>>> +_pcp_protect_return(__percpu_add_return_amo_case_16, pcp, val) > >>>> + > >>>> +#define this_cpu_add_return_4(pcp, val) \ > >>>> +_pcp_protect_return(__percpu_add_return_amo_case_32, pcp, val) > >>>> + > >>>> +#define this_cpu_add_return_8(pcp, val) \ > >>>> +_pcp_protect_return(__percpu_add_return_amo_case_64, pcp, val) > >>>> + > >>>> +#define this_cpu_and_1(pcp, val) > >>>> _pcp_protect(__percpu_andnot_amo_case_8, pcp, ~val) > >>>> +#define this_cpu_and_2(pcp, val) > >>>> _pcp_protect(__percpu_andnot_amo_case_16, pcp, ~val) > >>>> +#define this_cpu_and_4(pcp, val) > >>>> _pcp_protect(__percpu_andnot_amo_case_32, pcp, ~val) > >>>> +#define this_cpu_and_8(pcp, val) > >>>> _pcp_protect(__percpu_andnot_amo_case_64, pcp, ~val) > >>> > >>> Why do we define __percpu_andnot based on amoand, and use > >>> __percpu_andnot with ~val here? Can't we just define __percpu_and? > > > What about that ^? > > > >>> > >>>> + > >>>> +#define this_cpu_or_1(pcp, val) _pcp_protect(__percpu_or_amo_case_8= , > >>>> pcp, val) > >>>> +#define this_cpu_or_2(pcp, val) > >>>> _pcp_protect(__percpu_or_amo_case_16, pcp, val) > >>>> +#define this_cpu_or_4(pcp, val) > >>>> _pcp_protect(__percpu_or_amo_case_32, pcp, val) > >>>> +#define this_cpu_or_8(pcp, val) > >>>> _pcp_protect(__percpu_or_amo_case_64, pcp, val) > >>>> + > >>>> +#define this_cpu_xchg_1(pcp, val) _pcp_protect_return(xchg_relaxed, > >>>> pcp, val) > >>>> +#define this_cpu_xchg_2(pcp, val) _pcp_protect_return(xchg_relaxed, > >>>> pcp, val) > >>>> +#define this_cpu_xchg_4(pcp, val) _pcp_protect_return(xchg_relaxed, > >>>> pcp, val) > >>>> +#define this_cpu_xchg_8(pcp, val) _pcp_protect_return(xchg_relaxed, > >>>> pcp, val) > >>>> + > >>>> +#define this_cpu_cmpxchg_1(pcp, o, n) > >>>> _pcp_protect_return(cmpxchg_relaxed, pcp, o, n) > >>>> +#define this_cpu_cmpxchg_2(pcp, o, n) > >>>> _pcp_protect_return(cmpxchg_relaxed, pcp, o, n) > >>>> +#define this_cpu_cmpxchg_4(pcp, o, n) > >>>> _pcp_protect_return(cmpxchg_relaxed, pcp, o, n) > >>>> +#define this_cpu_cmpxchg_8(pcp, o, n) > >>>> _pcp_protect_return(cmpxchg_relaxed, pcp, o, n) > >>>> + > >>>> +#include > >>>> + > >>>> +#endif /* __ASM_PERCPU_H */ > >>> > >>> It all looks good to me, just one thing, can you also implement > >>> this_cpu_cmpxchg64/128()? > >>> > >> One last thing sorry, can you add a cover letter too? > > Okay. > > > >> Thanks! > >> > >> Alex > >> > >> > >>> And since this is almost a copy/paste from arm64, either mention it a= t > >>> the top of the file or (better) merge both implementations somewhere > >>> to avoid redefining existing code :) But up to you. > > Actually, there's a concern here. We should account for scenarios > > where ZABHA isn't supported. Given that xxx_8() and xxx_16() are > > rarely used in practice, could we initially support only xxx_32() and > > xxx_64()? For xxx_8() and xxx_16(), we could default to the generic > > implementation. > > > Why isn't lr/sc enough? If I'm not mistaken, the current RISC-V does not support lr.bh/sc.bh, is that right? > > > > > > > >>> Reviewed-by: Alexandre Ghiti > >>> > >>> Thanks, > >>> > >>> Alex > >>> > >>> > >>> > > Thanks, > > Yunhui > > > > _______________________________________________ > > linux-riscv mailing list > > linux-riscv@lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/linux-riscv Thanks, Yunhui