* Re: [PATCH V11 04/17] locking/qspinlock: Improve xchg_tail for number of cpus >= 16k

From: Waiman Long @ 2023-09-11  2:35 UTC
To: guoren, paul.walmsley, anup, peterz, mingo, will, palmer, boqun.feng,
  tglx, paulmck, rostedt, rdunlap, catalin.marinas, conor.dooley,
  xiaoguang.xing, bjorn, alexghiti, keescook, greentime.hu, ajones,
  jszhang, wefu, wuwei2016, leobras
Cc: linux-arch, Guo Ren, kvm, linux-doc, linux-csky, virtualization, linux-riscv

On 9/10/23 04:28, guoren@kernel.org wrote:
> From: Guo Ren <guoren@linux.alibaba.com>
>
> The target of xchg_tail is to write the tail to the lock value, so
> adding prefetchw could help the next cmpxchg step, which may
> decrease the cmpxchg retry loops of xchg_tail. Some processors may
> utilize this feature to give a forward guarantee, e.g., RISC-V
> XuanTie processors would block the snoop channel & irq for several
> cycles when prefetch.w instruction (from Zicbop extension) retired,
> which guarantees the next cmpxchg succeeds.
>
> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> Signed-off-by: Guo Ren <guoren@kernel.org>
> ---
>  kernel/locking/qspinlock.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> index d3f99060b60f..96b54e2ade86 100644
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -223,7 +223,10 @@ static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
>   */
>  static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
>  {
> -	u32 old, new, val = atomic_read(&lock->val);
> +	u32 old, new, val;
> +
> +	prefetchw(&lock->val);
> +	val = atomic_read(&lock->val);
>
>  	for (;;) {
>  		new = (val & _Q_LOCKED_PENDING_MASK) | tail;

That looks a bit weird: you prefetch and then immediately read the same
location. How much performance gain do you get from this change alone?

Maybe you can define an arch-specific primitive that falls back to
atomic_read() if not defined.

Cheers,
Longman
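An arch-overridable primitive along the lines Longman suggests could look
like the sketch below. This is only an illustration of the idea, not code
from the series; the hook name arch_atomic_read_prefetchw() and its
placement are assumptions.

	/*
	 * Sketch only: let an architecture request exclusive ownership
	 * of the lock word before the cmpxchg loop; the generic
	 * fallback is a plain atomic_read().
	 */
	#ifndef arch_atomic_read_prefetchw
	static __always_inline u32 arch_atomic_read_prefetchw(atomic_t *v)
	{
		return atomic_read(v);	/* generic: no prefetch hint */
	}
	#endif

	static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
	{
		u32 old, new;
		u32 val = arch_atomic_read_prefetchw(&lock->val);

		for (;;) {
			new = (val & _Q_LOCKED_PENDING_MASK) | tail;
			/* Relaxed is fine: the caller orders the MCS node init. */
			old = atomic_cmpxchg_relaxed(&lock->val, val, new);
			if (old == val)
				break;
			val = old;
		}
		return old;
	}

An architecture that benefits from a write hint would then #define
arch_atomic_read_prefetchw to a prefetchw()-plus-read sequence, leaving
every other architecture untouched.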
* Re: [PATCH V11 04/17] locking/qspinlock: Improve xchg_tail for number of cpus >= 16k

From: Waiman Long @ 2023-09-11 13:03 UTC
To: Guo Ren
Cc: Guo Ren, kvm, linux-doc, peterz, catalin.marinas, bjorn, palmer,
  virtualization, conor.dooley, jszhang, linux-riscv, will, keescook,
  linux-arch, anup, linux-csky, xiaoguang.xing, mingo, greentime.hu,
  ajones, alexghiti, paulmck, boqun.feng, rostedt, leobras,
  paul.walmsley, tglx, rdunlap, wuwei2016, wefu

On 9/10/23 23:09, Guo Ren wrote:
> On Mon, Sep 11, 2023 at 10:35 AM Waiman Long <longman@redhat.com> wrote:
>>
>> On 9/10/23 04:28, guoren@kernel.org wrote:
>>> [... patch quoted above ...]
>> That looks a bit weird: you prefetch and then immediately read the
>> same location. How much performance gain do you get from this change
>> alone?
>>
>> Maybe you can define an arch-specific primitive that falls back to
>> atomic_read() if not defined.
> Thx for the reply. This is a generic optimization point I would like
> to talk about with you.
>
> First, prefetchw() puts the cacheline into an exclusive state and
> serves the next cmpxchg loop, which writes the idx_tail part of
> arch_spin_lock. atomic_read() only brings the cacheline into the
> shared state, which gives no guarantee for the following cmpxchg
> loop. A micro-architecture can utilize prefetchw() to provide a
> strong forward progress guarantee for xchg_tail; e.g., the T-HEAD
> XuanTie processor holds the exclusive cacheline state until the next
> cmpxchg write succeeds.
>
> In the end, let's go back to the principle: xchg_tail is an atomic
> swap operation that eventually performs a write, so issuing a
> prefetchw() at the beginning is acceptable for all architectures.

I did realize afterward that prefetchw gets the cacheline in exclusive
state. I would suggest mentioning that in your commit log as well as
adding a comment about its purpose in the code.

Thanks,
Longman
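The comment Longman asks for might read roughly as follows. This is
illustrative wording only, not what ultimately landed in the series:

	/*
	 * Prefetch the lock word for write: this moves the cacheline into
	 * the exclusive state up front, so the cmpxchg loop below is less
	 * likely to retry, and it lets some microarchitectures provide a
	 * stronger forward-progress guarantee for the atomic update.
	 */
	prefetchw(&lock->val);
	val = atomic_read(&lock->val);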
* Re: [PATCH V11 04/17] locking/qspinlock: Improve xchg_tail for number of cpus >= 16k

From: Waiman Long @ 2023-09-13 13:06 UTC
To: Guo Ren, Leonardo Bras
Cc: Guo Ren, kvm, linux-doc, peterz, catalin.marinas, bjorn, palmer,
  virtualization, conor.dooley, jszhang, linux-riscv, will, keescook,
  linux-arch, anup, linux-csky, xiaoguang.xing, mingo, greentime.hu,
  ajones, alexghiti, paulmck, boqun.feng, rostedt, paul.walmsley, tglx,
  rdunlap, wuwei2016, wefu

On 9/13/23 08:52, Guo Ren wrote:
> On Wed, Sep 13, 2023 at 4:55 PM Leonardo Bras <leobras@redhat.com> wrote:
>> On Tue, Sep 12, 2023 at 09:10:08AM +0800, Guo Ren wrote:
>>> On Mon, Sep 11, 2023 at 9:03 PM Waiman Long <longman@redhat.com> wrote:
>>>> On 9/10/23 23:09, Guo Ren wrote:
>>>>> [... the prefetchw()/exclusive-cacheline discussion quoted above ...]
>>>> I did realize afterward that prefetchw gets the cacheline in
>>>> exclusive state. I would suggest mentioning that in your commit log
>>>> as well as adding a comment about its purpose in the code.
>>> Okay, I would do that in v12, thx.
>> I would suggest adding a snippet from the ISA extension doc:
>>
>> "A prefetch.w instruction indicates to hardware that the cache block whose
>> effective address is the sum of the base address specified in rs1 and the
>> sign-extended offset encoded in imm[11:0], where imm[4:0] equals 0b00000,
>> is likely to be accessed by a data write (i.e. store) in the near future."
> Good point, thx.

qspinlock is generic code. I suppose this is for the RISC-V
architecture. You can mention that in the commit log as an example, but
I would prefer a more generic comment, especially in the code.

Cheers,
Longman
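Longman's point that qspinlock is generic code is worth spelling out:
the generic kernel already degrades gracefully when an architecture has
no write-prefetch instruction. Paraphrasing the fallback in
<linux/prefetch.h> (shape recalled from the header, check the source
for the exact text):

	/*
	 * If the arch does not provide prefetchw(), fall back to the
	 * compiler builtin with "write" intent (second argument = 1);
	 * on targets without a suitable instruction this compiles to
	 * nothing.
	 */
	#ifndef ARCH_HAS_PREFETCHW
	#define prefetchw(x) __builtin_prefetch(x, 1)
	#endif

So a prefetchw() in xchg_tail() is at worst a no-op on architectures
that cannot use the hint, which is why a generic comment fits better
than a RISC-V-specific one.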
* Re: [PATCH V11 07/17] riscv: qspinlock: Introduce qspinlock param for command line

From: Waiman Long @ 2023-09-11 15:22 UTC
To: guoren, paul.walmsley, anup, peterz, mingo, will, palmer, boqun.feng,
  tglx, paulmck, rostedt, rdunlap, catalin.marinas, conor.dooley,
  xiaoguang.xing, bjorn, alexghiti, keescook, greentime.hu, ajones,
  jszhang, wefu, wuwei2016, leobras
Cc: linux-arch, Guo Ren, kvm, linux-doc, linux-csky, virtualization, linux-riscv

On 9/10/23 04:29, guoren@kernel.org wrote:
> From: Guo Ren <guoren@linux.alibaba.com>
>
> Allow cmdline to force the kernel to use queued_spinlock when
> CONFIG_RISCV_COMBO_SPINLOCKS=y.
>
> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> Signed-off-by: Guo Ren <guoren@kernel.org>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  2 ++
>  arch/riscv/kernel/setup.c                       | 16 +++++++++++++++-
>  2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 7dfb540c4f6c..61cacb8dfd0e 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4693,6 +4693,8 @@
>  			[KNL] Number of legacy pty's. Overwrites compiled-in
>  			default number.
>
> +	qspinlock	[RISCV] Force to use qspinlock or auto-detect spinlock.
> +
>  	qspinlock.numa_spinlock_threshold_ns=	[NUMA, PV_OPS]
>  			Set the time threshold in nanoseconds for the
>  			number of intra-node lock hand-offs before the

Your patch series is still based on top of the numa-aware qspinlock
patchset, which isn't upstream yet. Please rebase it without that, as it
will cause merge conflicts during the upstream merge.

Cheers,
Longman
* Re: [PATCH V11 07/17] riscv: qspinlock: Introduce qspinlock param for command line

From: Waiman Long @ 2023-09-11 15:34 UTC
To: guoren, paul.walmsley, anup, peterz, mingo, will, palmer, boqun.feng,
  tglx, paulmck, rostedt, rdunlap, catalin.marinas, conor.dooley,
  xiaoguang.xing, bjorn, alexghiti, keescook, greentime.hu, ajones,
  jszhang, wefu, wuwei2016, leobras
Cc: linux-arch, Guo Ren, kvm, linux-doc, linux-csky, virtualization, linux-riscv

On 9/10/23 04:29, guoren@kernel.org wrote:
> From: Guo Ren <guoren@linux.alibaba.com>
>
> Allow cmdline to force the kernel to use queued_spinlock when
> CONFIG_RISCV_COMBO_SPINLOCKS=y.
>
> [... Signed-off-by lines and the kernel-parameters.txt hunk quoted in
> the previous message ...]
>
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index a447cf360a18..0f084f037651 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -270,6 +270,15 @@ static void __init parse_dtb(void)
>  }
>
>  #ifdef CONFIG_RISCV_COMBO_SPINLOCKS
> +bool enable_qspinlock_key = false;

You can use the __ro_after_init qualifier for enable_qspinlock_key. BTW,
this is not a static key, just a simple flag, so what is the point of
the _key suffix?

Cheers,
Longman
* Re: [PATCH V11 07/17] riscv: qspinlock: Introduce qspinlock param for command line

From: Waiman Long @ 2023-09-14 17:23 UTC
To: Leonardo Bras, Guo Ren
Cc: Guo Ren, kvm, linux-doc, peterz, catalin.marinas, bjorn, palmer,
  virtualization, conor.dooley, jszhang, linux-riscv, will, keescook,
  linux-arch, anup, linux-csky, xiaoguang.xing, mingo, greentime.hu,
  ajones, alexghiti, paulmck, boqun.feng, rostedt, paul.walmsley, tglx,
  rdunlap, wuwei2016, wefu

On 9/14/23 03:32, Leonardo Bras wrote:
> On Tue, Sep 12, 2023 at 09:08:34AM +0800, Guo Ren wrote:
>> On Mon, Sep 11, 2023 at 11:34 PM Waiman Long <longman@redhat.com> wrote:
>>> On 9/10/23 04:29, guoren@kernel.org wrote:
>>>> [... patch 07/17 quoted above ...]
>>>>  #ifdef CONFIG_RISCV_COMBO_SPINLOCKS
>>>> +bool enable_qspinlock_key = false;
>>> You can use the __ro_after_init qualifier for enable_qspinlock_key.
>>> BTW, this is not a static key, just a simple flag, so what is the
>>> point of the _key suffix?
>> Okay, I would change it to:
>> bool enable_qspinlock_flag __ro_after_init = false;
> IIUC, this bool / flag is used in a single file, so it makes sense for
> it to be static. Being static means it does not need to be initialized
> to false, as these areas are zero-filled as standard.
>
> Also, since it's a bool, it does not need to be called _flag.
>
> I would go with:
>
> static bool enable_qspinlock __ro_after_init;

I was actually thinking about making the same suggestion to add static.
Then I realized that the flag was also used in another file in a later
patch. Of course, if it turns out that this flag is no longer needed
outside of this file, it should be static.

Cheers,
Longman
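For context, a boot parameter like the one under discussion is normally
wired up with early_param(). A minimal sketch, with a placeholder
handler name rather than the code from the series:

	#ifdef CONFIG_RISCV_COMBO_SPINLOCKS
	static bool enable_qspinlock __ro_after_init;

	static int __init queued_spinlock_setup(char *p)
	{
		/* "qspinlock" on the kernel command line forces queued spinlocks */
		enable_qspinlock = true;
		return 0;
	}
	early_param("qspinlock", queued_spinlock_setup);
	#endif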
* Re: [PATCH V11 00/17] riscv: Add Native/Paravirt qspinlock support

From: Guo Ren @ 2023-11-12  4:23 UTC
To: Leonardo Bras
Cc: paul.walmsley, anup, peterz, mingo, will, palmer, longman,
  boqun.feng, tglx, paulmck, rostedt, rdunlap, catalin.marinas,
  conor.dooley, xiaoguang.xing, bjorn, alexghiti, keescook,
  greentime.hu, ajones, jszhang, wefu, wuwei2016, linux-arch,
  linux-riscv, linux-doc, kvm, virtualization, linux-csky, Guo Ren

On Mon, Nov 6, 2023 at 3:42 PM Leonardo Bras <leobras@redhat.com> wrote:
>
> On Sun, Sep 10, 2023 at 04:28:54AM -0400, guoren@kernel.org wrote:
> > From: Guo Ren <guoren@linux.alibaba.com>
> >
> > patch[1 - 10]: Native qspinlock
> > patch[11 -17]: Paravirt qspinlock
> >
> > patch[4]: Add prefetchw in qspinlock's xchg_tail when cpus >= 16k
> >
> > This series based on:
> >  - [RFC PATCH v5 0/5] Rework & improve riscv cmpxchg.h and atomic.h
> >    https://lore.kernel.org/linux-riscv/20230810040349.92279-2-leobras@redhat.com/
> >  - [PATCH V3] asm-generic: ticket-lock: Optimize arch_spin_value_unlocked
> >    https://lore.kernel.org/linux-riscv/20230908154339.3250567-1-guoren@kernel.org/
> >
> > I merged them into the sg2042-master branch, so you can try it
> > directly on the sg2042 hardware platform:
> >
> > https://github.com/guoren83/linux/tree/sg2042-master-qspinlock-64ilp32_v5
> >
> > Use sophgo_mango_ubuntu_defconfig for the sg2042 64/128-core hardware
> > platform.
> >
> > Native qspinlock
> > ================
> >
> > This time we've proved the qspinlock on th1520 [1] & sg2042 [2],
> > which gives a stability and performance improvement. All T-HEAD
> > processors have a stronger LR/SC forward progress guarantee than the
> > ISA requires, which satisfies the xchg_tail of native_qspinlock.
> > qspinlock has now been running with us for more than a year, and we
> > have enough confidence to enable it for all T-HEAD processors. Of
> > course, we found a livelock problem with the qspinlock lock torture
> > test, caused by the CPU store merge buffer delay mechanism, which
> > made the queued spinlock become a dead ring and an RCU warning come
> > out. We introduce a custom WRITE_ONCE to solve this. Do we need an
> > explicit ISA instruction to signal it? Or let hardware handle this.
> >
> > We've tested the patch on SOPHGO sg2042 & th1520 and passed the
> > stress test on Fedora & Ubuntu & OpenEuler ... Here is the
> > performance comparison between qspinlock and ticket_lock on sg2042
> > (64 cores):
> >
> > sysbench test=threads threads=32 yields=100 lock=8 (+13.8%):
> >   queued_spinlock 0.5109/0.00
> >   ticket_spinlock 0.5814/0.00
> >
> > perf futex/hash (+6.7%):
> >   queued_spinlock 1444393 operations/sec (+- 0.09%)
> >   ticket_spinlock 1353215 operations/sec (+- 0.15%)
> >
> > perf futex/wake-parallel (+8.6%):
> >   queued_spinlock (waking 1/64 threads) in 0.0253 ms (+-2.90%)
> >   ticket_spinlock (waking 1/64 threads) in 0.0275 ms (+-3.12%)
> >
> > perf futex/requeue (+4.2%):
> >   queued_spinlock Requeued 64 of 64 threads in 0.0785 ms (+-0.55%)
> >   ticket_spinlock Requeued 64 of 64 threads in 0.0818 ms (+-4.12%)
> >
> > System Benchmarks (+6.4%)
> > queued_spinlock:
> > System Benchmarks Index Values             BASELINE       RESULT    INDEX
> > Dhrystone 2 using register variables       116700.0  628613745.4  53865.8
> > Double-Precision Whetstone                     55.0     182422.8  33167.8
> > Execl Throughput                               43.0      13116.6   3050.4
> > File Copy 1024 bufsize 2000 maxblocks        3960.0    7762306.2  19601.8
> > File Copy 256 bufsize 500 maxblocks          1655.0    3417556.8  20649.9
> > File Copy 4096 bufsize 8000 maxblocks        5800.0    7427995.7  12806.9
> > Pipe Throughput                             12440.0   23058600.5  18535.9
> > Pipe-based Context Switching                 4000.0    2835617.7   7089.0
> > Process Creation                              126.0      12537.3    995.0
> > Shell Scripts (1 concurrent)                   42.4      57057.4  13456.9
> > Shell Scripts (8 concurrent)                    6.0       7367.1  12278.5
> > System Call Overhead                        15000.0   33308301.3  22205.5
> >                                                                  ========
> > System Benchmarks Index Score                                     12426.1
> >
> > ticket_spinlock:
> > System Benchmarks Index Values             BASELINE       RESULT    INDEX
> > Dhrystone 2 using register variables       116700.0  626541701.9  53688.2
> > Double-Precision Whetstone                     55.0     181921.0  33076.5
> > Execl Throughput                               43.0      12625.1   2936.1
> > File Copy 1024 bufsize 2000 maxblocks        3960.0    6553792.9  16550.0
> > File Copy 256 bufsize 500 maxblocks          1655.0    3189231.6  19270.3
> > File Copy 4096 bufsize 8000 maxblocks        5800.0    7221277.0  12450.5
> > Pipe Throughput                             12440.0   20594018.7  16554.7
> > Pipe-based Context Switching                 4000.0    2571117.7   6427.8
> > Process Creation                              126.0      10798.4    857.0
> > Shell Scripts (1 concurrent)                   42.4      57227.5  13497.1
> > Shell Scripts (8 concurrent)                    6.0       7329.2  12215.3
> > System Call Overhead                        15000.0   30766778.4  20511.2
> >                                                                  ========
> > System Benchmarks Index Score                                     11670.7
> >
> > qspinlock shows a significant improvement over ticket_lock on the
> > SOPHGO SG2042 64-core platform.
> >
> > Paravirt qspinlock
> > ==================
> >
> > We implemented kvm_kick_cpu/kvm_wait_cpu and added tracepoints to
> > observe the behaviors. Also, we introduce a new SBI extension
> > SBI_EXT_PVLOCK (0xAB0401). If the name and number are approved, I
> > will send a formal proposal to the SBI spec.
>
> Hello Guo Ren,
>
> Any update on this series?

I found a nested virtualization problem, and I'm solving that. After
that, I'll update v12.

> Thanks!
> Leo
>
> > Changelog:
> > V11:
> >  - Based on Leonardo Bras's cmpxchg_small patches v5.
> >  - Based on Guo Ren's Optimize arch_spin_value_unlocked patch v3.
> >  - Remove abusing alternative framework and use jump_label instead.
> >  - Introduce prefetch.w to improve T-HEAD processors' LR/SC forward
> >    progress guarantee.
> >  - Optimize qspinlock xchg_tail when NR_CPUS >= 16K.
> >
> > V10:
> > https://lore.kernel.org/linux-riscv/20230802164701.192791-1-guoren@kernel.org/
> >  - Using an alternative framework instead of static_key_branch in the
> >    asm/spinlock.h.
> >  - Fixup store merge buffer problem, which causes qspinlock lock
> >    torture test livelock.
> >  - Add paravirt qspinlock support, include KVM backend
> >  - Add Compact NUMA-awared qspinlock support
> >
> > V9:
> > https://lore.kernel.org/linux-riscv/20220808071318.3335746-1-guoren@kernel.org/
> >  - Cleanup generic ticket-lock code, (Using smp_mb__after_spinlock as
> >    RCsc)
> >  - Add qspinlock and combo-lock for riscv
> >  - Add qspinlock to openrisc
> >  - Use generic header in csky
> >  - Optimize cmpxchg & atomic code
> >
> > V8:
> > https://lore.kernel.org/linux-riscv/20220724122517.1019187-1-guoren@kernel.org/
> >  - Coding convention ticket fixup
> >  - Move combo spinlock into riscv and simply asm-generic/spinlock.h
> >  - Fixup xchg16 with wrong return value
> >  - Add csky qspinlock
> >  - Add combo & qspinlock & ticket-lock comparison
> >  - Clean up unnecessary riscv acquire and release definitions
> >  - Enable ARCH_INLINE_READ*/WRITE*/SPIN* for riscv & csky
> >
> > V7:
> > https://lore.kernel.org/linux-riscv/20220628081946.1999419-1-guoren@kernel.org/
> >  - Add combo spinlock (ticket & queued) support
> >  - Rename ticket_spinlock.h
> >  - Remove unnecessary atomic_read in ticket_spin_value_unlocked
> >
> > V6:
> > https://lore.kernel.org/linux-riscv/20220621144920.2945595-1-guoren@kernel.org/
> >  - Fixup Clang compile problem Reported-by: kernel test robot
> >  - Cleanup asm-generic/spinlock.h
> >  - Remove changelog in patch main comment part, suggested by
> >    Conor.Dooley
> >  - Remove "default y if NUMA" in Kconfig
> >
> > V5:
> > https://lore.kernel.org/linux-riscv/20220620155404.1968739-1-guoren@kernel.org/
> >  - Update comment with RISC-V forward guarantee feature.
> >  - Back to V3 direction and optimize asm code.
> >
> > V4:
> > https://lore.kernel.org/linux-riscv/1616868399-82848-4-git-send-email-guoren@kernel.org/
> >  - Remove custom sub-word xchg implementation
> >  - Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32 in locking/qspinlock
> >
> > V3:
> > https://lore.kernel.org/linux-riscv/1616658937-82063-1-git-send-email-guoren@kernel.org/
> >  - Coding convention by Peter Zijlstra's advices
> >
> > V2:
> > https://lore.kernel.org/linux-riscv/1606225437-22948-2-git-send-email-guoren@kernel.org/
> >  - Coding convention in cmpxchg.h
> >  - Re-implement short xchg
> >  - Remove char & cmpxchg implementations
> >
> > V1:
> > https://lore.kernel.org/linux-riscv/20190211043829.30096-1-michaeljclark@mac.com/
> >  - Using cmpxchg loop to implement sub-word atomic
> >
> >
> > Guo Ren (17):
> >   asm-generic: ticket-lock: Reuse arch_spinlock_t of qspinlock
> >   asm-generic: ticket-lock: Move into ticket_spinlock.h
> >   riscv: Use Zicbop in arch_xchg when available
> >   locking/qspinlock: Improve xchg_tail for number of cpus >= 16k
> >   riscv: qspinlock: Add basic queued_spinlock support
> >   riscv: qspinlock: Introduce combo spinlock
> >   riscv: qspinlock: Introduce qspinlock param for command line
> >   riscv: qspinlock: Add virt_spin_lock() support for KVM guest
> >   riscv: qspinlock: errata: Add ERRATA_THEAD_WRITE_ONCE fixup
> >   riscv: qspinlock: errata: Enable qspinlock for T-HEAD processors
> >   RISC-V: paravirt: pvqspinlock: Add paravirt qspinlock skeleton
> >   RISC-V: paravirt: pvqspinlock: Add nopvspin kernel parameter
> >   RISC-V: paravirt: pvqspinlock: Add SBI implementation
> >   RISC-V: paravirt: pvqspinlock: Add kconfig entry
> >   RISC-V: paravirt: pvqspinlock: Add trace point for pv_kick/wait
> >   RISC-V: paravirt: pvqspinlock: KVM: Add paravirt qspinlock skeleton
> >   RISC-V: paravirt: pvqspinlock: KVM: Implement
> >     kvm_sbi_ext_pvlock_kick_cpu()
> >
> >  .../admin-guide/kernel-parameters.txt         |   8 +-
> >  arch/riscv/Kconfig                            |  50 ++++++++
> >  arch/riscv/Kconfig.errata                     |  19 +++
> >  arch/riscv/errata/thead/errata.c              |  29 +++++
> >  arch/riscv/include/asm/Kbuild                 |   2 +-
> >  arch/riscv/include/asm/cmpxchg.h              |   4 +-
> >  arch/riscv/include/asm/errata_list.h          |  13 --
> >  arch/riscv/include/asm/hwcap.h                |   1 +
> >  arch/riscv/include/asm/insn-def.h             |   5 +
> >  arch/riscv/include/asm/kvm_vcpu_sbi.h         |   1 +
> >  arch/riscv/include/asm/processor.h            |  13 ++
> >  arch/riscv/include/asm/qspinlock.h            |  35 ++++++
> >  arch/riscv/include/asm/qspinlock_paravirt.h   |  29 +++++
> >  arch/riscv/include/asm/rwonce.h               |  24 ++++
> >  arch/riscv/include/asm/sbi.h                  |  14 +++
> >  arch/riscv/include/asm/spinlock.h             | 113 ++++++++++++++++++
> >  arch/riscv/include/asm/vendorid_list.h        |  14 +++
> >  arch/riscv/include/uapi/asm/kvm.h             |   1 +
> >  arch/riscv/kernel/Makefile                    |   1 +
> >  arch/riscv/kernel/cpufeature.c                |   1 +
> >  arch/riscv/kernel/qspinlock_paravirt.c        |  83 +++++++++++++
> >  arch/riscv/kernel/sbi.c                       |   2 +-
> >  arch/riscv/kernel/setup.c                     |  60 ++++++++++
> >  .../kernel/trace_events_filter_paravirt.h     |  60 ++++++++++
> >  arch/riscv/kvm/Makefile                       |   1 +
> >  arch/riscv/kvm/vcpu_sbi.c                     |   4 +
> >  arch/riscv/kvm/vcpu_sbi_pvlock.c              |  57 +++++++++
> >  include/asm-generic/rwonce.h                  |   2 +
> >  include/asm-generic/spinlock.h                |  87 +------------
> >  include/asm-generic/spinlock_types.h          |  12 +-
> >  include/asm-generic/ticket_spinlock.h         | 103 ++++++++++++++++
> >  kernel/locking/qspinlock.c                    |   5 +-
> >  32 files changed, 739 insertions(+), 114 deletions(-)
> >  create mode 100644 arch/riscv/include/asm/qspinlock.h
> >  create mode 100644 arch/riscv/include/asm/qspinlock_paravirt.h
> >  create mode 100644 arch/riscv/include/asm/rwonce.h
> >  create mode 100644 arch/riscv/include/asm/spinlock.h
> >  create mode 100644 arch/riscv/kernel/qspinlock_paravirt.c
> >  create mode 100644 arch/riscv/kernel/trace_events_filter_paravirt.h
> >  create mode 100644 arch/riscv/kvm/vcpu_sbi_pvlock.c
> >  create mode 100644 include/asm-generic/ticket_spinlock.h
> >
> > --
> > 2.36.1

--
Best Regards
 Guo Ren
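The paravirt half of the cover letter comes down to two guest-side
hooks, pv_wait() and pv_kick(), backed by the proposed SBI extension. A
rough sketch of what they could look like; the function ID and the exact
helper names below are assumptions, since SBI_EXT_PVLOCK was still only
a proposal at this point:

	#define SBI_EXT_PVLOCK			0xAB0401
	#define SBI_EXT_PVLOCK_KICK_CPU		0	/* assumed FID */

	static void pv_kick(int cpu)
	{
		/* ask the hypervisor to wake the target vCPU */
		sbi_ecall(SBI_EXT_PVLOCK, SBI_EXT_PVLOCK_KICK_CPU,
			  cpuid_to_hartid_map(cpu), 0, 0, 0, 0, 0);
	}

	static void pv_wait(u8 *ptr, u8 val)
	{
		if (READ_ONCE(*ptr) != val)
			return;
		/*
		 * Simplified: wait for the hypervisor's kick.  wfi is
		 * only a hint, so spurious wakeups are fine; the caller
		 * re-checks the lock state in a loop.
		 */
		wait_for_interrupt();
	}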
* Re: [PATCH V11 00/17] riscv: Add Native/Paravirt qspinlock support

From: Leonardo Bras Soares Passos @ 2023-11-13 10:19 UTC
To: Guo Ren
Cc: paul.walmsley, anup, peterz, mingo, will, palmer, longman,
  boqun.feng, tglx, paulmck, rostedt, rdunlap, catalin.marinas,
  conor.dooley, xiaoguang.xing, bjorn, alexghiti, keescook,
  greentime.hu, ajones, jszhang, wefu, wuwei2016, linux-arch,
  linux-riscv, linux-doc, kvm, virtualization, linux-csky, Guo Ren

On Sun, Nov 12, 2023 at 1:24 AM Guo Ren <guoren@kernel.org> wrote:
> On Mon, Nov 6, 2023 at 3:42 PM Leonardo Bras <leobras@redhat.com> wrote:
> > [... full cover letter quoted above ...]
> >
> > Hello Guo Ren,
> >
> > Any update on this series?
> I found a nested virtualization problem, and I'm solving that. After
> that, I'll update v12.

Oh, nice to hear :)
I am very excited about this series, please let me know of any update.

Thanks!
Leo

> [... remainder of the quoted cover letter snipped ...]
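For readers following the V11 changelog item "Remove abusing alternative
framework and use jump_label instead": the combo spinlock dispatch the
series describes can be pictured as a static-key branch flipped once
during boot, after which the check costs a patched direct jump rather
than a load and test. A sketch, with illustrative names rather than the
series' actual code:

	DEFINE_STATIC_KEY_TRUE(combo_qspinlock_key);

	static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
	{
		/* branch direction is patched at boot, not evaluated per call */
		if (static_branch_likely(&combo_qspinlock_key))
			queued_spin_lock(lock);
		else
			ticket_spin_lock(lock);
	}

The key would be switched based on the errata detection and the
"qspinlock" command-line parameter discussed earlier in the thread.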
* [PATCH V11 03/17] riscv: Use Zicbop in arch_xchg when available

From: guoren @ 2023-12-31  8:29 UTC
To: paul.walmsley, palmer, guoren, panqinglin2020, bjorn, conor.dooley,
  leobras, peterz, keescook, wuwei2016, xiaoguang.xing, chao.wei,
  unicorn_wang, uwu, jszhang, wefu, atishp, ajones, anup, mingo, will,
  palmer, longman, boqun.feng, tglx, paulmck, rostedt, rdunlap,
  catalin.marinas, alexghiti, greentime.hu
Cc: linux-riscv, linux-kernel, linux-arch, linux-doc, kvm,
  virtualization, linux-csky, Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

Cache-block prefetch instructions are HINTs to the hardware to
indicate that software intends to perform a particular type of memory
access in the near future. Enable ARCH_HAS_PREFETCHW and improve the
arch_xchg for qspinlock xchg_tail.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 arch/riscv/Kconfig                 | 15 +++++++++++++++
 arch/riscv/include/asm/cmpxchg.h   |  4 +++-
 arch/riscv/include/asm/hwcap.h     |  1 +
 arch/riscv/include/asm/insn-def.h  |  5 +++++
 arch/riscv/include/asm/processor.h | 13 +++++++++++++
 arch/riscv/kernel/cpufeature.c     |  1 +
 6 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index e9ae6fa232c3..2c346fe169c1 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -617,6 +617,21 @@ config RISCV_ISA_ZICBOZ

 	   If you don't know what to do here, say Y.

+config RISCV_ISA_ZICBOP
+	bool "Zicbop extension support for cache block prefetch"
+	depends on MMU
+	depends on RISCV_ALTERNATIVE
+	default y
+	help
+	   Adds support to dynamically detect the presence of the ZICBOP
+	   extension (Cache Block Prefetch Operations) and enable its
+	   usage.
+
+	   The Zicbop extension can be used to prefetch cache block for
+	   read/write/instruction fetch.
+
+	   If you don't know what to do here, say Y.
+
 config TOOLCHAIN_HAS_ZIHINTPAUSE
 	bool
 	default y

diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 702725727671..56eff7a9d2d2 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -11,6 +11,7 @@

 #include <asm/barrier.h>
 #include <asm/fence.h>
+#include <asm/processor.h>

 #define __arch_xchg_masked(prepend, append, r, p, n)			\
 ({									\
@@ -25,6 +26,7 @@
 									\
 	__asm__ __volatile__ (						\
 	       prepend							\
+	       PREFETCHW_ASM(%5)					\
 	       "0:	lr.w %0, %2\n"					\
 	       "	and  %1, %0, %z4\n"				\
 	       "	or   %1, %1, %z3\n"				\
 	       "	sc.w %1, %1, %2\n"				\
 	       "	bnez %1, 0b\n"					\
 	       append							\
 	       : "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b))	\
-	       : "rJ" (__newx), "rJ" (~__mask)				\
+	       : "rJ" (__newx), "rJ" (~__mask), "rJ" (__ptr32b)	\
 	       : "memory");						\
 									\
 	r = (__typeof__(*(p)))((__retx & __mask) >> __s);		\

diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index b7b58258f6c7..78b7b8b53778 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -58,6 +58,7 @@
 #define RISCV_ISA_EXT_ZICSR		40
 #define RISCV_ISA_EXT_ZIFENCEI		41
 #define RISCV_ISA_EXT_ZIHPM		42
+#define RISCV_ISA_EXT_ZICBOP		43

 #define RISCV_ISA_EXT_MAX		64

diff --git a/arch/riscv/include/asm/insn-def.h b/arch/riscv/include/asm/insn-def.h
index 6960beb75f32..dc590d331894 100644
--- a/arch/riscv/include/asm/insn-def.h
+++ b/arch/riscv/include/asm/insn-def.h
@@ -134,6 +134,7 @@

 #define RV_OPCODE_MISC_MEM	RV_OPCODE(15)
 #define RV_OPCODE_SYSTEM	RV_OPCODE(115)
+#define RV_OPCODE_PREFETCH	RV_OPCODE(19)

 #define HFENCE_VVMA(vaddr, asid)				\
 	INSN_R(OPCODE_SYSTEM, FUNC3(0), FUNC7(17),		\
@@ -196,4 +197,8 @@
 	INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0),		\
 	       RS1(base), SIMM12(4))

+#define CBO_prefetchw(base)					\
+	INSN_R(OPCODE_PREFETCH, FUNC3(6), FUNC7(0),		\
+	       RD(x0), RS1(base), RS2(x0))
+
 #endif /* __ASM_INSN_DEF_H */

diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index de9da852f78d..7ad3a24212e8 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -12,6 +12,8 @@
 #include <vdso/processor.h>

 #include <asm/ptrace.h>
+#include <asm/insn-def.h>
+#include <asm/hwcap.h>

 #ifdef CONFIG_64BIT
 #define DEFAULT_MAP_WINDOW	(UL(1) << (MMAP_VA_BITS - 1))
@@ -103,6 +105,17 @@ static inline void arch_thread_struct_whitelist(unsigned long *offset,
 #define KSTK_EIP(tsk)		(ulong)(task_pt_regs(tsk)->epc)
 #define KSTK_ESP(tsk)		(ulong)(task_pt_regs(tsk)->sp)

+#define ARCH_HAS_PREFETCHW
+#define PREFETCHW_ASM(base)	ALTERNATIVE(__nops(1),			\
+					    CBO_prefetchw(base),	\
+					    0,				\
+					    RISCV_ISA_EXT_ZICBOP,	\
+					    CONFIG_RISCV_ISA_ZICBOP)
+static inline void prefetchw(const void *ptr)
+{
+	asm volatile(PREFETCHW_ASM(%0)
+		     : : "r" (ptr) : "memory");
+}

 /* Do necessary setup to start up a newly executed thread. */
 extern void start_thread(struct pt_regs *regs,

diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index ef7b4fd9e876..e0b897db0b97 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -159,6 +159,7 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
 	__RISCV_ISA_EXT_DATA(h, RISCV_ISA_EXT_h),
 	__RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
 	__RISCV_ISA_EXT_DATA(zicboz, RISCV_ISA_EXT_ZICBOZ),
+	__RISCV_ISA_EXT_DATA(zicbop, RISCV_ISA_EXT_ZICBOP),
 	__RISCV_ISA_EXT_DATA(zicntr, RISCV_ISA_EXT_ZICNTR),
 	__RISCV_ISA_EXT_DATA(zicsr, RISCV_ISA_EXT_ZICSR),
 	__RISCV_ISA_EXT_DATA(zifencei, RISCV_ISA_EXT_ZIFENCEI),

--
2.36.1
Thread overview: 9+ messages
[not found] <20230910082911.3378782-1-guoren@kernel.org>
[not found] ` <20230910082911.3378782-5-guoren@kernel.org>
2023-09-11 2:35 ` [PATCH V11 04/17] locking/qspinlock: Improve xchg_tail for number of cpus >= 16k Waiman Long
[not found] ` <CAJF2gTSbUUdLhN8PFdFzQd0M1T2MVOL1cdZn46WKq1S8MuQYHw@mail.gmail.com>
2023-09-11 13:03 ` Waiman Long
[not found] ` <CAJF2gTQ3Q7f+FGorVTR66c6TGWsSeeKVvLF+LH1_m3kSHrm0yA@mail.gmail.com>
[not found] ` <ZQF49GIZoFceUGYH@redhat.com>
[not found] ` <CAJF2gTTHdCr-FQVSGUc+LapkJPmDiEYYa_1P6T86uCjRujgnTg@mail.gmail.com>
2023-09-13 13:06 ` Waiman Long
[not found] ` <20230910082911.3378782-8-guoren@kernel.org>
2023-09-11 15:22 ` [PATCH V11 07/17] riscv: qspinlock: Introduce qspinlock param for command line Waiman Long
2023-09-11 15:34 ` Waiman Long
[not found] ` <CAJF2gTT2hRxgnQt+WJ9P0YBWnUaZJ1-9g3ZE9tOz_MiLSsUjwQ@mail.gmail.com>
[not found] ` <ZQK2-CIL9U_QdMjh@redhat.com>
2023-09-14 17:23 ` Waiman Long
[not found] ` <ZUlPwQVG4OTkighB@redhat.com>
2023-11-12 4:23 ` [PATCH V11 00/17] riscv: Add Native/Paravirt qspinlock support Guo Ren
2023-11-13 10:19 ` Leonardo Bras Soares Passos
2023-12-31 8:29 ` [PATCH V11 03/17] riscv: Use Zicbop in arch_xchg when available guoren