virtualization.lists.linux-foundation.org archive mirror
* Re: [PATCH V11 04/17] locking/qspinlock: Improve xchg_tail for number of cpus >= 16k
       [not found] ` <20230910082911.3378782-5-guoren@kernel.org>
@ 2023-09-11  2:35   ` Waiman Long
       [not found]     ` <CAJF2gTSbUUdLhN8PFdFzQd0M1T2MVOL1cdZn46WKq1S8MuQYHw@mail.gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Waiman Long @ 2023-09-11  2:35 UTC (permalink / raw)
  To: guoren, paul.walmsley, anup, peterz, mingo, will, palmer,
	boqun.feng, tglx, paulmck, rostedt, rdunlap, catalin.marinas,
	conor.dooley, xiaoguang.xing, bjorn, alexghiti, keescook,
	greentime.hu, ajones, jszhang, wefu, wuwei2016, leobras
  Cc: linux-arch, Guo Ren, kvm, linux-doc, linux-csky, virtualization,
	linux-riscv


On 9/10/23 04:28, guoren@kernel.org wrote:
> From: Guo Ren <guoren@linux.alibaba.com>
>
> The target of xchg_tail is to write the tail to the lock value, so
> adding prefetchw could help the next cmpxchg step, which may
> decrease the cmpxchg retry loops of xchg_tail. Some processors may
> utilize this feature to give a forward guarantee, e.g., RISC-V
> XuanTie processors would block the snoop channel & irq for several
> cycles when prefetch.w instruction (from Zicbop extension) retired,
> which guarantees the next cmpxchg succeeds.
>
> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> Signed-off-by: Guo Ren <guoren@kernel.org>
> ---
>   kernel/locking/qspinlock.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> index d3f99060b60f..96b54e2ade86 100644
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -223,7 +223,10 @@ static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
>    */
>   static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
>   {
> -	u32 old, new, val = atomic_read(&lock->val);
> +	u32 old, new, val;
> +
> +	prefetchw(&lock->val);
> +	val = atomic_read(&lock->val);
>   
>   	for (;;) {
>   		new = (val & _Q_LOCKED_PENDING_MASK) | tail;

That looks a bit weird. You prefetch and then immediately read it. How 
much performance gain do you get from this change alone?

Maybe you can define an arch-specific primitive that defaults back to 
atomic_read() if not defined.
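A minimal sketch of that suggestion, modeled in userspace C11. The helper name `arch_atomic_read_prefetchw` is hypothetical, invented here for illustration, not an existing kernel API: an architecture that benefits from a write-intent prefetch overrides the macro, and everyone else falls back to a plain atomic read.

```c
#include <stdatomic.h>

/*
 * Hypothetical arch hook: an architecture may define this to issue a
 * write-intent prefetch before the read; the generic fallback is a
 * plain atomic load. (Name invented for illustration.)
 */
#ifndef arch_atomic_read_prefetchw
#define arch_atomic_read_prefetchw(p) atomic_load(p)
#endif

#define LOCKED_PENDING_MASK 0xffffu /* low 16 bits, as in qspinlock */

/*
 * Userspace model of xchg_tail(): swap in the new tail (upper bits),
 * keep the locked/pending bits, return the old value.
 */
static unsigned int xchg_tail_model(atomic_uint *lockval, unsigned int tail)
{
	unsigned int old = arch_atomic_read_prefetchw(lockval);

	/* On failure, 'old' is refreshed and the desired value recomputed. */
	while (!atomic_compare_exchange_weak(lockval, &old,
					     (old & LOCKED_PENDING_MASK) | tail))
		;
	return old;
}
```

With no arch override this reduces to the original atomic_read()-first behavior, so the generic code path stays unchanged.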

Cheers,
Longman

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH V11 04/17] locking/qspinlock: Improve xchg_tail for number of cpus >= 16k
       [not found]     ` <CAJF2gTSbUUdLhN8PFdFzQd0M1T2MVOL1cdZn46WKq1S8MuQYHw@mail.gmail.com>
@ 2023-09-11 13:03       ` Waiman Long
       [not found]         ` <CAJF2gTQ3Q7f+FGorVTR66c6TGWsSeeKVvLF+LH1_m3kSHrm0yA@mail.gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Waiman Long @ 2023-09-11 13:03 UTC (permalink / raw)
  To: Guo Ren
  Cc: Guo Ren, kvm, linux-doc, peterz, catalin.marinas, bjorn, palmer,
	virtualization, conor.dooley, jszhang, linux-riscv, will,
	keescook, linux-arch, anup, linux-csky, xiaoguang.xing, mingo,
	greentime.hu, ajones, alexghiti, paulmck, boqun.feng, rostedt,
	leobras, paul.walmsley, tglx, rdunlap, wuwei2016, wefu

On 9/10/23 23:09, Guo Ren wrote:
> On Mon, Sep 11, 2023 at 10:35 AM Waiman Long <longman@redhat.com> wrote:
>>
>> On 9/10/23 04:28, guoren@kernel.org wrote:
>>> From: Guo Ren <guoren@linux.alibaba.com>
>>>
>>> The target of xchg_tail is to write the tail to the lock value, so
>>> adding prefetchw could help the next cmpxchg step, which may
>>> decrease the cmpxchg retry loops of xchg_tail. Some processors may
>>> utilize this feature to give a forward guarantee, e.g., RISC-V
>>> XuanTie processors would block the snoop channel & irq for several
>>> cycles when prefetch.w instruction (from Zicbop extension) retired,
>>> which guarantees the next cmpxchg succeeds.
>>>
>>> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
>>> Signed-off-by: Guo Ren <guoren@kernel.org>
>>> ---
>>>    kernel/locking/qspinlock.c | 5 ++++-
>>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
>>> index d3f99060b60f..96b54e2ade86 100644
>>> --- a/kernel/locking/qspinlock.c
>>> +++ b/kernel/locking/qspinlock.c
>>> @@ -223,7 +223,10 @@ static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
>>>     */
>>>    static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
>>>    {
>>> -     u32 old, new, val = atomic_read(&lock->val);
>>> +     u32 old, new, val;
>>> +
>>> +     prefetchw(&lock->val);
>>> +     val = atomic_read(&lock->val);
>>>
>>>        for (;;) {
>>>                new = (val & _Q_LOCKED_PENDING_MASK) | tail;
>> That looks a bit weird. You prefetch and then immediately read it. How
>> much performance gain do you get from this change alone?
>>
>> Maybe you can define an arch-specific primitive that defaults back to
>> atomic_read() if not defined.
> Thx for the reply. This is a generic optimization point I would like
> to discuss with you.
>
> First, prefetchw() brings the cacheline into the exclusive state and
> serves the next cmpxchg loop, which writes the idx_tail part of
> arch_spin_lock. atomic_read() only brings the cacheline into the
> shared state, which gives no guarantee for the following cmpxchg
> loop. A micro-architecture can utilize prefetchw() to provide a
> strong forward-progress guarantee for xchg_tail; e.g., the T-HEAD
> XuanTie processor holds the exclusive cacheline state until the
> next cmpxchg write succeeds.
>
> In the end, let's go back to the principle: xchg_tail is an atomic
> swap operation that eventually performs a write, so issuing a
> prefetchw() at the beginning is acceptable for all architectures.

I did realize afterward that prefetchw gets the cacheline in exclusive 
state. I suggest you mention that in your commit log, as well as 
add a comment about its purpose in the code.
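One way the suggested comment might read, sketched as a userspace model. GCC's `__builtin_prefetch(addr, 1)` stands in for the kernel's `prefetchw()` here, and the comment wording is only a proposal, not the actual v12 patch:

```c
#include <stdatomic.h>

/* Userspace stand-in for reading the lock word ahead of a cmpxchg loop. */
static unsigned int read_tail_word(atomic_uint *lockval)
{
	/*
	 * Prefetch the lock word with write intent: this pulls the
	 * cacheline in exclusive state for the cmpxchg loop that
	 * follows, and some microarchitectures use the write hint as
	 * a forward-progress guarantee for that cmpxchg.
	 */
	__builtin_prefetch((const void *)lockval, 1 /* write */, 3);
	return atomic_load(lockval);
}
```

The prefetch is only a hint, so the function's observable result is simply the current lock value; correctness never depends on the hint being honored.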

Thanks,
Longman

>> Cheers,
>> Longman
>>
>


* Re: [PATCH V11 07/17] riscv: qspinlock: Introduce qspinlock param for command line
       [not found] ` <20230910082911.3378782-8-guoren@kernel.org>
@ 2023-09-11 15:22   ` Waiman Long
  2023-09-11 15:34   ` Waiman Long
  1 sibling, 0 replies; 9+ messages in thread
From: Waiman Long @ 2023-09-11 15:22 UTC (permalink / raw)
  To: guoren, paul.walmsley, anup, peterz, mingo, will, palmer,
	boqun.feng, tglx, paulmck, rostedt, rdunlap, catalin.marinas,
	conor.dooley, xiaoguang.xing, bjorn, alexghiti, keescook,
	greentime.hu, ajones, jszhang, wefu, wuwei2016, leobras
  Cc: linux-arch, Guo Ren, kvm, linux-doc, linux-csky, virtualization,
	linux-riscv

On 9/10/23 04:29, guoren@kernel.org wrote:
> From: Guo Ren <guoren@linux.alibaba.com>
>
> Allow cmdline to force the kernel to use queued_spinlock when
> CONFIG_RISCV_COMBO_SPINLOCKS=y.
>
> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> Signed-off-by: Guo Ren <guoren@kernel.org>
> ---
>   Documentation/admin-guide/kernel-parameters.txt |  2 ++
>   arch/riscv/kernel/setup.c                       | 16 +++++++++++++++-
>   2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 7dfb540c4f6c..61cacb8dfd0e 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4693,6 +4693,8 @@
>   			[KNL] Number of legacy pty's. Overwrites compiled-in
>   			default number.
>   
> +	qspinlock	[RISCV] Force to use qspinlock or auto-detect spinlock.
> +
>   	qspinlock.numa_spinlock_threshold_ns=	[NUMA, PV_OPS]
>   			Set the time threshold in nanoseconds for the
>   			number of intra-node lock hand-offs before the

Your patch series is still based on top of the numa-aware qspinlock 
patchset, which isn't upstream yet. Please rebase it without that, as it 
will cause merge conflicts during the upstream merge.

Cheers,
Longman


* Re: [PATCH V11 07/17] riscv: qspinlock: Introduce qspinlock param for command line
       [not found] ` <20230910082911.3378782-8-guoren@kernel.org>
  2023-09-11 15:22   ` [PATCH V11 07/17] riscv: qspinlock: Introduce qspinlock param for command line Waiman Long
@ 2023-09-11 15:34   ` Waiman Long
       [not found]     ` <CAJF2gTT2hRxgnQt+WJ9P0YBWnUaZJ1-9g3ZE9tOz_MiLSsUjwQ@mail.gmail.com>
  1 sibling, 1 reply; 9+ messages in thread
From: Waiman Long @ 2023-09-11 15:34 UTC (permalink / raw)
  To: guoren, paul.walmsley, anup, peterz, mingo, will, palmer,
	boqun.feng, tglx, paulmck, rostedt, rdunlap, catalin.marinas,
	conor.dooley, xiaoguang.xing, bjorn, alexghiti, keescook,
	greentime.hu, ajones, jszhang, wefu, wuwei2016, leobras
  Cc: linux-arch, Guo Ren, kvm, linux-doc, linux-csky, virtualization,
	linux-riscv

On 9/10/23 04:29, guoren@kernel.org wrote:
> From: Guo Ren <guoren@linux.alibaba.com>
>
> Allow cmdline to force the kernel to use queued_spinlock when
> CONFIG_RISCV_COMBO_SPINLOCKS=y.
>
> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
> Signed-off-by: Guo Ren <guoren@kernel.org>
> ---
>   Documentation/admin-guide/kernel-parameters.txt |  2 ++
>   arch/riscv/kernel/setup.c                       | 16 +++++++++++++++-
>   2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 7dfb540c4f6c..61cacb8dfd0e 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4693,6 +4693,8 @@
>   			[KNL] Number of legacy pty's. Overwrites compiled-in
>   			default number.
>   
> +	qspinlock	[RISCV] Force to use qspinlock or auto-detect spinlock.
> +
>   	qspinlock.numa_spinlock_threshold_ns=	[NUMA, PV_OPS]
>   			Set the time threshold in nanoseconds for the
>   			number of intra-node lock hand-offs before the
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index a447cf360a18..0f084f037651 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -270,6 +270,15 @@ static void __init parse_dtb(void)
>   }
>   
>   #ifdef CONFIG_RISCV_COMBO_SPINLOCKS
> +bool enable_qspinlock_key = false;

You can use the __ro_after_init qualifier for enable_qspinlock_key. BTW, 
this is not a static key, just a simple flag, so what is the point of 
the _key suffix?

Cheers,
Longman


* Re: [PATCH V11 04/17] locking/qspinlock: Improve xchg_tail for number of cpus >= 16k
       [not found]             ` <CAJF2gTTHdCr-FQVSGUc+LapkJPmDiEYYa_1P6T86uCjRujgnTg@mail.gmail.com>
@ 2023-09-13 13:06               ` Waiman Long
  0 siblings, 0 replies; 9+ messages in thread
From: Waiman Long @ 2023-09-13 13:06 UTC (permalink / raw)
  To: Guo Ren, Leonardo Bras
  Cc: Guo Ren, kvm, linux-doc, peterz, catalin.marinas, bjorn, palmer,
	virtualization, conor.dooley, jszhang, linux-riscv, will,
	keescook, linux-arch, anup, linux-csky, xiaoguang.xing, mingo,
	greentime.hu, ajones, alexghiti, paulmck, boqun.feng, rostedt,
	paul.walmsley, tglx, rdunlap, wuwei2016, wefu

On 9/13/23 08:52, Guo Ren wrote:
> On Wed, Sep 13, 2023 at 4:55 PM Leonardo Bras <leobras@redhat.com> wrote:
>> On Tue, Sep 12, 2023 at 09:10:08AM +0800, Guo Ren wrote:
>>> On Mon, Sep 11, 2023 at 9:03 PM Waiman Long <longman@redhat.com> wrote:
>>>> On 9/10/23 23:09, Guo Ren wrote:
>>>>> On Mon, Sep 11, 2023 at 10:35 AM Waiman Long <longman@redhat.com> wrote:
>>>>>> On 9/10/23 04:28, guoren@kernel.org wrote:
>>>>>>> From: Guo Ren <guoren@linux.alibaba.com>
>>>>>>>
>>>>>>> The target of xchg_tail is to write the tail to the lock value, so
>>>>>>> adding prefetchw could help the next cmpxchg step, which may
>>>>>>> decrease the cmpxchg retry loops of xchg_tail. Some processors may
>>>>>>> utilize this feature to give a forward guarantee, e.g., RISC-V
>>>>>>> XuanTie processors would block the snoop channel & irq for several
>>>>>>> cycles when prefetch.w instruction (from Zicbop extension) retired,
>>>>>>> which guarantees the next cmpxchg succeeds.
>>>>>>>
>>>>>>> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
>>>>>>> Signed-off-by: Guo Ren <guoren@kernel.org>
>>>>>>> ---
>>>>>>>     kernel/locking/qspinlock.c | 5 ++++-
>>>>>>>     1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
>>>>>>> index d3f99060b60f..96b54e2ade86 100644
>>>>>>> --- a/kernel/locking/qspinlock.c
>>>>>>> +++ b/kernel/locking/qspinlock.c
>>>>>>> @@ -223,7 +223,10 @@ static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
>>>>>>>      */
>>>>>>>     static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
>>>>>>>     {
>>>>>>> -     u32 old, new, val = atomic_read(&lock->val);
>>>>>>> +     u32 old, new, val;
>>>>>>> +
>>>>>>> +     prefetchw(&lock->val);
>>>>>>> +     val = atomic_read(&lock->val);
>>>>>>>
>>>>>>>         for (;;) {
>>>>>>>                 new = (val & _Q_LOCKED_PENDING_MASK) | tail;
>>>>>> That looks a bit weird. You prefetch and then immediately read it. How
>>>>>> much performance gain do you get from this change alone?
>>>>>>
>>>>>> Maybe you can define an arch-specific primitive that defaults back to
>>>>>> atomic_read() if not defined.
>>>>> Thx for the reply. This is a generic optimization point I would like
>>>>> to discuss with you.
>>>>>
>>>>> First, prefetchw() brings the cacheline into the exclusive state and
>>>>> serves the next cmpxchg loop, which writes the idx_tail part of
>>>>> arch_spin_lock. atomic_read() only brings the cacheline into the
>>>>> shared state, which gives no guarantee for the following cmpxchg
>>>>> loop. A micro-architecture can utilize prefetchw() to provide a
>>>>> strong forward-progress guarantee for xchg_tail; e.g., the T-HEAD
>>>>> XuanTie processor holds the exclusive cacheline state until the
>>>>> next cmpxchg write succeeds.
>>>>>
>>>>> In the end, let's go back to the principle: xchg_tail is an atomic
>>>>> swap operation that eventually performs a write, so issuing a
>>>>> prefetchw() at the beginning is acceptable for all architectures.
>>>> I did realize afterward that prefetchw gets the cacheline in exclusive
>>>> state. I suggest you mention that in your commit log, as well as
>>>> add a comment about its purpose in the code.
>>> Okay, I would do that in v12, thx.
>> I would suggest adding a snippet from the ISA Extenstion doc:
>>
>> "A prefetch.w instruction indicates to hardware that the cache block whose
>> effective address is the sum of the base address specified in rs1 and the
>> sign-extended offset encoded in imm[11:0], where imm[4:0] equals 0b00000,
>> is likely to be accessed by a data write (i.e. store) in the near future."
> Good point, thx.

qspinlock is generic code; I suppose this is for the RISC-V architecture. 
You can mention that in the commit log as an example, but I would prefer 
a more generic comment, especially in the code.

Cheers,
Longman


* Re: [PATCH V11 07/17] riscv: qspinlock: Introduce qspinlock param for command line
       [not found]       ` <ZQK2-CIL9U_QdMjh@redhat.com>
@ 2023-09-14 17:23         ` Waiman Long
  0 siblings, 0 replies; 9+ messages in thread
From: Waiman Long @ 2023-09-14 17:23 UTC (permalink / raw)
  To: Leonardo Bras, Guo Ren
  Cc: Guo Ren, kvm, linux-doc, peterz, catalin.marinas, bjorn, palmer,
	virtualization, conor.dooley, jszhang, linux-riscv, will,
	keescook, linux-arch, anup, linux-csky, xiaoguang.xing, mingo,
	greentime.hu, ajones, alexghiti, paulmck, boqun.feng, rostedt,
	paul.walmsley, tglx, rdunlap, wuwei2016, wefu

On 9/14/23 03:32, Leonardo Bras wrote:
> On Tue, Sep 12, 2023 at 09:08:34AM +0800, Guo Ren wrote:
>> On Mon, Sep 11, 2023 at 11:34 PM Waiman Long <longman@redhat.com> wrote:
>>> On 9/10/23 04:29, guoren@kernel.org wrote:
>>>> From: Guo Ren <guoren@linux.alibaba.com>
>>>>
>>>> Allow cmdline to force the kernel to use queued_spinlock when
>>>> CONFIG_RISCV_COMBO_SPINLOCKS=y.
>>>>
>>>> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
>>>> Signed-off-by: Guo Ren <guoren@kernel.org>
>>>> ---
>>>>    Documentation/admin-guide/kernel-parameters.txt |  2 ++
>>>>    arch/riscv/kernel/setup.c                       | 16 +++++++++++++++-
>>>>    2 files changed, 17 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>>> index 7dfb540c4f6c..61cacb8dfd0e 100644
>>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>>> @@ -4693,6 +4693,8 @@
>>>>                        [KNL] Number of legacy pty's. Overwrites compiled-in
>>>>                        default number.
>>>>
>>>> +     qspinlock       [RISCV] Force to use qspinlock or auto-detect spinlock.
>>>> +
>>>>        qspinlock.numa_spinlock_threshold_ns=   [NUMA, PV_OPS]
>>>>                        Set the time threshold in nanoseconds for the
>>>>                        number of intra-node lock hand-offs before the
>>>> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
>>>> index a447cf360a18..0f084f037651 100644
>>>> --- a/arch/riscv/kernel/setup.c
>>>> +++ b/arch/riscv/kernel/setup.c
>>>> @@ -270,6 +270,15 @@ static void __init parse_dtb(void)
>>>>    }
>>>>
>>>>    #ifdef CONFIG_RISCV_COMBO_SPINLOCKS
>>>> +bool enable_qspinlock_key = false;
>>> You can use __ro_after_init qualifier for enable_qspinlock_key. BTW,
>>> this is not a static key, just a simple flag. So what is the point of
>>> the _key suffix?
>> Okay, I would change it to:
>> bool enable_qspinlock_flag __ro_after_init = false;
> IIUC, this bool / flag is used in a single file, so it makes sense for it
> to be static. Being static means it does not need to be initialized to
> false, as it's standard to zero-fill these areas.
>
> Also, since it's a bool, it does not need to be called _flag.
>
> I would go with:
>
> static bool enable_qspinlock __ro_after_init;

I was actually thinking of the same suggestion, to add static. Then I 
realized that the flag is also used in another file in a later patch. 
Of course, if it turns out that the flag is no longer needed outside of 
this file, it should be static.
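The shape the discussion converges on, modeled in plain C. These are userspace stand-ins: `__ro_after_init` is a kernel section attribute (shown only as a comment), and the handler paraphrases the `early_param("qspinlock", ...)` hookup from the quoted patch rather than reproducing it.

```c
#include <stdbool.h>

/*
 * File-local flag: static zero-fill means no '= false' initializer is
 * needed, and in the kernel __ro_after_init would make it read-only
 * once boot-time init is done.
 */
static bool enable_qspinlock /* __ro_after_init */;

/* Stand-in for the early_param("qspinlock", ...) handler. */
static int qspinlock_setup(char *arg)
{
	(void)arg;		/* presence of the parameter is enough */
	enable_qspinlock = true;
	return 0;
}
```

If a later patch really does need the flag outside this file, the `static` goes away again and an `extern` declaration lands in a header, per Longman's caveat.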

Cheers,
Longman


* Re: [PATCH V11 00/17] riscv: Add Native/Paravirt qspinlock support
       [not found] ` <ZUlPwQVG4OTkighB@redhat.com>
@ 2023-11-12  4:23   ` Guo Ren
  2023-11-13 10:19     ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 9+ messages in thread
From: Guo Ren @ 2023-11-12  4:23 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: paul.walmsley, anup, peterz, mingo, will, palmer, longman,
	boqun.feng, tglx, paulmck, rostedt, rdunlap, catalin.marinas,
	conor.dooley, xiaoguang.xing, bjorn, alexghiti, keescook,
	greentime.hu, ajones, jszhang, wefu, wuwei2016, linux-arch,
	linux-riscv, linux-doc, kvm, virtualization, linux-csky, Guo Ren

On Mon, Nov 6, 2023 at 3:42 PM Leonardo Bras <leobras@redhat.com> wrote:
>
> On Sun, Sep 10, 2023 at 04:28:54AM -0400, guoren@kernel.org wrote:
> > From: Guo Ren <guoren@linux.alibaba.com>
> >
> > patch[1 - 10]: Native   qspinlock
> > patch[11 -17]: Paravirt qspinlock
> >
> > patch[4]: Add prefetchw in qspinlock's xchg_tail when cpus >= 16k
> >
> > This series based on:
> >  - [RFC PATCH v5 0/5] Rework & improve riscv cmpxchg.h and atomic.h
> >    https://lore.kernel.org/linux-riscv/20230810040349.92279-2-leobras@redhat.com/
> >  - [PATCH V3] asm-generic: ticket-lock: Optimize arch_spin_value_unlocked
> >    https://lore.kernel.org/linux-riscv/20230908154339.3250567-1-guoren@kernel.org/
> >
> > I merge them into sg2042-master branch, then you could directly try it on
> > sg2042 hardware platform:
> >
> > https://github.com/guoren83/linux/tree/sg2042-master-qspinlock-64ilp32_v5
> >
> > Use sophgo_mango_ubuntu_defconfig for sg2042 64/128 cores hardware
> > platform.
> >
> > Native qspinlock
> > ================
> >
> > This time we've proved the qspinlock on th1520 [1] & sg2042 [2], which
> > gives a stability and performance improvement. All T-HEAD processors
> > have a stronger LR/SC forward-progress guarantee than the ISA requires,
> > which satisfies the xchg_tail of native qspinlock. Qspinlock has now run
> > with us for more than a year, and we have enough confidence to enable it
> > for all T-HEAD processors. Of course, we found a livelock problem with
> > the qspinlock lock torture test, caused by the CPU store-merge-buffer
> > delay mechanism, which turned the queued spinlock into a dead ring and
> > made an RCU warning come out. We introduce a custom WRITE_ONCE to solve
> > this. Do we need an explicit ISA instruction to signal it, or should
> > hardware handle this?
> >
> > We've tested the patch on SOPHGO sg2042 & th1520 and passed the stress
> > test on Fedora & Ubuntu & OpenEuler ... Here is the performance
> > comparison between qspinlock and ticket_lock on sg2042 (64 cores):
> >
> > sysbench test=threads threads=32 yields=100 lock=8 (+13.8%):
> >   queued_spinlock 0.5109/0.00
> >   ticket_spinlock 0.5814/0.00
> >
> > perf futex/hash (+6.7%):
> >   queued_spinlock 1444393 operations/sec (+- 0.09%)
> >   ticket_spinlock 1353215 operations/sec (+- 0.15%)
> >
> > perf futex/wake-parallel (+8.6%):
> >   queued_spinlock (waking 1/64 threads) in 0.0253 ms (+-2.90%)
> >   ticket_spinlock (waking 1/64 threads) in 0.0275 ms (+-3.12%)
> >
> > perf futex/requeue (+4.2%):
> >   queued_spinlock Requeued 64 of 64 threads in 0.0785 ms (+-0.55%)
> >   ticket_spinlock Requeued 64 of 64 threads in 0.0818 ms (+-4.12%)
> >
> > System Benchmarks (+6.4%)
> >   queued_spinlock:
> >     System Benchmarks Index Values               BASELINE       RESULT    INDEX
> >     Dhrystone 2 using register variables         116700.0  628613745.4  53865.8
> >     Double-Precision Whetstone                       55.0     182422.8  33167.8
> >     Execl Throughput                                 43.0      13116.6   3050.4
> >     File Copy 1024 bufsize 2000 maxblocks          3960.0    7762306.2  19601.8
> >     File Copy 256 bufsize 500 maxblocks            1655.0    3417556.8  20649.9
> >     File Copy 4096 bufsize 8000 maxblocks          5800.0    7427995.7  12806.9
> >     Pipe Throughput                               12440.0   23058600.5  18535.9
> >     Pipe-based Context Switching                   4000.0    2835617.7   7089.0
> >     Process Creation                                126.0      12537.3    995.0
> >     Shell Scripts (1 concurrent)                     42.4      57057.4  13456.9
> >     Shell Scripts (8 concurrent)                      6.0       7367.1  12278.5
> >     System Call Overhead                          15000.0   33308301.3  22205.5
> >                                                                        ========
> >     System Benchmarks Index Score                                       12426.1
> >
> >   ticket_spinlock:
> >     System Benchmarks Index Values               BASELINE       RESULT    INDEX
> >     Dhrystone 2 using register variables         116700.0  626541701.9  53688.2
> >     Double-Precision Whetstone                       55.0     181921.0  33076.5
> >     Execl Throughput                                 43.0      12625.1   2936.1
> >     File Copy 1024 bufsize 2000 maxblocks          3960.0    6553792.9  16550.0
> >     File Copy 256 bufsize 500 maxblocks            1655.0    3189231.6  19270.3
> >     File Copy 4096 bufsize 8000 maxblocks          5800.0    7221277.0  12450.5
> >     Pipe Throughput                               12440.0   20594018.7  16554.7
> >     Pipe-based Context Switching                   4000.0    2571117.7   6427.8
> >     Process Creation                                126.0      10798.4    857.0
> >     Shell Scripts (1 concurrent)                     42.4      57227.5  13497.1
> >     Shell Scripts (8 concurrent)                      6.0       7329.2  12215.3
> >     System Call Overhead                          15000.0   30766778.4  20511.2
> >                                                                        ========
> >     System Benchmarks Index Score                                       11670.7
> >
> > The qspinlock has a significant improvement on SOPHGO SG2042 64
> > cores platform than the ticket_lock.
> >
> > Paravirt qspinlock
> > ==================
> >
> > We implemented kvm_kick_cpu/kvm_wait_cpu and add tracepoints to observe the
> > behaviors. Also, introduce a new SBI extension SBI_EXT_PVLOCK (0xAB0401). If the
> > name and number are approved, I will send a formal proposal to the SBI spec.
> >
>
> Hello Guo Ren,
>
> Any update on this series?
Found a nested virtualization problem, and I'm solving it now. After
that, I'll send v12.

>
> Thanks!
> Leo
>
>
> > Changlog:
> > V11:
> >  - Based on Leonardo Bras's cmpxchg_small patches v5.
> >  - Based on Guo Ren's Optimize arch_spin_value_unlocked patch v3.
> >  - Remove abusing alternative framework and use jump_label instead.
> >  - Introduce prefetch.w to improve T-HEAD processors' LR/SC forward progress
> >    guarantee.
> >  - Optimize qspinlock xchg_tail when NR_CPUS >= 16K.
> >
> > V10:
> > https://lore.kernel.org/linux-riscv/20230802164701.192791-1-guoren@kernel.org/
> >  - Using an alternative framework instead of static_key_branch in the
> >    asm/spinlock.h.
> >  - Fixup store merge buffer problem, which causes qspinlock lock
> >    torture test livelock.
> >  - Add paravirt qspinlock support, include KVM backend
> >  - Add Compact NUMA-awared qspinlock support
> >
> > V9:
> > https://lore.kernel.org/linux-riscv/20220808071318.3335746-1-guoren@kernel.org/
> >  - Cleanup generic ticket-lock code, (Using smp_mb__after_spinlock as
> >    RCsc)
> >  - Add qspinlock and combo-lock for riscv
> >  - Add qspinlock to openrisc
> >  - Use generic header in csky
> >  - Optimize cmpxchg & atomic code
> >
> > V8:
> > https://lore.kernel.org/linux-riscv/20220724122517.1019187-1-guoren@kernel.org/
> >  - Coding convention ticket fixup
> >  - Move combo spinlock into riscv and simply asm-generic/spinlock.h
> >  - Fixup xchg16 with wrong return value
> >  - Add csky qspinlock
> >  - Add combo & qspinlock & ticket-lock comparison
> >  - Clean up unnecessary riscv acquire and release definitions
> >  - Enable ARCH_INLINE_READ*/WRITE*/SPIN* for riscv & csky
> >
> > V7:
> > https://lore.kernel.org/linux-riscv/20220628081946.1999419-1-guoren@kernel.org/
> >  - Add combo spinlock (ticket & queued) support
> >  - Rename ticket_spinlock.h
> >  - Remove unnecessary atomic_read in ticket_spin_value_unlocked
> >
> > V6:
> > https://lore.kernel.org/linux-riscv/20220621144920.2945595-1-guoren@kernel.org/
> >  - Fixup Clang compile problem Reported-by: kernel test robot
> >  - Cleanup asm-generic/spinlock.h
> >  - Remove changelog in patch main comment part, suggested by
> >    Conor.Dooley
> >  - Remove "default y if NUMA" in Kconfig
> >
> > V5:
> > https://lore.kernel.org/linux-riscv/20220620155404.1968739-1-guoren@kernel.org/
> >  - Update comment with RISC-V forward guarantee feature.
> >  - Back to V3 direction and optimize asm code.
> >
> > V4:
> > https://lore.kernel.org/linux-riscv/1616868399-82848-4-git-send-email-guoren@kernel.org/
> >  - Remove custom sub-word xchg implementation
> >  - Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32 in locking/qspinlock
> >
> > V3:
> > https://lore.kernel.org/linux-riscv/1616658937-82063-1-git-send-email-guoren@kernel.org/
> >  - Coding convention by Peter Zijlstra's advices
> >
> > V2:
> > https://lore.kernel.org/linux-riscv/1606225437-22948-2-git-send-email-guoren@kernel.org/
> >  - Coding convention in cmpxchg.h
> >  - Re-implement short xchg
> >  - Remove char & cmpxchg implementations
> >
> > V1:
> > https://lore.kernel.org/linux-riscv/20190211043829.30096-1-michaeljclark@mac.com/
> >  - Using cmpxchg loop to implement sub-word atomic
> >
> >
> > Guo Ren (17):
> >   asm-generic: ticket-lock: Reuse arch_spinlock_t of qspinlock
> >   asm-generic: ticket-lock: Move into ticket_spinlock.h
> >   riscv: Use Zicbop in arch_xchg when available
> >   locking/qspinlock: Improve xchg_tail for number of cpus >= 16k
> >   riscv: qspinlock: Add basic queued_spinlock support
> >   riscv: qspinlock: Introduce combo spinlock
> >   riscv: qspinlock: Introduce qspinlock param for command line
> >   riscv: qspinlock: Add virt_spin_lock() support for KVM guest
> >   riscv: qspinlock: errata: Add ERRATA_THEAD_WRITE_ONCE fixup
> >   riscv: qspinlock: errata: Enable qspinlock for T-HEAD processors
> >   RISC-V: paravirt: pvqspinlock: Add paravirt qspinlock skeleton
> >   RISC-V: paravirt: pvqspinlock: Add nopvspin kernel parameter
> >   RISC-V: paravirt: pvqspinlock: Add SBI implementation
> >   RISC-V: paravirt: pvqspinlock: Add kconfig entry
> >   RISC-V: paravirt: pvqspinlock: Add trace point for pv_kick/wait
> >   RISC-V: paravirt: pvqspinlock: KVM: Add paravirt qspinlock skeleton
> >   RISC-V: paravirt: pvqspinlock: KVM: Implement
> >     kvm_sbi_ext_pvlock_kick_cpu()
> >
> >  .../admin-guide/kernel-parameters.txt         |   8 +-
> >  arch/riscv/Kconfig                            |  50 ++++++++
> >  arch/riscv/Kconfig.errata                     |  19 +++
> >  arch/riscv/errata/thead/errata.c              |  29 +++++
> >  arch/riscv/include/asm/Kbuild                 |   2 +-
> >  arch/riscv/include/asm/cmpxchg.h              |   4 +-
> >  arch/riscv/include/asm/errata_list.h          |  13 --
> >  arch/riscv/include/asm/hwcap.h                |   1 +
> >  arch/riscv/include/asm/insn-def.h             |   5 +
> >  arch/riscv/include/asm/kvm_vcpu_sbi.h         |   1 +
> >  arch/riscv/include/asm/processor.h            |  13 ++
> >  arch/riscv/include/asm/qspinlock.h            |  35 ++++++
> >  arch/riscv/include/asm/qspinlock_paravirt.h   |  29 +++++
> >  arch/riscv/include/asm/rwonce.h               |  24 ++++
> >  arch/riscv/include/asm/sbi.h                  |  14 +++
> >  arch/riscv/include/asm/spinlock.h             | 113 ++++++++++++++++++
> >  arch/riscv/include/asm/vendorid_list.h        |  14 +++
> >  arch/riscv/include/uapi/asm/kvm.h             |   1 +
> >  arch/riscv/kernel/Makefile                    |   1 +
> >  arch/riscv/kernel/cpufeature.c                |   1 +
> >  arch/riscv/kernel/qspinlock_paravirt.c        |  83 +++++++++++++
> >  arch/riscv/kernel/sbi.c                       |   2 +-
> >  arch/riscv/kernel/setup.c                     |  60 ++++++++++
> >  .../kernel/trace_events_filter_paravirt.h     |  60 ++++++++++
> >  arch/riscv/kvm/Makefile                       |   1 +
> >  arch/riscv/kvm/vcpu_sbi.c                     |   4 +
> >  arch/riscv/kvm/vcpu_sbi_pvlock.c              |  57 +++++++++
> >  include/asm-generic/rwonce.h                  |   2 +
> >  include/asm-generic/spinlock.h                |  87 +-------------
> >  include/asm-generic/spinlock_types.h          |  12 +-
> >  include/asm-generic/ticket_spinlock.h         | 103 ++++++++++++++++
> >  kernel/locking/qspinlock.c                    |   5 +-
> >  32 files changed, 739 insertions(+), 114 deletions(-)
> >  create mode 100644 arch/riscv/include/asm/qspinlock.h
> >  create mode 100644 arch/riscv/include/asm/qspinlock_paravirt.h
> >  create mode 100644 arch/riscv/include/asm/rwonce.h
> >  create mode 100644 arch/riscv/include/asm/spinlock.h
> >  create mode 100644 arch/riscv/kernel/qspinlock_paravirt.c
> >  create mode 100644 arch/riscv/kernel/trace_events_filter_paravirt.h
> >  create mode 100644 arch/riscv/kvm/vcpu_sbi_pvlock.c
> >  create mode 100644 include/asm-generic/ticket_spinlock.h
> >
> > --
> > 2.36.1
> >
>


-- 
Best Regards
 Guo Ren

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH V11 00/17] riscv: Add Native/Paravirt qspinlock support
  2023-11-12  4:23   ` [PATCH V11 00/17] riscv: Add Native/Paravirt qspinlock support Guo Ren
@ 2023-11-13 10:19     ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 9+ messages in thread
From: Leonardo Bras Soares Passos @ 2023-11-13 10:19 UTC (permalink / raw)
  To: Guo Ren
  Cc: paul.walmsley, anup, peterz, mingo, will, palmer, longman,
	boqun.feng, tglx, paulmck, rostedt, rdunlap, catalin.marinas,
	conor.dooley, xiaoguang.xing, bjorn, alexghiti, keescook,
	greentime.hu, ajones, jszhang, wefu, wuwei2016, linux-arch,
	linux-riscv, linux-doc, kvm, virtualization, linux-csky, Guo Ren

On Sun, Nov 12, 2023 at 1:24 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Mon, Nov 6, 2023 at 3:42 PM Leonardo Bras <leobras@redhat.com> wrote:
> >
> > On Sun, Sep 10, 2023 at 04:28:54AM -0400, guoren@kernel.org wrote:
> > > From: Guo Ren <guoren@linux.alibaba.com>
> > >
> > > patch[ 1 - 10]: Native   qspinlock
> > > patch[11 - 17]: Paravirt qspinlock
> > >
> > > patch[4]: Add prefetchw in qspinlock's xchg_tail when cpus >= 16k
> > >
> > > This series based on:
> > >  - [RFC PATCH v5 0/5] Rework & improve riscv cmpxchg.h and atomic.h
> > >    https://lore.kernel.org/linux-riscv/20230810040349.92279-2-leobras@redhat.com/
> > >  - [PATCH V3] asm-generic: ticket-lock: Optimize arch_spin_value_unlocked
> > >    https://lore.kernel.org/linux-riscv/20230908154339.3250567-1-guoren@kernel.org/
> > >
> > > I merge them into sg2042-master branch, then you could directly try it on
> > > sg2042 hardware platform:
> > >
> > > https://github.com/guoren83/linux/tree/sg2042-master-qspinlock-64ilp32_v5
> > >
> > > Use sophgo_mango_ubuntu_defconfig for sg2042 64/128 cores hardware
> > > platform.
> > >
> > > Native qspinlock
> > > ================
> > >
> > > This time we've validated the qspinlock on th1520 [1] & sg2042 [2],
> > > which gives a stability and performance improvement. All T-HEAD
> > > processors have a stronger LR/SC forward progress guarantee than the
> > > ISA requires, which satisfies the xchg_tail of native qspinlock. The
> > > qspinlock has now been running with us for more than 1 year, and we
> > > have enough confidence to enable it for all T-HEAD processors. Of
> > > course, we found a livelock problem in the qspinlock lock torture
> > > test, caused by the CPU store merge buffer delay mechanism, which
> > > turned the queued spinlock into a dead ring and triggered RCU
> > > warnings. We introduce a custom WRITE_ONCE to solve this. Do we need
> > > an explicit ISA instruction to signal it, or should hardware handle it?
> > >
> > > We've tested the patch on SOPHGO sg2042 & th1520 and passed the stress
> > > test on Fedora & Ubuntu & OpenEuler ... Here is the performance
> > > comparison between qspinlock and ticket_lock on sg2042 (64 cores):
> > >
> > > sysbench test=threads threads=32 yields=100 lock=8 (+13.8%):
> > >   queued_spinlock 0.5109/0.00
> > >   ticket_spinlock 0.5814/0.00
> > >
> > > perf futex/hash (+6.7%):
> > >   queued_spinlock 1444393 operations/sec (+- 0.09%)
> > >   ticket_spinlock 1353215 operations/sec (+- 0.15%)
> > >
> > > perf futex/wake-parallel (+8.6%):
> > >   queued_spinlock (waking 1/64 threads) in 0.0253 ms (+-2.90%)
> > >   ticket_spinlock (waking 1/64 threads) in 0.0275 ms (+-3.12%)
> > >
> > > perf futex/requeue (+4.2%):
> > >   queued_spinlock Requeued 64 of 64 threads in 0.0785 ms (+-0.55%)
> > >   ticket_spinlock Requeued 64 of 64 threads in 0.0818 ms (+-4.12%)
> > >
> > > System Benchmarks (+6.4%)
> > >   queued_spinlock:
> > >     System Benchmarks Index Values               BASELINE       RESULT    INDEX
> > >     Dhrystone 2 using register variables         116700.0  628613745.4  53865.8
> > >     Double-Precision Whetstone                       55.0     182422.8  33167.8
> > >     Execl Throughput                                 43.0      13116.6   3050.4
> > >     File Copy 1024 bufsize 2000 maxblocks          3960.0    7762306.2  19601.8
> > >     File Copy 256 bufsize 500 maxblocks            1655.0    3417556.8  20649.9
> > >     File Copy 4096 bufsize 8000 maxblocks          5800.0    7427995.7  12806.9
> > >     Pipe Throughput                               12440.0   23058600.5  18535.9
> > >     Pipe-based Context Switching                   4000.0    2835617.7   7089.0
> > >     Process Creation                                126.0      12537.3    995.0
> > >     Shell Scripts (1 concurrent)                     42.4      57057.4  13456.9
> > >     Shell Scripts (8 concurrent)                      6.0       7367.1  12278.5
> > >     System Call Overhead                          15000.0   33308301.3  22205.5
> > >                                                                        ========
> > >     System Benchmarks Index Score                                       12426.1
> > >
> > >   ticket_spinlock:
> > >     System Benchmarks Index Values               BASELINE       RESULT    INDEX
> > >     Dhrystone 2 using register variables         116700.0  626541701.9  53688.2
> > >     Double-Precision Whetstone                       55.0     181921.0  33076.5
> > >     Execl Throughput                                 43.0      12625.1   2936.1
> > >     File Copy 1024 bufsize 2000 maxblocks          3960.0    6553792.9  16550.0
> > >     File Copy 256 bufsize 500 maxblocks            1655.0    3189231.6  19270.3
> > >     File Copy 4096 bufsize 8000 maxblocks          5800.0    7221277.0  12450.5
> > >     Pipe Throughput                               12440.0   20594018.7  16554.7
> > >     Pipe-based Context Switching                   4000.0    2571117.7   6427.8
> > >     Process Creation                                126.0      10798.4    857.0
> > >     Shell Scripts (1 concurrent)                     42.4      57227.5  13497.1
> > >     Shell Scripts (8 concurrent)                      6.0       7329.2  12215.3
> > >     System Call Overhead                          15000.0   30766778.4  20511.2
> > >                                                                        ========
> > >     System Benchmarks Index Score                                       11670.7
> > >
> > > The qspinlock shows a significant improvement over the ticket_lock
> > > on the SOPHGO SG2042 64-core platform.
> > >
> > > Paravirt qspinlock
> > > ==================
> > >
> > > We implemented kvm_kick_cpu/kvm_wait_cpu and added tracepoints to
> > > observe their behavior. We also introduce a new SBI extension,
> > > SBI_EXT_PVLOCK (0xAB0401). If the name and number are approved, I will
> > > send a formal proposal to the SBI spec.
> > >
> >
> > Hello Guo Ren,
> >
> > Any update on this series?
> Found a nested virtualization problem, and I'm solving that. After
> that, I'll update v12.

Oh, nice to hear :)
I am very excited about this series, please let me know of any update.

Thanks!
Leo

>
> >
> > Thanks!
> > Leo
> >
> >
> > > Changlog:
> > > V11:
> > >  - Based on Leonardo Bras's cmpxchg_small patches v5.
> > >  - Based on Guo Ren's Optimize arch_spin_value_unlocked patch v3.
> > >  - Remove abusing alternative framework and use jump_label instead.
> > >  - Introduce prefetch.w to improve T-HEAD processors' LR/SC forward progress
> > >    guarantee.
> > >  - Optimize qspinlock xchg_tail when NR_CPUS >= 16K.
> > >
> > > V10:
> > > https://lore.kernel.org/linux-riscv/20230802164701.192791-1-guoren@kernel.org/
> > >  - Using an alternative framework instead of static_key_branch in the
> > >    asm/spinlock.h.
> > >  - Fixup store merge buffer problem, which causes qspinlock lock
> > >    torture test livelock.
> > >  - Add paravirt qspinlock support, include KVM backend
> > >  - Add Compact NUMA-awared qspinlock support
> > >
> > > V9:
> > > https://lore.kernel.org/linux-riscv/20220808071318.3335746-1-guoren@kernel.org/
> > >  - Cleanup generic ticket-lock code, (Using smp_mb__after_spinlock as
> > >    RCsc)
> > >  - Add qspinlock and combo-lock for riscv
> > >  - Add qspinlock to openrisc
> > >  - Use generic header in csky
> > >  - Optimize cmpxchg & atomic code
> > >
> > > V8:
> > > https://lore.kernel.org/linux-riscv/20220724122517.1019187-1-guoren@kernel.org/
> > >  - Coding convention ticket fixup
> > >  - Move combo spinlock into riscv and simply asm-generic/spinlock.h
> > >  - Fixup xchg16 with wrong return value
> > >  - Add csky qspinlock
> > >  - Add combo & qspinlock & ticket-lock comparison
> > >  - Clean up unnecessary riscv acquire and release definitions
> > >  - Enable ARCH_INLINE_READ*/WRITE*/SPIN* for riscv & csky
> > >
> > > V7:
> > > https://lore.kernel.org/linux-riscv/20220628081946.1999419-1-guoren@kernel.org/
> > >  - Add combo spinlock (ticket & queued) support
> > >  - Rename ticket_spinlock.h
> > >  - Remove unnecessary atomic_read in ticket_spin_value_unlocked
> > >
> > > V6:
> > > https://lore.kernel.org/linux-riscv/20220621144920.2945595-1-guoren@kernel.org/
> > >  - Fixup Clang compile problem Reported-by: kernel test robot
> > >  - Cleanup asm-generic/spinlock.h
> > >  - Remove changelog in patch main comment part, suggested by
> > >    Conor.Dooley
> > >  - Remove "default y if NUMA" in Kconfig
> > >
> > > V5:
> > > https://lore.kernel.org/linux-riscv/20220620155404.1968739-1-guoren@kernel.org/
> > >  - Update comment with RISC-V forward guarantee feature.
> > >  - Back to V3 direction and optimize asm code.
> > >
> > > V4:
> > > https://lore.kernel.org/linux-riscv/1616868399-82848-4-git-send-email-guoren@kernel.org/
> > >  - Remove custom sub-word xchg implementation
> > >  - Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32 in locking/qspinlock
> > >
> > > V3:
> > > https://lore.kernel.org/linux-riscv/1616658937-82063-1-git-send-email-guoren@kernel.org/
> > >  - Coding convention by Peter Zijlstra's advices
> > >
> > > V2:
> > > https://lore.kernel.org/linux-riscv/1606225437-22948-2-git-send-email-guoren@kernel.org/
> > >  - Coding convention in cmpxchg.h
> > >  - Re-implement short xchg
> > >  - Remove char & cmpxchg implementations
> > >
> > > V1:
> > > https://lore.kernel.org/linux-riscv/20190211043829.30096-1-michaeljclark@mac.com/
> > >  - Using cmpxchg loop to implement sub-word atomic
> > >
> > >
> > > Guo Ren (17):
> > >   asm-generic: ticket-lock: Reuse arch_spinlock_t of qspinlock
> > >   asm-generic: ticket-lock: Move into ticket_spinlock.h
> > >   riscv: Use Zicbop in arch_xchg when available
> > >   locking/qspinlock: Improve xchg_tail for number of cpus >= 16k
> > >   riscv: qspinlock: Add basic queued_spinlock support
> > >   riscv: qspinlock: Introduce combo spinlock
> > >   riscv: qspinlock: Introduce qspinlock param for command line
> > >   riscv: qspinlock: Add virt_spin_lock() support for KVM guest
> > >   riscv: qspinlock: errata: Add ERRATA_THEAD_WRITE_ONCE fixup
> > >   riscv: qspinlock: errata: Enable qspinlock for T-HEAD processors
> > >   RISC-V: paravirt: pvqspinlock: Add paravirt qspinlock skeleton
> > >   RISC-V: paravirt: pvqspinlock: Add nopvspin kernel parameter
> > >   RISC-V: paravirt: pvqspinlock: Add SBI implementation
> > >   RISC-V: paravirt: pvqspinlock: Add kconfig entry
> > >   RISC-V: paravirt: pvqspinlock: Add trace point for pv_kick/wait
> > >   RISC-V: paravirt: pvqspinlock: KVM: Add paravirt qspinlock skeleton
> > >   RISC-V: paravirt: pvqspinlock: KVM: Implement
> > >     kvm_sbi_ext_pvlock_kick_cpu()
> > >
> > >  .../admin-guide/kernel-parameters.txt         |   8 +-
> > >  arch/riscv/Kconfig                            |  50 ++++++++
> > >  arch/riscv/Kconfig.errata                     |  19 +++
> > >  arch/riscv/errata/thead/errata.c              |  29 +++++
> > >  arch/riscv/include/asm/Kbuild                 |   2 +-
> > >  arch/riscv/include/asm/cmpxchg.h              |   4 +-
> > >  arch/riscv/include/asm/errata_list.h          |  13 --
> > >  arch/riscv/include/asm/hwcap.h                |   1 +
> > >  arch/riscv/include/asm/insn-def.h             |   5 +
> > >  arch/riscv/include/asm/kvm_vcpu_sbi.h         |   1 +
> > >  arch/riscv/include/asm/processor.h            |  13 ++
> > >  arch/riscv/include/asm/qspinlock.h            |  35 ++++++
> > >  arch/riscv/include/asm/qspinlock_paravirt.h   |  29 +++++
> > >  arch/riscv/include/asm/rwonce.h               |  24 ++++
> > >  arch/riscv/include/asm/sbi.h                  |  14 +++
> > >  arch/riscv/include/asm/spinlock.h             | 113 ++++++++++++++++++
> > >  arch/riscv/include/asm/vendorid_list.h        |  14 +++
> > >  arch/riscv/include/uapi/asm/kvm.h             |   1 +
> > >  arch/riscv/kernel/Makefile                    |   1 +
> > >  arch/riscv/kernel/cpufeature.c                |   1 +
> > >  arch/riscv/kernel/qspinlock_paravirt.c        |  83 +++++++++++++
> > >  arch/riscv/kernel/sbi.c                       |   2 +-
> > >  arch/riscv/kernel/setup.c                     |  60 ++++++++++
> > >  .../kernel/trace_events_filter_paravirt.h     |  60 ++++++++++
> > >  arch/riscv/kvm/Makefile                       |   1 +
> > >  arch/riscv/kvm/vcpu_sbi.c                     |   4 +
> > >  arch/riscv/kvm/vcpu_sbi_pvlock.c              |  57 +++++++++
> > >  include/asm-generic/rwonce.h                  |   2 +
> > >  include/asm-generic/spinlock.h                |  87 +-------------
> > >  include/asm-generic/spinlock_types.h          |  12 +-
> > >  include/asm-generic/ticket_spinlock.h         | 103 ++++++++++++++++
> > >  kernel/locking/qspinlock.c                    |   5 +-
> > >  32 files changed, 739 insertions(+), 114 deletions(-)
> > >  create mode 100644 arch/riscv/include/asm/qspinlock.h
> > >  create mode 100644 arch/riscv/include/asm/qspinlock_paravirt.h
> > >  create mode 100644 arch/riscv/include/asm/rwonce.h
> > >  create mode 100644 arch/riscv/include/asm/spinlock.h
> > >  create mode 100644 arch/riscv/kernel/qspinlock_paravirt.c
> > >  create mode 100644 arch/riscv/kernel/trace_events_filter_paravirt.h
> > >  create mode 100644 arch/riscv/kvm/vcpu_sbi_pvlock.c
> > >  create mode 100644 include/asm-generic/ticket_spinlock.h
> > >
> > > --
> > > 2.36.1
> > >
> >
>
>
> --
> Best Regards
>  Guo Ren
>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH V11 03/17] riscv: Use Zicbop in arch_xchg when available
       [not found] <20230910082911.3378782-1-guoren@kernel.org>
                   ` (2 preceding siblings ...)
       [not found] ` <ZUlPwQVG4OTkighB@redhat.com>
@ 2023-12-31  8:29 ` guoren
  3 siblings, 0 replies; 9+ messages in thread
From: guoren @ 2023-12-31  8:29 UTC (permalink / raw)
  To: paul.walmsley, palmer, guoren, panqinglin2020, bjorn,
	conor.dooley, leobras, peterz, keescook, wuwei2016,
	xiaoguang.xing, chao.wei, unicorn_wang, uwu, jszhang, wefu,
	atishp, ajones, anup, mingo, will, palmer, longman, boqun.feng,
	tglx, paulmck, rostedt, rdunlap, catalin.marinas, alexghiti,
	greentime.hu
  Cc: linux-riscv, linux-kernel, linux-arch, linux-doc, kvm,
	virtualization, linux-csky, Guo Ren

From: Guo Ren <guoren@linux.alibaba.com>

Cache-block prefetch instructions are HINTs to the hardware that
software intends to perform a particular type of memory access in the
near future. Enable ARCH_HAS_PREFETCHW and use it to improve arch_xchg
for the qspinlock xchg_tail.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 arch/riscv/Kconfig                 | 15 +++++++++++++++
 arch/riscv/include/asm/cmpxchg.h   |  4 +++-
 arch/riscv/include/asm/hwcap.h     |  1 +
 arch/riscv/include/asm/insn-def.h  |  5 +++++
 arch/riscv/include/asm/processor.h | 13 +++++++++++++
 arch/riscv/kernel/cpufeature.c     |  1 +
 6 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index e9ae6fa232c3..2c346fe169c1 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -617,6 +617,21 @@ config RISCV_ISA_ZICBOZ
 
 	   If you don't know what to do here, say Y.
 
+config RISCV_ISA_ZICBOP
+	bool "Zicbop extension support for cache block prefetch"
+	depends on MMU
+	depends on RISCV_ALTERNATIVE
+	default y
+	help
+	   Adds support to dynamically detect the presence of the ZICBOP
+	   extension (Cache Block Prefetch Operations) and enable its
+	   usage.
+
+	   The Zicbop extension can be used to prefetch cache block for
+	   read/write/instruction fetch.
+
+	   If you don't know what to do here, say Y.
+
 config TOOLCHAIN_HAS_ZIHINTPAUSE
 	bool
 	default y
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 702725727671..56eff7a9d2d2 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -11,6 +11,7 @@
 
 #include <asm/barrier.h>
 #include <asm/fence.h>
+#include <asm/processor.h>
 
 #define __arch_xchg_masked(prepend, append, r, p, n)			\
 ({									\
@@ -25,6 +26,7 @@
 									\
 	__asm__ __volatile__ (						\
 	       prepend							\
+	       PREFETCHW_ASM(%5)					\
 	       "0:	lr.w %0, %2\n"					\
 	       "	and  %1, %0, %z4\n"				\
 	       "	or   %1, %1, %z3\n"				\
@@ -32,7 +34,7 @@
 	       "	bnez %1, 0b\n"					\
 	       append							\
 	       : "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b))	\
-	       : "rJ" (__newx), "rJ" (~__mask)				\
+	       : "rJ" (__newx), "rJ" (~__mask), "rJ" (__ptr32b)		\
 	       : "memory");						\
 									\
 	r = (__typeof__(*(p)))((__retx & __mask) >> __s);		\
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index b7b58258f6c7..78b7b8b53778 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -58,6 +58,7 @@
 #define RISCV_ISA_EXT_ZICSR		40
 #define RISCV_ISA_EXT_ZIFENCEI		41
 #define RISCV_ISA_EXT_ZIHPM		42
+#define RISCV_ISA_EXT_ZICBOP		43
 
 #define RISCV_ISA_EXT_MAX		64
 
diff --git a/arch/riscv/include/asm/insn-def.h b/arch/riscv/include/asm/insn-def.h
index 6960beb75f32..dc590d331894 100644
--- a/arch/riscv/include/asm/insn-def.h
+++ b/arch/riscv/include/asm/insn-def.h
@@ -134,6 +134,7 @@
 
 #define RV_OPCODE_MISC_MEM	RV_OPCODE(15)
 #define RV_OPCODE_SYSTEM	RV_OPCODE(115)
+#define RV_OPCODE_PREFETCH	RV_OPCODE(19)
 
 #define HFENCE_VVMA(vaddr, asid)				\
 	INSN_R(OPCODE_SYSTEM, FUNC3(0), FUNC7(17),		\
@@ -196,4 +197,8 @@
 	INSN_I(OPCODE_MISC_MEM, FUNC3(2), __RD(0),		\
 	       RS1(base), SIMM12(4))
 
+#define CBO_prefetchw(base)					\
+	INSN_R(OPCODE_PREFETCH, FUNC3(6), FUNC7(0),		\
+	       RD(x0), RS1(base), RS2(x0))
+
 #endif /* __ASM_INSN_DEF_H */
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index de9da852f78d..7ad3a24212e8 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -12,6 +12,8 @@
 #include <vdso/processor.h>
 
 #include <asm/ptrace.h>
+#include <asm/insn-def.h>
+#include <asm/hwcap.h>
 
 #ifdef CONFIG_64BIT
 #define DEFAULT_MAP_WINDOW	(UL(1) << (MMAP_VA_BITS - 1))
@@ -103,6 +105,17 @@ static inline void arch_thread_struct_whitelist(unsigned long *offset,
 #define KSTK_EIP(tsk)		(ulong)(task_pt_regs(tsk)->epc)
 #define KSTK_ESP(tsk)		(ulong)(task_pt_regs(tsk)->sp)
 
+#define ARCH_HAS_PREFETCHW
+#define PREFETCHW_ASM(base)	ALTERNATIVE(__nops(1), \
+					    CBO_prefetchw(base), \
+					    0, \
+					    RISCV_ISA_EXT_ZICBOP, \
+					    CONFIG_RISCV_ISA_ZICBOP)
+static inline void prefetchw(const void *ptr)
+{
+	asm volatile(PREFETCHW_ASM(%0)
+		: : "r" (ptr) : "memory");
+}
 
 /* Do necessary setup to start up a newly executed thread. */
 extern void start_thread(struct pt_regs *regs,
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index ef7b4fd9e876..e0b897db0b97 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -159,6 +159,7 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
 	__RISCV_ISA_EXT_DATA(h, RISCV_ISA_EXT_h),
 	__RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
 	__RISCV_ISA_EXT_DATA(zicboz, RISCV_ISA_EXT_ZICBOZ),
+	__RISCV_ISA_EXT_DATA(zicbop, RISCV_ISA_EXT_ZICBOP),
 	__RISCV_ISA_EXT_DATA(zicntr, RISCV_ISA_EXT_ZICNTR),
 	__RISCV_ISA_EXT_DATA(zicsr, RISCV_ISA_EXT_ZICSR),
 	__RISCV_ISA_EXT_DATA(zifencei, RISCV_ISA_EXT_ZIFENCEI),
-- 
2.36.1
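For reference, the 32-bit word that the CBO_prefetchw() macro in this patch assembles can be reproduced by hand. The sketch below re-implements the R-type field packing from insn-def.h in Python; register a0 (x10) is chosen arbitrarily as the base. One hedged observation: per my reading of the ratified Zicbop spec, prefetch.i/prefetch.r/prefetch.w are distinguished by the value in the rs2/imm[4:0] field (0, 1 and 3 respectively), so an encoding with that field set to 0, as RS2(x0) produces here, would decode as prefetch.i rather than prefetch.w.

```python
def insn_r(opcode, funct3, funct7, rd, rs1, rs2):
    """Pack a RISC-V R-type instruction word (same layout INSN_R emits)."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) \
         | (funct3 << 12) | (rd << 7) | opcode

OPCODE_PREFETCH = 19   # RV_OPCODE(19), i.e. the OP-IMM major opcode

# CBO_prefetchw(a0) as the macro encodes it: rd=x0, rs2=x0, rs1=a0 (x10)
word = insn_r(OPCODE_PREFETCH, funct3=6, funct7=0, rd=0, rs1=10, rs2=0)
print(hex(word))  # -> 0x56013
```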


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv


^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-12-31  8:30 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20230910082911.3378782-1-guoren@kernel.org>
     [not found] ` <20230910082911.3378782-5-guoren@kernel.org>
2023-09-11  2:35   ` [PATCH V11 04/17] locking/qspinlock: Improve xchg_tail for number of cpus >= 16k Waiman Long
     [not found]     ` <CAJF2gTSbUUdLhN8PFdFzQd0M1T2MVOL1cdZn46WKq1S8MuQYHw@mail.gmail.com>
2023-09-11 13:03       ` Waiman Long
     [not found]         ` <CAJF2gTQ3Q7f+FGorVTR66c6TGWsSeeKVvLF+LH1_m3kSHrm0yA@mail.gmail.com>
     [not found]           ` <ZQF49GIZoFceUGYH@redhat.com>
     [not found]             ` <CAJF2gTTHdCr-FQVSGUc+LapkJPmDiEYYa_1P6T86uCjRujgnTg@mail.gmail.com>
2023-09-13 13:06               ` Waiman Long
     [not found] ` <20230910082911.3378782-8-guoren@kernel.org>
2023-09-11 15:22   ` [PATCH V11 07/17] riscv: qspinlock: Introduce qspinlock param for command line Waiman Long
2023-09-11 15:34   ` Waiman Long
     [not found]     ` <CAJF2gTT2hRxgnQt+WJ9P0YBWnUaZJ1-9g3ZE9tOz_MiLSsUjwQ@mail.gmail.com>
     [not found]       ` <ZQK2-CIL9U_QdMjh@redhat.com>
2023-09-14 17:23         ` Waiman Long
     [not found] ` <ZUlPwQVG4OTkighB@redhat.com>
2023-11-12  4:23   ` [PATCH V11 00/17] riscv: Add Native/Paravirt qspinlock support Guo Ren
2023-11-13 10:19     ` Leonardo Bras Soares Passos
2023-12-31  8:29 ` [PATCH V11 03/17] riscv: Use Zicbop in arch_xchg when available guoren

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).