public inbox for linux-kernel@vger.kernel.org
From: Waiman Long <longman@redhat.com>
To: Guo Hui <guohui@uniontech.com>,
	peterz@infradead.org, mingo@redhat.com, will@kernel.org,
	boqun.feng@gmail.com, David.Laight@ACULAB.COM
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] locking/osq_lock: Optimize osq_lock performance using per-NUMA
Date: Tue, 20 Feb 2024 13:16:26 -0500
Message-ID: <03c1eb2e-4eae-49be-94cb-b90894cc00a9@redhat.com>
In-Reply-To: <20240220073058.6435-1-guohui@uniontech.com>


On 2/20/24 02:30, Guo Hui wrote:
> After extensive testing of osq_lock, we found that its performance is
> closely tied to the distance between NUMA nodes: the greater the
> distance, the more severe the performance degradation. osq_lock
> performs best when all the processes competing for the same lock run
> on a single NUMA node; when they are spread across different NUMA
> nodes, performance worsens as the inter-node distance grows.
>
> This patch improves performance as follows:
> the osq_lock queue is divided per NUMA node, with one osq
> linked list per node, and each CPU is added to the list of
> its own NUMA node. When the last CPU of a NUMA node releases
> osq_lock, the lock is passed on to the next NUMA node.
>
> As shown in the figure below, the last node on NUMA0 (osq_node1)
> passes the lock to the first node of NUMA1 (osq_node3).
>
> -----------------------------------------------------------
> |            NUMA0           |            NUMA1           |
> |----------------------------|----------------------------|
> |  osq_node0 ---> osq_node1 -|-> osq_node3 ---> osq_node4 |
> -----------------------------------------------------------
>
> An atomic global variable, osq_lock_node, records the number of
> the NUMA node that may currently take osq_lock. While it holds a
> given node number, the CPUs on that node acquire osq_lock in
> turn, and the CPUs on all other NUMA nodes poll-wait.
>
> This solution greatly reduces the performance degradation caused
> by communication between CPUs on different NUMA nodes.
>
> The effect on the 96-core 4-NUMA ARM64 platform is as follows:
> System Benchmarks Partial Index       with patch  without patch  gain
> File Copy 1024 bufsize 2000 maxblocks   2060.8      980.3        +110.22%
> File Copy 256 bufsize 500 maxblocks     1346.5      601.9        +123.71%
> File Copy 4096 bufsize 8000 maxblocks   4229.9      2216.1       +90.87%
>
> The effect on the 128-core 8-NUMA X86_64 platform is as follows:
> System Benchmarks Partial Index       with patch  without patch  gain
> File Copy 1024 bufsize 2000 maxblocks   841.1       553.7        +51.91%
> File Copy 256 bufsize 500 maxblocks     517.4       339.8        +52.27%
> File Copy 4096 bufsize 8000 maxblocks   2058.4      1392.8       +47.79%
That is similar in idea to the numa-aware qspinlock patch series.
> Signed-off-by: Guo Hui <guohui@uniontech.com>
> ---
>   include/linux/osq_lock.h  | 20 +++++++++++--
>   kernel/locking/osq_lock.c | 60 +++++++++++++++++++++++++++++++++------
>   2 files changed, 69 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/osq_lock.h b/include/linux/osq_lock.h
> index ea8fb31379e3..c016c1cf5e8b 100644
> --- a/include/linux/osq_lock.h
> +++ b/include/linux/osq_lock.h
> @@ -2,6 +2,8 @@
>   #ifndef __LINUX_OSQ_LOCK_H
>   #define __LINUX_OSQ_LOCK_H
>   
> +#include <linux/nodemask.h>
> +
>   /*
>    * An MCS like lock especially tailored for optimistic spinning for sleeping
>    * lock implementations (mutex, rwsem, etc).
> @@ -11,8 +13,9 @@ struct optimistic_spin_queue {
>   	/*
>   	 * Stores an encoded value of the CPU # of the tail node in the queue.
>   	 * If the queue is empty, then it's set to OSQ_UNLOCKED_VAL.
> +	 * The actual number of NUMA nodes is generally not greater than 32.
>   	 */
> -	atomic_t tail;
> +	atomic_t tail[32];

That is a no-go. You are increasing the size of a mutex/rwsem by 128 
bytes. If you want to enable this numa-awareness, you have to do it in a 
way without increasing the size of optimistic_spin_queue. My suggestion 
is to queue optimistic_spin_node in a numa-aware way in osq_lock.c 
without touching optimistic_spin_queue.

Cheers,
Longman


  reply	other threads:[~2024-02-20 18:16 UTC|newest]

Thread overview: 3+ messages
2024-02-20  7:30 [PATCH] locking/osq_lock: Optimize osq_lock performance using per-NUMA Guo Hui
2024-02-20 18:16 ` Waiman Long [this message]
2024-02-21  2:42   ` Guo Hui
