Re: rq lock contention due to commit af7f588d8f73

public inbox for linux-kernel@vger.kernel.org

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Aaron Lu <aaron.lu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>, linux-kernel@vger.kernel.org
Subject: Re: rq lock contention due to commit af7f588d8f73
Date: Mon, 27 Mar 2023 15:57:43 -0400
Message-ID: <fc66a0a9-aeb3-cc80-83fb-a5c02ee898ca@efficios.com>
In-Reply-To: <20230327140425.GA1090@ziqianlu-desk2>

On 2023-03-27 10:04, Aaron Lu wrote:
> On Mon, Mar 27, 2023 at 09:20:44AM -0400, Mathieu Desnoyers wrote:
>> On 2023-03-27 04:05, Aaron Lu wrote:
>>> Hi Mathieu,
>>>
>>> I was doing some optimization work[1] for the kernel scheduler using a
>>> database workload, sysbench+postgres. Before submitting my work, I
>>> rebased my patch on top of the latest v6.3-rc kernels to check that
>>> everything still works as expected, and found that the rq lock had
>>> become very heavily contended compared to v6.2-based kernels.
>>>
>>> With the above workload, before commit af7f588d8f73 ("sched:
>>> Introduce per-memory-map concurrency ID"), the profile looked like:
>>>
>>>        7.30%     0.71%  [kernel.vmlinux]            [k] __schedule
>>>        0.03%     0.03%  [kernel.vmlinux]            [k] native_queued_spin_lock_slowpath
>>>
>>> After that commit:
>>>
>>>       49.01%     0.87%  [kernel.vmlinux]            [k] __schedule
>>>       43.20%    43.18%  [kernel.vmlinux]            [k] native_queued_spin_lock_slowpath
>>>
>>> The above profile was captured with sysbench's nr_threads set to 56; with
>>> more threads, the contention becomes even more severe on that
>>> 2-socket/112-core/224-CPU Intel Sapphire Rapids server.
>>>
>>> The docker image I used for the optimization work is not publicly
>>> available, but I managed to reproduce this problem using only publicly
>>> available components:
>>> 1. docker pull postgres
>>> 2. sudo docker run --rm --name postgres-instance -e POSTGRES_PASSWORD=mypass -e POSTGRES_USER=sbtest -d postgres -c shared_buffers=80MB -c max_connections=250
>>> 3. Go inside the container:
>>>     sudo docker exec -it postgres-instance bash
>>> 4. Install sysbench inside the container (the shell there already runs as root):
>>>     apt update && apt install sysbench
>>> 5. Prepare:
>>>     root@container:/# sysbench --db-driver=pgsql --pgsql-user=sbtest --pgsql-password=mypass --pgsql-db=sbtest --pgsql-port=5432 --tables=16 --table-size=10000 --threads=56 --time=60 --report-interval=2 /usr/share/sysbench/oltp_read_only.lua prepare
>>> 6. Run:
>>>     root@container:/# sysbench --db-driver=pgsql --pgsql-user=sbtest --pgsql-password=mypass --pgsql-db=sbtest --pgsql-port=5432 --tables=16 --table-size=10000 --threads=56 --time=60 --report-interval=2 /usr/share/sysbench/oltp_read_only.lua run
>>>
>>> Let it warm up for 10-20s, then profile it and you should see the
>>> increased rq lock contention. You may need a machine with at least
>>> 56 CPUs to see this; I didn't try on other machines.
>>>
>>> Feel free to let me know if you need any other info.
>>
>> While I set up my dev machine with this reproducer, here are a few
>> questions to help figure out the context:
>>
>> I understand that pgsql is a multi-process database. Is it strictly
>> single-threaded per-process, or does each process have more than
>> one thread ?
> 
> I do not know the details of Postgres; according to this:
> https://wiki.postgresql.org/wiki/FAQ#How_does_PostgreSQL_use_CPU_resources.3F
> I think it is single-threaded per-process.
> 
> The client, sysbench, is a single multi-threaded process, IIUC.
> 
>>
>> I understand that your workload is scheduling between threads which
>> belong to different processes. Are there more heavily active threads
>> than there are scheduler runqueues (CPUs) on your machine ?
> 
> In the reproducer I described above, 56 threads are started on the
> client side, and if each client thread is served by a server process,
> there would be about 112 tasks. I don't think the client thread and
> the server process are active at the same time, but even if they are,
> 112 is still smaller than the machine's 224 CPUs.
> 
>>
>> When I developed the mm_cid feature, I originally implemented two additional
>> optimizations:
>>
>>      Additional optimizations can be done if the spin locks added when
>>      context switching between threads belonging to different memory maps end
>>      up being a performance bottleneck. Those are left out of this patch
>>      though. A performance impact would have to be clearly demonstrated to
>>      justify the added complexity.
>>
>> I suspect that your workload demonstrates the need for at least one of those
>> optimizations. I just wonder if we are in a purely single-threaded scenario
>> for each process, or if each process has many threads.
> 
> My understanding is: the server side is single-threaded and the client
> side is multi-threaded.

OK.

I've just resuscitated my per-runqueue concurrency ID cache patch from an older
patchset and posted it as an RFC. So far it has passed one round of rseq selftests.
Can you test it in your environment to see if I'm on the right track ?

https://lore.kernel.org/lkml/20230327195318.137094-1-mathieu.desnoyers@efficios.com/

Thanks!

Mathieu


> 
> Thanks,
> Aaron

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


Thread overview: 14+ messages
2023-03-27  8:05 rq lock contention due to commit af7f588d8f73 Aaron Lu
2023-03-27  9:09 ` Peter Zijlstra
2023-03-27 10:14   ` Aaron Lu
2023-03-27 10:42   ` Aaron Lu
2023-03-27 13:20 ` Mathieu Desnoyers
2023-03-27 14:04   ` Aaron Lu
2023-03-27 14:11     ` Mathieu Desnoyers
2023-03-27 19:57     ` Mathieu Desnoyers [this message]
2023-03-28  6:58       ` Aaron Lu
2023-03-28 12:39         ` Mathieu Desnoyers
2023-03-28 13:07           ` Aaron Lu
2023-03-29  7:45           ` Aaron Lu
2023-03-29 18:07             ` Mathieu Desnoyers
2023-04-04  9:53 ` Linux regression tracking #adding (Thorsten Leemhuis)
