From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Swapnil Sapkal <Swapnil.Sapkal@amd.com>,
Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, linux-tip-commits@vger.kernel.org,
Aaron Lu <aaron.lu@intel.com>,
x86@kernel.org, Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [tip: sched/core] sched: Fix performance regression introduced by mm_cid
Date: Wed, 21 Jun 2023 14:51:42 -0400
Message-ID: <a73761e4-b791-e9a2-a276-e1551628e33b@efficios.com>
In-Reply-To: <ddbd1564-8135-5bc3-72b4-afb7c6e9caba@amd.com>

On 6/21/23 12:36, Swapnil Sapkal wrote:
> Hello Mathieu,
>
[...]
>>
>> I suspect the regression is caused by the mm_count cache line bouncing.
>>
>> Please try with this additional patch applied:
>>
>> https://lore.kernel.org/lkml/20230515143536.114960-1-mathieu.desnoyers@efficios.com/
>
> Thanks for the suggestion. I tried the patch you suggested, and I see
> an improvement in the hackbench numbers with mm_count padding. But it
> does not match the gain we got from reverting the new mm_cid patch.
>
> Below are the results on a 1-socket 4th Generation EPYC processor
> (1 x 96C/192T):
>
> Threads:
>
> Test:       Base (v6.4-rc1)   Base + new_mmcid_reverted   Base + mm_count_padding
> 1-groups:   5.23 (0.00 pct)   4.61 (11.85 pct)            5.11 (2.29 pct)
> 2-groups:   4.99 (0.00 pct)   4.72 (5.41 pct)             5.00 (-0.20 pct)
> 4-groups:   5.96 (0.00 pct)   4.87 (18.28 pct)            5.86 (1.67 pct)
> 8-groups:   6.58 (0.00 pct)   5.44 (17.32 pct)            6.20 (5.77 pct)
> 16-groups:  11.48 (0.00 pct)  8.07 (29.70 pct)            10.68 (6.96 pct)
>
> Processes:
>
> Test:       Base (v6.4-rc1)   Base + new_mmcid_reverted   Base + mm_count_padding
> 1-groups:   5.19 (0.00 pct)   4.90 (5.58 pct)             5.19 (0.00 pct)
> 2-groups:   5.44 (0.00 pct)   5.39 (0.91 pct)             5.39 (0.91 pct)
> 4-groups:   5.69 (0.00 pct)   5.64 (0.87 pct)             5.64 (0.87 pct)
> 8-groups:   6.08 (0.00 pct)   6.01 (1.15 pct)             6.04 (0.65 pct)
> 16-groups:  10.87 (0.00 pct)  10.83 (0.36 pct)            10.93 (-0.55 pct)
>
> The IBS profile shows __switch_to_asm() at the top in the baseline
> run; it does not appear near the top with the mm_count padding patch.
> The full IBS profile data for all three runs is below:
>
> # Base (v6.4-rc1)
> Threads:
> Total time: 11.486 [sec]
>
>   5.15%  sched-messaging  [kernel.vmlinux]  [k] __switch_to_asm
>   4.31%  sched-messaging  [kernel.vmlinux]  [k] copyout
>   4.29%  sched-messaging  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
>   4.22%  sched-messaging  [kernel.vmlinux]  [k] copyin
>   3.92%  sched-messaging  [kernel.vmlinux]  [k] apparmor_file_permission
>   2.91%  sched-messaging  [kernel.vmlinux]  [k] __schedule
>   2.34%  swapper          [kernel.vmlinux]  [k] __switch_to_asm
>   2.10%  sched-messaging  [kernel.vmlinux]  [k] prepare_to_wait_event
>   2.10%  sched-messaging  [kernel.vmlinux]  [k] try_to_wake_up
>   2.07%  sched-messaging  [kernel.vmlinux]  [k] finish_task_switch.isra.0
>   2.00%  sched-messaging  [kernel.vmlinux]  [k] pipe_write
>   1.82%  sched-messaging  [kernel.vmlinux]  [k] check_preemption_disabled
>   1.73%  sched-messaging  [kernel.vmlinux]  [k] exit_to_user_mode_prepare
>   1.52%  sched-messaging  [kernel.vmlinux]  [k] __entry_text_start
>   1.49%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
>   1.45%  sched-messaging  libc.so.6         [.] write
>   1.44%  swapper          [kernel.vmlinux]  [k] native_sched_clock
>   1.38%  sched-messaging  [kernel.vmlinux]  [k] psi_group_change
>   1.38%  sched-messaging  [kernel.vmlinux]  [k] pipe_read
>   1.37%  sched-messaging  libc.so.6         [.] read
>   1.06%  sched-messaging  [kernel.vmlinux]  [k] vfs_read
>   1.01%  swapper          [kernel.vmlinux]  [k] psi_group_change
>   1.00%  sched-messaging  [kernel.vmlinux]  [k] update_curr
>
> # Base + mm_count_padding
> Threads:
> Total time: 11.384 [sec]
>
>   4.43%  sched-messaging  [kernel.vmlinux]  [k] copyin
>   4.39%  sched-messaging  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
>   4.07%  sched-messaging  [kernel.vmlinux]  [k] apparmor_file_permission
>   4.07%  sched-messaging  [kernel.vmlinux]  [k] copyout
>   2.49%  sched-messaging  [kernel.vmlinux]  [k] entry_SYSCALL_64
>   2.37%  sched-messaging  [kernel.vmlinux]  [k] update_cfs_group
>   2.19%  sched-messaging  [kernel.vmlinux]  [k] pipe_write
>   2.00%  sched-messaging  [kernel.vmlinux]  [k] check_preemption_disabled
>   1.93%  swapper          [kernel.vmlinux]  [k] update_load_avg
>   1.81%  sched-messaging  [kernel.vmlinux]  [k] exit_to_user_mode_prepare
>   1.69%  sched-messaging  [kernel.vmlinux]  [k] try_to_wake_up
>   1.58%  sched-messaging  libc.so.6         [.] write
>   1.53%  sched-messaging  [kernel.vmlinux]  [k] psi_group_change
>   1.50%  sched-messaging  libc.so.6         [.] read
>   1.50%  sched-messaging  [kernel.vmlinux]  [k] pipe_read
>   1.39%  sched-messaging  [kernel.vmlinux]  [k] update_load_avg
>   1.39%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
>   1.30%  sched-messaging  [kernel.vmlinux]  [k] update_curr
>   1.28%  swapper          [kernel.vmlinux]  [k] psi_group_change
>   1.16%  sched-messaging  [kernel.vmlinux]  [k] vfs_read
>   1.12%  sched-messaging  [kernel.vmlinux]  [k] vfs_write
>   1.10%  sched-messaging  [kernel.vmlinux]  [k] entry_SYSRETQ_unsafe_stack
>   1.09%  sched-messaging  [kernel.vmlinux]  [k] __switch_to_asm
>   1.08%  sched-messaging  [kernel.vmlinux]  [k] do_syscall_64
>   1.06%  sched-messaging  [kernel.vmlinux]  [k] select_task_rq_fair
>   1.03%  swapper          [kernel.vmlinux]  [k] update_cfs_group
>   1.00%  swapper          [kernel.vmlinux]  [k] rb_insert_color
>
> # Base + reverted_new_mm_cid
> Threads:
> Total time: 7.847 [sec]
>
>  12.14%  sched-messaging  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
>   8.86%  swapper          [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
>   6.13%  sched-messaging  [kernel.vmlinux]  [k] copyin
>   5.54%  sched-messaging  [kernel.vmlinux]  [k] apparmor_file_permission
>   3.59%  sched-messaging  [kernel.vmlinux]  [k] copyout
>   2.61%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
>   2.48%  sched-messaging  [kernel.vmlinux]  [k] pipe_write
>   2.33%  sched-messaging  [kernel.vmlinux]  [k] exit_to_user_mode_prepare
>   2.01%  sched-messaging  [kernel.vmlinux]  [k] check_preemption_disabled
>   1.96%  sched-messaging  [kernel.vmlinux]  [k] __entry_text_start
>   1.91%  sched-messaging  libc.so.6         [.] write
>   1.77%  sched-messaging  libc.so.6         [.] read
>   1.64%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner
>   1.58%  sched-messaging  [kernel.vmlinux]  [k] pipe_read
>   1.52%  sched-messaging  [kernel.vmlinux]  [k] try_to_wake_up
>   1.38%  sched-messaging  [kernel.vmlinux]  [k] ktime_get_coarse_real_ts64
>   1.35%  sched-messaging  [kernel.vmlinux]  [k] vfs_write
>   1.28%  sched-messaging  [kernel.vmlinux]  [k] entry_SYSRETQ_unsafe_stack
>   1.28%  sched-messaging  [kernel.vmlinux]  [k] vfs_read
>   1.25%  sched-messaging  [kernel.vmlinux]  [k] do_syscall_64
>   1.22%  sched-messaging  [kernel.vmlinux]  [k] __fget_light
>   1.18%  sched-messaging  [kernel.vmlinux]  [k] mutex_lock
>   1.12%  sched-messaging  [kernel.vmlinux]  [k] file_update_time
>   1.04%  sched-messaging  [kernel.vmlinux]  [k] _copy_from_iter
>   1.01%  sched-messaging  [kernel.vmlinux]  [k] current_time
>
> So with the new_mm_cid patch reverted, a lot of time is spent in
> native_queued_spin_lock_slowpath, and yet hackbench finishes faster.
>
> I will keep digging into this; please let me know if you have any
> pointers for me.

Do you have CONFIG_SECURITY_APPARMOR=y? Can you try without it?
I notice that apparmor_file_permission appears near the top of your
profiles, and apparmor uses an internal aa_buffers_lock spinlock,
which could possibly explain the top hits for
native_queued_spin_lock_slowpath. My current suspicion is that
the raw spinlock that was taken by "Base + reverted_new_mm_cid"
changed the contention pattern on the apparmor lock enough to
speed things up by pure accident.
Thanks,
Mathieu
>
>>
>> This patch has recently been merged into the mm tree.
>>
>> Thanks,
>>
>> Mathieu
>>
> --
> Thanks and Regards,
> Swapnil
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Thread overview: 21+ messages
2023-04-22 7:43 [tip: sched/core] sched: Fix performance regression introduced by mm_cid tip-bot2 for Mathieu Desnoyers
2023-06-20 8:14 ` Swapnil Sapkal
2023-06-20 9:11 ` Peter Zijlstra
2023-06-20 10:35 ` Swapnil Sapkal
2023-06-20 10:51 ` Mathieu Desnoyers
2023-06-21 16:36 ` Swapnil Sapkal
2023-06-21 18:51 ` Mathieu Desnoyers [this message]
2023-06-21 21:41 ` Mathieu Desnoyers
2023-06-21 23:59 ` John Johansen
2023-06-22 14:33 ` Mathieu Desnoyers
2023-06-22 16:09 ` John Johansen
2023-06-23 6:52 ` Sebastian Andrzej Siewior
2023-06-23 6:37 ` Sebastian Andrzej Siewior
2023-06-23 7:16 ` John Johansen
2023-06-23 8:15 ` Sebastian Andrzej Siewior
2023-06-23 7:35 ` John Johansen
2023-06-23 8:17 ` Sebastian Andrzej Siewior
2023-07-14 6:02 ` Swapnil Sapkal
2023-07-14 14:55 ` Mathieu Desnoyers
2023-07-18 6:01 ` Swapnil Sapkal
2023-06-23 13:12 ` Linux regression tracking #adding (Thorsten Leemhuis)