Re: [tip: sched/core] sched: Fix performance regression introduced by mm_cid

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Swapnil Sapkal <Swapnil.Sapkal@amd.com>,
	Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, linux-tip-commits@vger.kernel.org,
	Aaron Lu <aaron.lu@intel.com>,
	x86@kernel.org, Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [tip: sched/core] sched: Fix performance regression introduced by mm_cid
Date: Wed, 21 Jun 2023 14:51:42 -0400
Message-ID: <a73761e4-b791-e9a2-a276-e1551628e33b@efficios.com>
In-Reply-To: <ddbd1564-8135-5bc3-72b4-afb7c6e9caba@amd.com>

On 6/21/23 12:36, Swapnil Sapkal wrote:
> Hello Mathieu,
> 
[...]
>>
>> I suspect the regression is caused by the mm_count cache line bouncing.
>>
>> Please try with this additional patch applied:
>>
>> https://lore.kernel.org/lkml/20230515143536.114960-1-mathieu.desnoyers@efficios.com/
> 
> Thanks for the suggestion. I tried out with the patch you suggested. I am
> seeing improvement in hackbench numbers with mm_count padding. But this is
> not matching with what we achieved through reverting the new mm_cid patch.
> 
> Below are the results on the 1 Socket 4th Generation EPYC Processor
> (1 x 96C/192T):
> 
> Threads:
> 
> Test:              Base (v6.4-rc1)   Base + new_mmcid_reverted  Base + mm_count_padding
>   1-groups:         5.23 (0.00 pct)         4.61 (11.85 pct)        5.11 (2.29 pct)
>   2-groups:         4.99 (0.00 pct)         4.72 (5.41 pct)         5.00 (-0.20 pct)
>   4-groups:         5.96 (0.00 pct)         4.87 (18.28 pct)        5.86 (1.67 pct)
>   8-groups:         6.58 (0.00 pct)         5.44 (17.32 pct)        6.20 (5.77 pct)
>  16-groups:        11.48 (0.00 pct)         8.07 (29.70 pct)       10.68 (6.96 pct)
> 
> Processes:
> 
> Test:              Base (v6.4-rc1)  Base + new_mmcid_reverted   Base + mm_count_padding
>   1-groups:         5.19 (0.00 pct)         4.90 (5.58 pct)         5.19 (0.00 pct)
>   2-groups:         5.44 (0.00 pct)         5.39 (0.91 pct)         5.39 (0.91 pct)
>   4-groups:         5.69 (0.00 pct)         5.64 (0.87 pct)         5.64 (0.87 pct)
>   8-groups:         6.08 (0.00 pct)         6.01 (1.15 pct)         6.04 (0.65 pct)
>  16-groups:        10.87 (0.00 pct)        10.83 (0.36 pct)        10.93 (-0.55 pct)
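
For reference, the pct columns appear to be the runtime reduction relative
to the base run, (base - test) / base * 100, so positive means faster. That
reading is an assumption inferred from the numbers, not something stated in
the report. A minimal Python sketch reproducing the threads table (the
function name is made up for illustration):

```python
def pct_improvement(base_s, test_s):
    """Relative runtime reduction versus base, in percent (positive = faster).

    Assumed formula; it is inferred from the table, not taken from the report.
    """
    return (base_s - test_s) / base_s * 100.0


# (base, mm_cid reverted, mm_count padding) times from the threads table.
threads = {
    "1-groups": (5.23, 4.61, 5.11),
    "2-groups": (4.99, 4.72, 5.00),
    "4-groups": (5.96, 4.87, 5.86),
    "8-groups": (6.58, 5.44, 6.20),
    "16-groups": (11.48, 8.07, 10.68),
}

for name, (base, reverted, padding) in threads.items():
    print(f"{name:>9}: reverted {pct_improvement(base, reverted):6.2f} pct, "
          f"padding {pct_improvement(base, padding):6.2f} pct")
```

A last-digit difference against the table (for example on the 4-groups row)
would be expected if the reported percentages were computed from unrounded
times.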
> 
> The ibs profile shows that function __switch_to_asm() is coming at top in
> baseline run and is not seen with mm_count padding patch. Will be attaching
> full ibs profile data for all the 3 runs:
> 
> # Base (v6.4-rc1)
> Threads:
> Total time: 11.486 [sec]
> 
>     5.15%  sched-messaging  [kernel.vmlinux]      [k] __switch_to_asm
>     4.31%  sched-messaging  [kernel.vmlinux]      [k] copyout
>     4.29%  sched-messaging  [kernel.vmlinux]      [k] native_queued_spin_lock_slowpath
>     4.22%  sched-messaging  [kernel.vmlinux]      [k] copyin
>     3.92%  sched-messaging  [kernel.vmlinux]      [k] apparmor_file_permission
>     2.91%  sched-messaging  [kernel.vmlinux]      [k] __schedule
>     2.34%  swapper          [kernel.vmlinux]      [k] __switch_to_asm
>     2.10%  sched-messaging  [kernel.vmlinux]      [k] prepare_to_wait_event
>     2.10%  sched-messaging  [kernel.vmlinux]      [k] try_to_wake_up
>     2.07%  sched-messaging  [kernel.vmlinux]      [k] finish_task_switch.isra.0
>     2.00%  sched-messaging  [kernel.vmlinux]      [k] pipe_write
>     1.82%  sched-messaging  [kernel.vmlinux]      [k] check_preemption_disabled
>     1.73%  sched-messaging  [kernel.vmlinux]      [k] exit_to_user_mode_prepare
>     1.52%  sched-messaging  [kernel.vmlinux]      [k] __entry_text_start
>     1.49%  sched-messaging  [kernel.vmlinux]      [k] osq_lock
>     1.45%  sched-messaging  libc.so.6             [.] write
>     1.44%  swapper          [kernel.vmlinux]      [k] native_sched_clock
>     1.38%  sched-messaging  [kernel.vmlinux]      [k] psi_group_change
>     1.38%  sched-messaging  [kernel.vmlinux]      [k] pipe_read
>     1.37%  sched-messaging  libc.so.6             [.] read
>     1.06%  sched-messaging  [kernel.vmlinux]      [k] vfs_read
>     1.01%  swapper          [kernel.vmlinux]      [k] psi_group_change
>     1.00%  sched-messaging  [kernel.vmlinux]      [k] update_curr
> 
> # Base + mm_count_padding
> Threads:
> Total time: 11.384 [sec]
> 
>     4.43%  sched-messaging  [kernel.vmlinux]         [k] copyin
>     4.39%  sched-messaging  [kernel.vmlinux]         [k] native_queued_spin_lock_slowpath
>     4.07%  sched-messaging  [kernel.vmlinux]         [k] apparmor_file_permission
>     4.07%  sched-messaging  [kernel.vmlinux]         [k] copyout
>     2.49%  sched-messaging  [kernel.vmlinux]         [k] entry_SYSCALL_64
>     2.37%  sched-messaging  [kernel.vmlinux]         [k] update_cfs_group
>     2.19%  sched-messaging  [kernel.vmlinux]         [k] pipe_write
>     2.00%  sched-messaging  [kernel.vmlinux]         [k] check_preemption_disabled
>     1.93%  swapper          [kernel.vmlinux]         [k] update_load_avg
>     1.81%  sched-messaging  [kernel.vmlinux]         [k] exit_to_user_mode_prepare
>     1.69%  sched-messaging  [kernel.vmlinux]         [k] try_to_wake_up
>     1.58%  sched-messaging  libc.so.6                [.] write
>     1.53%  sched-messaging  [kernel.vmlinux]         [k] psi_group_change
>     1.50%  sched-messaging  libc.so.6                [.] read
>     1.50%  sched-messaging  [kernel.vmlinux]         [k] pipe_read
>     1.39%  sched-messaging  [kernel.vmlinux]         [k] update_load_avg
>     1.39%  sched-messaging  [kernel.vmlinux]         [k] osq_lock
>     1.30%  sched-messaging  [kernel.vmlinux]         [k] update_curr
>     1.28%  swapper          [kernel.vmlinux]         [k] psi_group_change
>     1.16%  sched-messaging  [kernel.vmlinux]         [k] vfs_read
>     1.12%  sched-messaging  [kernel.vmlinux]         [k] vfs_write
>     1.10%  sched-messaging  [kernel.vmlinux]         [k] entry_SYSRETQ_unsafe_stack
>     1.09%  sched-messaging  [kernel.vmlinux]         [k] __switch_to_asm
>     1.08%  sched-messaging  [kernel.vmlinux]         [k] do_syscall_64
>     1.06%  sched-messaging  [kernel.vmlinux]         [k] select_task_rq_fair
>     1.03%  swapper          [kernel.vmlinux]         [k] update_cfs_group
>     1.00%  swapper          [kernel.vmlinux]         [k] rb_insert_color
> 
> # Base + reverted_new_mm_cid
> Threads:
> Total time: 7.847 [sec]
> 
>    12.14%  sched-messaging  [kernel.vmlinux]      [k] native_queued_spin_lock_slowpath
>     8.86%  swapper          [kernel.vmlinux]      [k] native_queued_spin_lock_slowpath
>     6.13%  sched-messaging  [kernel.vmlinux]      [k] copyin
>     5.54%  sched-messaging  [kernel.vmlinux]      [k] apparmor_file_permission
>     3.59%  sched-messaging  [kernel.vmlinux]      [k] copyout
>     2.61%  sched-messaging  [kernel.vmlinux]      [k] osq_lock
>     2.48%  sched-messaging  [kernel.vmlinux]      [k] pipe_write
>     2.33%  sched-messaging  [kernel.vmlinux]      [k] exit_to_user_mode_prepare
>     2.01%  sched-messaging  [kernel.vmlinux]      [k] check_preemption_disabled
>     1.96%  sched-messaging  [kernel.vmlinux]      [k] __entry_text_start
>     1.91%  sched-messaging  libc.so.6             [.] write
>     1.77%  sched-messaging  libc.so.6             [.] read
>     1.64%  sched-messaging  [kernel.vmlinux]      [k] mutex_spin_on_owner
>     1.58%  sched-messaging  [kernel.vmlinux]      [k] pipe_read
>     1.52%  sched-messaging  [kernel.vmlinux]      [k] try_to_wake_up
>     1.38%  sched-messaging  [kernel.vmlinux]      [k] ktime_get_coarse_real_ts64
>     1.35%  sched-messaging  [kernel.vmlinux]      [k] vfs_write
>     1.28%  sched-messaging  [kernel.vmlinux]      [k] entry_SYSRETQ_unsafe_stack
>     1.28%  sched-messaging  [kernel.vmlinux]      [k] vfs_read
>     1.25%  sched-messaging  [kernel.vmlinux]      [k] do_syscall_64
>     1.22%  sched-messaging  [kernel.vmlinux]      [k] __fget_light
>     1.18%  sched-messaging  [kernel.vmlinux]      [k] mutex_lock
>     1.12%  sched-messaging  [kernel.vmlinux]      [k] file_update_time
>     1.04%  sched-messaging  [kernel.vmlinux]      [k] _copy_from_iter
>     1.01%  sched-messaging  [kernel.vmlinux]      [k] current_time
> 
> So with the reverted new_mm_cid patch, we are seeing a lot of time being
> spent in native_queued_spin_lock_slowpath and yet, hackbench finishes faster.
> 
> I will keep digging into this; please let me know if you have any pointers
> for me.

Do you have CONFIG_SECURITY_APPARMOR=y? Can you try without it?
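
One way to check is to look the symbol up in the running kernel's build
config. The sketch below is a small Python helper (kconfig_state is a
made-up name) that parses the usual .config text format. Where the config
lives is an assumption: many distros install it as /boot/config-<release>,
and /proc/config.gz exists only when CONFIG_IKCONFIG_PROC is enabled.

```python
import re


def kconfig_state(config_text, symbol):
    """Return the value assigned to `symbol` ("y", "m", ...), or None.

    None covers both "# CONFIG_FOO is not set" and a symbol that is absent.
    """
    assign = re.compile(r"^%s=(.*)$" % re.escape(symbol))
    for line in config_text.splitlines():
        m = assign.match(line.strip())
        if m:
            return m.group(1)
    return None


# Illustrative config fragment, not taken from the machine under test.
sample = """\
CONFIG_SECURITY=y
CONFIG_SECURITY_APPARMOR=y
# CONFIG_SECURITY_SMACK is not set
"""

print(kconfig_state(sample, "CONFIG_SECURITY_APPARMOR"))  # prints: y
print(kconfig_state(sample, "CONFIG_SECURITY_SMACK"))     # prints: None
```

On a real system the text would come from reading the config file, for
example open("/boot/config-" + platform.release()).read() if that path
exists. Booting with apparmor=0 on the kernel command line should also
disable AppArmor for a test run without rebuilding.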

I notice that apparmor_file_permission appears near the top of your
profiles, and apparmor uses an internal aa_buffers_lock spinlock,
which could possibly explain the top hits for
native_queued_spin_lock_slowpath. My current suspicion is that
the raw spinlock that was taken by "Base + reverted_new_mm_cid"
changed the contention pattern on the apparmor lock enough to
speed things up by pure accident.
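
To compare the spinlock share across runs, the per-comm rows of a profile
can be summed per symbol. A rough Python sketch, assuming the report rows
have the "<pct>%  <comm>  <dso>  [k] <symbol>" layout quoted above
(symbol_share is a made-up helper, and only rows present in the listing are
counted):

```python
import re

# One report row: "<pct>%  <comm>  <dso>  [<k or .>] <symbol>".
# The optional leading ">" lets quoted email text be pasted in directly.
ROW = re.compile(r"^\s*>?\s*([\d.]+)%\s+(\S+)\s+(\S+)\s+\[(.)\]\s+(\S+)\s*$")


def symbol_share(report_text, symbol):
    """Sum the overhead percentage of `symbol` over all comms in a report."""
    total = 0.0
    for line in report_text.splitlines():
        m = ROW.match(line)
        if m and m.group(5) == symbol:
            total += float(m.group(1))
    return total


# Rows copied from the "Base + reverted_new_mm_cid" profile.
reverted = """
   12.14%  sched-messaging  [kernel.vmlinux]      [k] native_queued_spin_lock_slowpath
    8.86%  swapper          [kernel.vmlinux]      [k] native_queued_spin_lock_slowpath
    6.13%  sched-messaging  [kernel.vmlinux]      [k] copyin
"""

# Row copied from the "Base + mm_count_padding" profile.
padding = """
    4.39%  sched-messaging  [kernel.vmlinux]         [k] native_queued_spin_lock_slowpath
"""

lock = "native_queued_spin_lock_slowpath"
print(round(symbol_share(reverted, lock), 2))
print(round(symbol_share(padding, lock), 2))
```

On the rows quoted in this thread that is roughly 21% of samples in the
reverted run against 4.39% with padding, which is the difference the
apparmor lock theory would need to account for.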

Thanks,

Mathieu


> 
>>
>> This patch has recently been merged into the mm tree.
>>
>> Thanks,
>>
>> Mathieu
>>
> -- 
> Thanks and Regards,
> Swapnil

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
