From: David Vernet <void@manifault.com>
To: Aaron Lu <aaron.lu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
linux-kernel@vger.kernel.org, mingo@redhat.com,
juri.lelli@redhat.com, vincent.guittot@linaro.org,
rostedt@goodmis.org, dietmar.eggemann@arm.com,
bsegall@google.com, mgorman@suse.de, bristot@redhat.com,
vschneid@redhat.com, joshdon@google.com,
roman.gushchin@linux.dev, tj@kernel.org, kernel-team@meta.com
Subject: Re: [RFC PATCH 3/3] sched: Implement shared wakequeue in CFS
Date: Tue, 20 Jun 2023 12:36:26 -0500
Message-ID: <20230620173626.GA3027191@maniforge>
In-Reply-To: <20230616005338.GA115001@ziqianlu-dell>

On Fri, Jun 16, 2023 at 08:53:38AM +0800, Aaron Lu wrote:
> On Thu, Jun 15, 2023 at 06:26:05PM -0500, David Vernet wrote:
>
> > Ok, it seems that the issue is that I wasn't creating enough netperf
> > clients. I assumed that -n $(nproc) was sufficient. I was able to repro
>
> Yes that switch is confusing.
>
> > the contention on my 26 core / 52 thread skylake client as well:
> >
> >
>
> > Thanks for the help in getting the repro on my end.
>
> You are welcome.
>
> > So yes, there is certainly a scalability concern to bear in mind for
> > swqueue for LLCs with a lot of cores. If you have a lot of tasks quickly
> > e.g. blocking and waking on futexes in a tight loop, I expect a similar
> > issue would be observed.
> >
> > On the other hand, the issue did not occur on my 7950X. I also wasn't
>
> Using netperf/UDP_RR?

Correct.

> > able to repro the contention on the Skylake if I ran with the default
> > netperf workload rather than UDP_RR (even with the additional clients).
>
> I also tried that on the 18cores/36threads/LLC Skylake and the contention
> is indeed much smaller than UDP_RR:
>
> 7.30% 7.29% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
>
> But I wouldn't say it's entirely gone. Also consider Skylake has a lot
> fewer cores per LLC than later Intel servers like Icelake and Sapphire
> Rapids and I expect things would be worse on those two machines.

I cannot reproduce this contention locally, even on a slightly larger
Skylake. I'm not really sure what to make of the difference here.
Perhaps it's because you're running with CONFIG_SCHED_CORE=y? What is
the change in throughput when you run the default workload on your SKL?

> > I didn't bother to take the mean of all of the throughput results
> > between NO_SWQUEUE and SWQUEUE, but they looked roughly equal.
> >
> > So swqueue isn't ideal for every configuration, but I'll echo my
> > sentiment from [0] that this shouldn't on its own necessarily preclude
> > it from being merged given that it does help a large class of
> > configurations and workloads, and it's disabled by default.
> >
> > [0]: https://lore.kernel.org/all/20230615000103.GC2883716@maniforge/
>
> I was wondering: does it make sense to do some divide on machines with
> big LLCs? Like converting the per-LLC swqueue to per-group swqueue where
> the group can be made of ~8 cpus of the same LLC. This will have a
> similar effect of reducing the number of CPUs in a single LLC so the
> scalability issue can hopefully be fixed while at the same time, it
> might still help some workloads. I realized this isn't ideal in that
> wakeup happens at LLC scale so the group thing may not fit very well
> here.
>
> Just a thought, feel free to ignore it if you don't think this is
> feasible :-)

That's certainly an idea we could explore, but my inclination would be
to keep everything at a per-LLC granularity. It makes it easier to
reason about performance, both in terms of work conservation per-LLC
(again, not every workload suffers from having large LLCs even if others
do, and halving the size of a swqueue in an LLC could harm other
workloads which benefit from the increased work conservation), and in
terms of contention. To the latter point, I think it would be difficult
to choose a group size that wasn't somewhat artificial and
workload-specific. If someone has that requirement, I think sched_ext
would be a better alternative.
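
As a rough illustration of the trade-off discussed above, here is a
minimal userspace sketch. It is not code from this patch set:
nr_shards() and shard_of() are hypothetical names, and the sketch
assumes CPUs are indexed contiguously within an LLC, which real
topologies do not guarantee.

```c
#include <assert.h>

/*
 * Hypothetical helper: how many queues are needed to cover an LLC of
 * llc_cpus CPUs when each queue serves at most shard_size CPUs.
 * With shard_size == llc_cpus this degenerates to one per-LLC queue.
 */
static unsigned int nr_shards(unsigned int llc_cpus, unsigned int shard_size)
{
	assert(shard_size > 0);
	/* Round up so a partial trailing group still gets a queue. */
	return (llc_cpus + shard_size - 1) / shard_size;
}

/*
 * Hypothetical helper: which queue a CPU would use, given its index
 * within its LLC (assumed to run from 0 to llc_cpus - 1).
 */
static unsigned int shard_of(unsigned int cpu_in_llc, unsigned int shard_size)
{
	assert(shard_size > 0);
	return cpu_in_llc / shard_size;
}
```

For the 36-thread LLC mentioned earlier, nr_shards(36, 36) gives one
queue shared by every CPU, while nr_shards(36, 8) gives five queues,
the last covering only four CPUs. Fewer CPUs contend on each lock, but
a task queued by CPU 0 (shard_of(0, 8) == 0) is no longer visible to an
idle CPU 35 (shard_of(35, 8) == 4), which is the loss of work
conservation described above.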

Thanks,
David