From: Tejun Heo <tj@kernel.org>
To: Naohiro Aota <Naohiro.Aota@wdc.com>
Cc: "jiangshanlai@gmail.com" <jiangshanlai@gmail.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"kernel-team@meta.com" <kernel-team@meta.com>
Subject: Re: Re: Re: [PATCHSET wq/for-6.8] workqueue: Implement system-wide max_active for unbound workqueues
Date: Tue, 30 Jan 2024 06:11:16 -1000 [thread overview]
Message-ID: <ZbkfpKv7CQs2u9RH@slm.duckdns.org> (raw)
In-Reply-To: <a4obzmueffpsmvlzfe64oksxdzyknxacxb2kkeytwzjtlzhz6r@w4lyfr6vrrp7>
Hello,
On Tue, Jan 30, 2024 at 02:24:47AM +0000, Naohiro Aota wrote:
> > If so, I'm not sure how meaningful the result is. e.g. The perf would depend
> > heavily on random factors like which threads end up on which node and so on.
> > Sure, if we're slow because we're creating huge number of concurrent
> > workers, that's still a problem but comparing relatively small perf delta
> > might not be all that meaningful. How much is the result variance in that
> > setup?
>
> Yeah, that is true. I conducted the benchmark 30 times, and the sample standard
> deviation is 320.30. The results ranged as follows:
> Min 1732 MiB/s - Max 2565 MiB/s
> Mean: 2212.3 MiB/s Sample stddev 320.30
>
> Compared to that, here is the result on the baseline:
> Min 1113 MiB/s - Max 1498 MiB/s
> Mean: 1231.85 MiB/s Sample stddev 104.31
>
> For reference, the result in the reverted case is as follows:
> Min 2211 MiB/s - Max 2506 MiB/s
> Mean 2372.23 MiB/s Sample stddev 82.49
>
> So, the patched one is indeed better than the baseline. Even the worst case
> on the patched version is better than the best on the baseline. And, as you
> mentioned, the patched version has far larger variance than the baseline and
> reverted ones.
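The per-run samples aren't in the message, but the summary statistics quoted above are enough for a quick significance sanity check. A throw-away sketch (Welch's t on the quoted means/stddevs; n = 30 comes from the message, everything else is just arithmetic):

```python
import math

# Summary stats quoted above (MiB/s), 30 runs each.
n = 30
patched_mean, patched_sd = 2212.3, 320.30
baseline_mean, baseline_sd = 1231.85, 104.31

# Welch's t-statistic: how many standard errors separate the two means.
se = math.sqrt(patched_sd**2 / n + baseline_sd**2 / n)
t = (patched_mean - baseline_mean) / se
print(f"diff = {patched_mean - baseline_mean:.1f} MiB/s, t ~= {t:.1f}")
```

With t around 16, the ~980 MiB/s gap dwarfs the run-to-run noise, which matches the observation that the worst patched run still beats the best baseline run.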
Yeah, the average being similar while the variance is way larger makes
sense. Before the revert, it's spraying things across the machine. After,
per run, the execution is more sticky, so you basically end up amplifying
the variance.
> > > FYI, without the kernel command-line (i.e, numa=on and all RAM available as
> > > usual), as shown below, your patch series (v1) improved the performance
> > > significantly. It is even better than the reverted case.
> > >
> > > - misc-next, numa=on
> > > WRITE: bw=1121MiB/s (1175MB/s), 1121MiB/s-1121MiB/s (1175MB/s-1175MB/s), io=332GiB (356GB), run=303030-303030msec
> > > - misc-next+wq patches, numa=on
> > > WRITE: bw=2185MiB/s (2291MB/s), 2185MiB/s-2185MiB/s (2291MB/s-2291MB/s), io=667GiB (717GB), run=312806-312806msec
> > > - misc-next+wq reverted, numa=on
> > > WRITE: bw=1557MiB/s (1633MB/s), 1557MiB/s-1557MiB/s (1633MB/s-1633MB/s), io=659GiB (708GB), run=433426-433426msec
> >
> > That looks pretty good, right?
>
> Yes, the result is good. Since the numa=off case is quite unusual and has a
> large variance, I believe this patch series is a good improvement.
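Translating the quoted fio numa=on bandwidth lines into relative numbers (a throw-away computation on the figures above, nothing more):

```python
# Bandwidth from the fio summaries above (MiB/s), numa=on runs.
base, patched, reverted = 1121, 2185, 1557

vs_base = patched / base          # wq patches vs. misc-next baseline
vs_reverted = patched / reverted  # wq patches vs. revert
print(f"patched vs baseline: {vs_base:.2f}x, vs reverted: {vs_reverted:.2f}x")
```

So the series roughly doubles throughput over the baseline and still comes out about 40% ahead of simply reverting.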
Great to hear.
Thanks.
--
tejun