From: Ingo Molnar <mingo@kernel.org>
To: kernel test robot <oliver.sang@intel.com>,
Mel Gorman <mgorman@techsingularity.net>,
Peter Zijlstra <peterz@infradead.org>
Cc: oe-lkp@lists.linux.dev, lkp@intel.com,
linux-kernel@vger.kernel.org, ying.huang@intel.com,
feng.tang@intel.com, fengwei.yin@intel.com,
aubrey.li@linux.intel.com, yu.c.chen@intel.com,
Mike Galbraith <efault@gmx.de>,
K Prateek Nayak <kprateek.nayak@amd.com>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
linux-tip-commits@vger.kernel.org, x86@kernel.org,
Gautham Shenoy <gautham.shenoy@amd.com>
Subject: Re: [PATCH] sched/fair: Do not wakeup-preempt same-prio SCHED_OTHER tasks
Date: Mon, 25 Sep 2023 13:07:08 +0200
Message-ID: <ZRFp3EO2JUXtK6XB@gmail.com>
In-Reply-To: <202309221758.d655aa5b-oliver.sang@intel.com>
* kernel test robot <oliver.sang@intel.com> wrote:
> Hello,
>
> kernel test robot noticed a -19.0% regression of stress-ng.filename.ops_per_sec on:
Thanks for the testing, this is useful!
So I've tabulated the results into a much easier to read format:
> | testcase: change | stress-ng: stress-ng.filename.ops_per_sec -19.0% regression
> | testcase: change | stress-ng: stress-ng.lockbus.ops_per_sec -6.0% regression
> | testcase: change | stress-ng: stress-ng.sigfd.ops_per_sec 17.6% improvement
> | testcase: change | phoronix-test-suite: phoronix-test-suite.darktable.Masskrug.CPU-only.seconds -5.3% improvement
> | testcase: change | lmbench3: lmbench3.TCP.socket.bandwidth.64B.MB/sec 11.5% improvement
> | testcase: change | phoronix-test-suite: phoronix-test-suite.darktable.Boat.CPU-only.seconds -3.5% improvement
> | testcase: change | stress-ng: stress-ng.sigrt.ops_per_sec 100.2% improvement
> | testcase: change | stress-ng: stress-ng.sigsuspend.ops_per_sec -93.9% regression
> | testcase: change | stress-ng: stress-ng.sigsuspend.ops_per_sec -82.1% regression
> | testcase: change | stress-ng: stress-ng.sock.ops_per_sec 59.4% improvement
> | testcase: change | blogbench: blogbench.write_score -35.9% regression
> | testcase: change | hackbench: hackbench.throughput -4.8% regression
> | testcase: change | blogbench: blogbench.write_score -59.3% regression
> | testcase: change | stress-ng: stress-ng.exec.ops_per_sec -34.6% regression
> | testcase: change | netperf: netperf.Throughput_Mbps 60.6% improvement
> | testcase: change | hackbench: hackbench.throughput 19.1% improvement
> | testcase: change | stress-ng: stress-ng.dnotify.ops_per_sec -15.7% regression
And then sorted them along the regression/improvement axis:
> | testcase: change | stress-ng: stress-ng.sigsuspend.ops_per_sec -93.9% regression
> | testcase: change | stress-ng: stress-ng.sigsuspend.ops_per_sec -82.1% regression
> | testcase: change | blogbench: blogbench.write_score -59.3% regression
> | testcase: change | blogbench: blogbench.write_score -35.9% regression
> | testcase: change | stress-ng: stress-ng.exec.ops_per_sec -34.6% regression
> | testcase: change | stress-ng: stress-ng.filename.ops_per_sec -19.0% regression
> | testcase: change | stress-ng: stress-ng.dnotify.ops_per_sec -15.7% regression
> | testcase: change | stress-ng: stress-ng.lockbus.ops_per_sec -6.0% regression
> | testcase: change | hackbench: hackbench.throughput -4.8% regression
> | testcase: change | phoronix-test-suite: phoronix-test-suite.darktable.Masskrug.CPU-only.seconds +5.3% improvement
> | testcase: change | phoronix-test-suite: phoronix-test-suite.darktable.Boat.CPU-only.seconds +3.5% improvement
> | testcase: change | lmbench3: lmbench3.TCP.socket.bandwidth.64B.MB/sec 11.5% improvement
> | testcase: change | stress-ng: stress-ng.sigfd.ops_per_sec 17.6% improvement
> | testcase: change | hackbench: hackbench.throughput 19.1% improvement
> | testcase: change | stress-ng: stress-ng.sock.ops_per_sec 59.4% improvement
> | testcase: change | netperf: netperf.Throughput_Mbps 60.6% improvement
> | testcase: change | stress-ng: stress-ng.sigrt.ops_per_sec 100.2% improvement
Testing results notes:
- the '+' denotes an inverted improvement: the darktable results are measured in
seconds, where lower is better, so the robot reports them with a negative sign.
The mixing of signs in the output of the ktest robot is arguably confusing.
- Any hope of getting a similar summary format by default? It's much more informative
than just picking the biggest regression, which wasn't even done correctly AFAICT.
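The normalization and sorting described in the notes above could be sketched roughly as follows. This is only an illustration: `struct result`, `lower_is_better()`, `normalized_change()` and `sort_results()` are hypothetical names, and treating every metric that ends in `.seconds` as lower-is-better is an assumption based on the two darktable entries, not something the robot's output states.

```c
#include <stdlib.h>
#include <string.h>

/* One result line: metric name and the percent change as the robot printed it. */
struct result {
	const char *metric;
	double change_pct;
};

/* Assumption: a metric reported in seconds is lower-is-better. */
static int lower_is_better(const char *metric)
{
	static const char suffix[] = ".seconds";
	size_t len = strlen(metric);
	size_t slen = sizeof(suffix) - 1;

	return len >= slen && strcmp(metric + len - slen, suffix) == 0;
}

/* Normalized change: a positive value always means "got better". */
static double normalized_change(const struct result *r)
{
	return lower_is_better(r->metric) ? -r->change_pct : r->change_pct;
}

/* qsort() comparator: ascending, so the worst regression comes first. */
static int cmp_result(const void *a, const void *b)
{
	double x = normalized_change(a);
	double y = normalized_change(b);

	return (x > y) - (x < y);
}

static void sort_results(struct result *res, size_t n)
{
	qsort(res, n, sizeof(*res), cmp_result);
}
```

Calling `sort_results()` on the robot's list yields a strictly ascending order by normalized change, so neighbouring entries may land in a slightly different order than in the hand-sorted list.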
Summary:
While there are a lot of improvements, it is primarily the nature of the performance
regressions that dictates the way forward:
- stress-ng.sigsuspend.ops_per_sec regressions, -93%:
Clearly signal delivery performance hurts from delayed preemption, but
that should be straightforward to resolve, if we are willing to commit
to adding a high-prio insta-wakeup variant API ...
- stress-ng.exec.ops_per_sec -34% regression:
Likewise this possibly expresses that it's better to immediately reschedule
during exec() - but maybe it's more and reflects some unfavorable migration,
as suggested by the NUMA locality figures:
                  %change              %stddev
                     |                    \
  79317172         -34.2%   52217838 ±  3%  numa-numastat.node0.local_node
  79360983         -34.2%   52240348 ±  3%  numa-numastat.node0.numa_hit
  77971050         -33.2%   52068168 ±  3%  numa-numastat.node1.local_node
  78009071         -33.2%   52089987 ±  3%  numa-numastat.node1.numa_hit
     88287         -45.7%      47970 ±  2%  vmstat.system.cs
- 'blogbench' regression of -59%:
It too has a very large reduction in context switches:
         %stddev     %change         %stddev
             \          |                \
     30035           -49.7%      15097 ±  3%  vmstat.system.cs
   2243545 ±  2%      -4.1%    2152228        blogbench.read_score
  52412617           -28.3%   37571769        blogbench.time.file_system_outputs
   2682930           -74.1%     694136        blogbench.time.involuntary_context_switches
   2369329           -50.0%    1184098 ±  5%  blogbench.time.voluntary_context_switches
      5851           -35.9%       3752 ±  2%  blogbench.write_score
It's unclear to me what's happening with this one, just from these stats,
but it's "write_score" that hurts most.
- 'stress-ng.filename.ops_per_sec' regression of -19%:
This test suffered from an *increase* in context-switching, and a large
increase in CPU-idle:
         %stddev     %change         %stddev
             \          |                \
   4641666           +19.5%    5545394 ±  2%  cpuidle..usage
     90589 ±  2%     +70.5%     154471 ±  2%  vmstat.system.cs
    628439           -19.2%     507711        stress-ng.filename.ops
     10317           -19.0%       8355        stress-ng.filename.ops_per_sec
    171981           -59.7%      69333 ±  3%  stress-ng.time.involuntary_context_switches
    770691 ±  3%    +200.9%    2319214        stress-ng.time.voluntary_context_switches
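For reference, the %change column in the tables above can be re-derived from the two value columns. A minimal sketch, assuming the left value is the baseline kernel and the right value is the patched one (`pct_change()` is a hypothetical helper, not part of any tool):

```c
/* Relative change from base to patched, in percent. */
static double pct_change(double base, double patched)
{
	return (patched - base) / base * 100.0;
}
```

For example, the exec() context-switch rate gives `pct_change(88287, 47970)`, roughly -45.7, and blogbench.write_score gives `pct_change(5851, 3752)`, roughly -35.9, both matching the reported figures.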
Anyway, it's clear from these results that while many workloads hurt
from our notion of wake-preemption, there are several that benefit
from it, especially generic ones like phoronix-test-suite - which have
no good way to turn off wakeup preemption (SCHED_BATCH might help though).
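For anyone who wants to try the SCHED_BATCH idea from userspace, a minimal sketch using the standard sched_setscheduler(2) call might look like this. `make_self_batch()` is a hypothetical helper name, and whether the policy actually helps a given benchmark is exactly the open question above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* glibc exposes SCHED_BATCH only with this */
#endif
#include <sched.h>

#ifndef SCHED_BATCH
#define SCHED_BATCH 3		/* Linux policy number, fallback if not exposed */
#endif

/*
 * Switch the calling thread to SCHED_BATCH.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int make_self_batch(void)
{
	/* sched_priority must be 0 for the non-realtime policies. */
	struct sched_param param = { .sched_priority = 0 };

	return sched_setscheduler(0, SCHED_BATCH, &param);
}
```

The policy is inherited across fork() and exec(), so a small wrapper that calls this and then exec()s the workload is one way to run an unmodified benchmark under it.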
One way to approach this: instead of always doing wakeup-preemption
(our current default), we could turn it around and only use it when
it is clearly beneficial - such as for signal delivery, or exec().
The canonical way to solve this would be to give *userspace* a way to
signal that it's beneficial to preempt immediately, i.e. yield(),
but right now that interface is hurting tasks that only want to
give other tasks a chance to run, without necessarily giving up
their own right to run:
se->deadline += calc_delta_fair(se->slice, se);
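To make the cost of that line concrete, here is a small userspace model of it. Everything in it is a simplification for illustration: `struct model_se`, `model_calc_delta_fair()` and `model_yield()` are hypothetical stand-ins rather than kernel code, and the nice-0 weight of 1024 is an assumption of the model.

```c
#include <stdint.h>

#define NICE_0_WEIGHT 1024ULL	/* assumed nice-0 load weight in this model */

/* Hypothetical, stripped-down stand-in for struct sched_entity. */
struct model_se {
	uint64_t deadline;	/* virtual deadline */
	uint64_t slice;		/* requested slice, in ns */
	uint64_t weight;	/* load weight; 1024 models nice 0 */
};

/* Model of calc_delta_fair(): scale a delta inversely with the task's weight. */
static uint64_t model_calc_delta_fair(uint64_t delta, const struct model_se *se)
{
	if (se->weight == NICE_0_WEIGHT)
		return delta;
	return delta * NICE_0_WEIGHT / se->weight;
}

/* Model of the yield step: push the deadline out by one full slice. */
static void model_yield(struct model_se *se)
{
	se->deadline += model_calc_delta_fair(se->slice, se);
}
```

With a 3 ms slice at nice 0, a single `model_yield()` moves the deadline 3 ms further out. Since EEVDF picks the eligible task with the earliest virtual deadline, the yielding task goes to the back of the queue by a whole slice instead of merely letting a waiter run once.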
Anyway, my patch is obviously a no-go as-is, and this clearly needs more work.
Thanks,
Ingo