public inbox for linux-kernel@vger.kernel.org
From: K Prateek Nayak <kprateek.nayak@amd.com>
To: zihan zhou <15645113830zzh@gmail.com>
Cc: <bsegall@google.com>, <dietmar.eggemann@arm.com>,
	<gautham.shenoy@amd.com>, <juri.lelli@redhat.com>,
	<linux-kernel@vger.kernel.org>, <mgorman@suse.de>,
	<mingo@redhat.com>, <peterz@infradead.org>, <rostedt@goodmis.org>,
	<vincent.guittot@linaro.org>, <vschneid@redhat.com>
Subject: Re: [PATCH V3 1/2] sched: Reduce the default slice to avoid tasks getting an extra tick
Date: Fri, 7 Mar 2025 09:40:11 +0530	[thread overview]
Message-ID: <3d0f9c2b-8498-4405-b178-9f6c8615f73b@amd.com> (raw)
In-Reply-To: <20250222030221.63120-1-15645113830zzh@gmail.com>

Hello Zhou,

Sorry this slipped past me.

On 2/22/2025 8:32 AM, zihan zhou wrote:
> Thank you for your reply, thank you for providing such a detailed test,
> which also let me learn a lot.
> 
>> Hello Zhou,
>>
>> I'll leave some testing data below but overall, in my testing with
>> CONFIG_HZ=250 and CONFIG_HZ=10000, I cannot see any major regressions
>> (at least not for any stable data point). There are a few small
>> regressions, probably a result of greater opportunity for wakeup
>> preemption since RUN_TO_PARITY will work for a slightly shorter
>> duration now, but I haven't dug deeper to confirm whether they are
>> run-to-run variation or a result of the larger number of wakeup
>> preemptions.
>>
>> Since most servers run with CONFIG_HZ=250, where the tick is anyway 4ms
>> and the default base slice is currently 3ms, I don't think there will
>> be any discernible difference in most workloads (fingers crossed).
>>
>> Please find full data below.
> 
> 
> This should be CONFIG_HZ=250 and CONFIG_HZ=1000, is it wrong?

That is correct! My bad.
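
For reference, the "extra tick" arithmetic being discussed: because the
runtime > slice check only happens at tick time, a 3ms slice lets a task
run for 4ms under both HZ values. A quick user-space sketch (purely an
illustration; it assumes the check fires exactly on tick boundaries):

```python
# How long a task with a 3 ms slice actually runs before the tick-time
# check (runtime > slice) can preempt it, for the two HZ values being
# discussed.  In both cases it is 4 ms: one whole tick at HZ=250, and
# a whole *extra* tick at HZ=1000.
SLICE_MS = 3  # old default base slice

def runtime_before_preempt_ms(hz: int, slice_ms: float = SLICE_MS) -> float:
    tick_ms = 1000 / hz
    ticks = 0
    run = 0.0
    while not run > slice_ms:  # strict check, as done at tick time
        ticks += 1
        run = ticks * tick_ms
    return run

print(runtime_before_preempt_ms(250))   # 4.0
print(runtime_before_preempt_ms(1000))  # 4.0
```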

> 
> It seems that no performance difference is good news: this change does
> not affect performance. This problem was first found in the openEuler
> 6.6 kernel. If one task runs all the time and the other runs for 3ms
> and then sleeps for 1us, the runtime ratio of the two tasks becomes
> 4:3, instead of the 1:1 seen on the original CFS. The problem has
> disappeared in the mainline kernel.
> 
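
The 4:3 scenario above can be modeled in a few lines of user-space
Python. This is purely an illustration of tick-granular slice
accounting, not kernel code; the preemption rule below is a
simplification of the old behavior being described:

```python
# Sketch: tick-granular slice accounting with a 1 ms tick
# (CONFIG_HZ=1000) and a 3 ms slice.  Task A is CPU bound; task B
# voluntarily sleeps after 3 ms of runtime.  Because the preemption
# check fires only on the tick *after* the slice is exceeded, A gets
# 4 ticks per stint, so the runtime ratio drifts to 4:3.
TICK_MS = 1    # CONFIG_HZ=1000
SLICE_MS = 3   # old default base slice

def simulate(ticks: int) -> tuple[int, int]:
    a_run = b_run = 0
    cur = "A"       # task currently on the CPU
    cur_run = 0     # runtime in the current stint, ms
    for _ in range(ticks):
        cur_run += TICK_MS
        if cur == "A":
            a_run += TICK_MS
            # tick-time check: preempt only once the slice is
            # already exceeded, i.e. on A's 4th tick
            if cur_run > SLICE_MS:
                cur, cur_run = "B", 0
        else:
            b_run += TICK_MS
            # B blocks voluntarily after exactly 3 ms of runtime
            if cur_run >= SLICE_MS:
                cur, cur_run = "A", 0
    return a_run, b_run

a, b = simulate(7000)
print(f"A: {a} ms, B: {b} ms")  # 4:3, not 1:1
```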
>> o Benchmark results (CONFIG_HZ=1000)
>>
>> ==================================================================
>> Test          : hackbench
>> Units         : Normalized time in seconds
>> Interpretation: Lower is better
>> Statistic     : AMean
>> ==================================================================
>> Case:      mainline[pct imp](CV)    new_base_slice[pct imp](CV)
>>    1-groups     1.00 [ -0.00]( 8.66)     1.05 [ -5.30](16.73)
>>    2-groups     1.00 [ -0.00]( 5.02)     1.07 [ -6.54]( 7.29)
>>    4-groups     1.00 [ -0.00]( 1.27)     1.02 [ -1.67]( 3.74)
>>    8-groups     1.00 [ -0.00]( 2.75)     0.99 [  0.78]( 2.61)
>> 16-groups     1.00 [ -0.00]( 2.02)     0.97 [  2.97]( 1.19)
>>
>>
>> ==================================================================
>> Test          : tbench
>> Units         : Normalized throughput
>> Interpretation: Higher is better
>> Statistic     : AMean
>> ==================================================================
>> Clients:      mainline[pct imp](CV)    new_base_slice[pct imp](CV)
>>       1     1.00 [  0.00]( 0.40)     1.00 [ -0.44]( 0.47)
>>       2     1.00 [  0.00]( 0.49)     0.99 [ -0.65]( 1.39)
>>       4     1.00 [  0.00]( 0.94)     1.00 [ -0.34]( 0.09)
>>       8     1.00 [  0.00]( 0.64)     0.99 [ -0.77]( 1.57)
>>      16     1.00 [  0.00]( 1.04)     0.98 [ -2.00]( 0.98)
>>      32     1.00 [  0.00]( 1.13)     1.00 [  0.34]( 1.31)
>>      64     1.00 [  0.00]( 0.58)     1.00 [ -0.28]( 0.80)
>>     128     1.00 [  0.00]( 1.40)     0.99 [ -0.91]( 0.51)
>>     256     1.00 [  0.00]( 1.14)     0.99 [ -1.48]( 1.17)
>>     512     1.00 [  0.00]( 0.51)     1.00 [ -0.25]( 0.66)
>>    1024     1.00 [  0.00]( 0.62)     0.99 [ -0.79]( 0.40)
>>
>>
>> ==================================================================
>> Test          : stream-10
>> Units         : Normalized Bandwidth, MB/s
>> Interpretation: Higher is better
>> Statistic     : HMean
>> ==================================================================
>> Test:      mainline[pct imp](CV)    new_base_slice[pct imp](CV)
>>    Copy     1.00 [  0.00](16.03)     0.98 [ -2.33](17.69)
>> Scale     1.00 [  0.00]( 6.26)     0.99 [ -0.60]( 7.94)
>>     Add     1.00 [  0.00]( 8.35)     1.01 [  0.50](11.49)
>> Triad     1.00 [  0.00]( 9.56)     1.01 [  0.66]( 9.25)
>>
>>
>> ==================================================================
>> Test          : stream-100
>> Units         : Normalized Bandwidth, MB/s
>> Interpretation: Higher is better
>> Statistic     : HMean
>> ==================================================================
>> Test:      mainline[pct imp](CV)    new_base_slice[pct imp](CV)
>>    Copy     1.00 [  0.00]( 6.03)     1.02 [  1.58]( 2.27)
>> Scale     1.00 [  0.00]( 5.78)     1.02 [  1.64]( 4.50)
>>     Add     1.00 [  0.00]( 5.25)     1.01 [  1.37]( 4.17)
>> Triad     1.00 [  0.00]( 5.25)     1.03 [  3.35]( 1.18)
>>
>>
>> ==================================================================
>> Test          : netperf
>> Units         : Normalized Throughput
>> Interpretation: Higher is better
>> Statistic     : AMean
>> ==================================================================
>> Clients:      mainline[pct imp](CV)    new_base_slice[pct imp](CV)
>>    1-clients     1.00 [  0.00]( 0.06)     1.01 [  0.66]( 0.75)
>>    2-clients     1.00 [  0.00]( 0.80)     1.01 [  0.79]( 0.31)
>>    4-clients     1.00 [  0.00]( 0.65)     1.01 [  0.56]( 0.73)
>>    8-clients     1.00 [  0.00]( 0.82)     1.01 [  0.70]( 0.59)
>> 16-clients     1.00 [  0.00]( 0.68)     1.01 [  0.63]( 0.77)
>> 32-clients     1.00 [  0.00]( 0.95)     1.01 [  0.87]( 1.06)
>> 64-clients     1.00 [  0.00]( 1.55)     1.01 [  0.66]( 1.60)
>> 128-clients     1.00 [  0.00]( 1.23)     1.00 [ -0.28]( 1.58)
>> 256-clients     1.00 [  0.00]( 4.92)     1.00 [  0.25]( 4.47)
>> 512-clients     1.00 [  0.00](57.12)     1.00 [  0.24](62.52)
>>
>>
>> ==================================================================
>> Test          : schbench
>> Units         : Normalized 99th percentile latency in us
>> Interpretation: Lower is better
>> Statistic     : Median
>> ==================================================================
>> #workers:      mainline[pct imp](CV)    new_base_slice[pct imp](CV)
>>     1     1.00 [ -0.00](27.55)     0.81 [ 19.35](31.80)
>>     2     1.00 [ -0.00](19.98)     0.87 [ 12.82]( 9.17)
>>     4     1.00 [ -0.00](10.66)     1.09 [ -9.09]( 6.45)
>>     8     1.00 [ -0.00]( 4.06)     0.90 [  9.62]( 6.38)
>>    16     1.00 [ -0.00]( 5.33)     0.98 [  1.69]( 1.97)
>>    32     1.00 [ -0.00]( 8.92)     0.97 [  3.16]( 1.09)
>>    64     1.00 [ -0.00]( 6.06)     0.97 [  3.30]( 2.97)
>> 128     1.00 [ -0.00](10.15)     1.05 [ -5.47]( 4.75)
>> 256     1.00 [ -0.00](27.12)     1.00 [ -0.20](13.52)
>> 512     1.00 [ -0.00]( 2.54)     0.80 [ 19.75]( 0.40)
>>
>>
>> ==================================================================
>> Test          : new-schbench-requests-per-second
>> Units         : Normalized Requests per second
>> Interpretation: Higher is better
>> Statistic     : Median
>> ==================================================================
>> #workers:      mainline[pct imp](CV)    new_base_slice[pct imp](CV)
>>     1     1.00 [  0.00]( 0.15)     1.00 [  0.00]( 0.46)
>>     2     1.00 [  0.00]( 0.15)     1.00 [  0.00]( 0.15)
>>     4     1.00 [  0.00]( 0.15)     1.00 [  0.00]( 0.15)
>>     8     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.15)
>>    16     1.00 [  0.00]( 0.00)     1.00 [  0.00]( 0.00)
>>    32     1.00 [  0.00]( 0.43)     1.01 [  0.63]( 0.28)
>>    64     1.00 [  0.00]( 1.17)     1.00 [  0.00]( 0.20)
>> 128     1.00 [  0.00]( 0.20)     1.00 [  0.00]( 0.20)
>> 256     1.00 [  0.00]( 0.27)     1.00 [  0.00]( 1.69)
>> 512     1.00 [  0.00]( 0.21)     0.95 [ -4.70]( 0.34)
>>
>>
>> ==================================================================
>> Test          : new-schbench-wakeup-latency
>> Units         : Normalized 99th percentile latency in us
>> Interpretation: Lower is better
>> Statistic     : Median
>> ==================================================================
>> #workers:      mainline[pct imp](CV)    new_base_slice[pct imp](CV)
>>     1     1.00 [ -0.00](11.08)     1.33 [-33.33](15.78)
>>     2     1.00 [ -0.00]( 4.08)     1.08 [ -7.69](10.00)
>>     4     1.00 [ -0.00]( 6.39)     1.21 [-21.43](22.13)
>>     8     1.00 [ -0.00]( 6.88)     1.15 [-15.38](11.93)
>>    16     1.00 [ -0.00](13.62)     1.08 [ -7.69](10.33)
>>    32     1.00 [ -0.00]( 0.00)     1.00 [ -0.00]( 3.87)
>>    64     1.00 [ -0.00]( 8.13)     1.00 [ -0.00]( 2.38)
>> 128     1.00 [ -0.00]( 5.26)     0.98 [  2.11]( 1.92)
>> 256     1.00 [ -0.00]( 1.00)     0.78 [ 22.36](14.65)
>> 512     1.00 [ -0.00]( 0.48)     0.73 [ 27.15]( 6.75)
>>
>>
>> ==================================================================
>> Test          : new-schbench-request-latency
>> Units         : Normalized 99th percentile latency in us
>> Interpretation: Lower is better
>> Statistic     : Median
>> ==================================================================
>> #workers:      mainline[pct imp](CV)    new_base_slice[pct imp](CV)
>>     1     1.00 [ -0.00]( 1.53)     1.00 [ -0.00]( 1.77)
>>     2     1.00 [ -0.00]( 0.50)     1.01 [ -1.35]( 1.19)
>>     4     1.00 [ -0.00]( 0.14)     1.00 [ -0.00]( 0.42)
>>     8     1.00 [ -0.00]( 0.24)     1.00 [ -0.27]( 1.37)
>>    16     1.00 [ -0.00]( 0.00)     1.00 [  0.27]( 0.14)
>>    32     1.00 [ -0.00]( 0.66)     1.01 [ -1.48]( 2.65)
>>    64     1.00 [ -0.00]( 5.72)     0.96 [  4.32]( 5.64)
>> 128     1.00 [ -0.00]( 0.10)     1.00 [ -0.20]( 0.18)
>> 256     1.00 [ -0.00]( 2.52)     0.96 [  4.04]( 9.70)
>> 512     1.00 [ -0.00]( 0.68)     1.06 [ -5.52]( 0.36)
>>
>>
>> ==================================================================
>> Test          : longer running benchmarks
>> Units         : Normalized throughput
>> Interpretation: Higher is better
>> Statistic     : Median
>> ==================================================================
>> Benchmark		pct imp
>> ycsb-cassandra          -0.64%
>> ycsb-mongodb             0.56%
>> deathstarbench-1x        0.30%
>> deathstarbench-2x        3.21%
>> deathstarbench-3x        2.18%
>> deathstarbench-6x       -0.40%
>> mysql-hammerdb-64VU     -0.63%
>> ---
> 
> It seems that new_base_slice has made some progress at high
> load/latency and regressed a bit at low load.
> 
> It seems that the slice should be related not only to the number of
> CPUs, but also to the relationship between the overall load and the
> number of CPUs: when the load is relatively heavy the slice should be
> smaller, and when it is relatively light the slice should be larger.
> Fixing it to a single value may not be the optimal solution.
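
As a purely hypothetical sketch of the idea above (nothing like this is
in the patch; the function, the clamp bounds, and all constants are made
up for illustration), a load-aware base slice could look like:

```python
# Hypothetical load-aware slice: keep a long slice while CPUs are
# uncontended, shrink it inversely with over-subscription, and clamp
# it to a floor so it never collapses entirely.  The 0.7 ms / 3 ms
# bounds are arbitrary illustration values, not anything proposed.
def base_slice_ns(nr_running: int, nr_cpus: int,
                  lo_ns: int = 700_000, hi_ns: int = 3_000_000) -> int:
    load = nr_running / nr_cpus
    if load <= 1.0:
        return hi_ns                    # CPUs are not contended
    return max(lo_ns, int(hi_ns / load))  # shrink with load, clamped

print(base_slice_ns(4, 8))    # light load  -> long slice
print(base_slice_ns(80, 8))   # heavy load  -> floor slice
```

Whether such a heuristic helps in practice would of course need the same
kind of benchmarking as above; as noted below, assumptions like this
have gone wrong before.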

We've seen such assumptions go wrong in our experiments; some benchmarks
really love their time on the CPU without any preemption :)

> 
>> With that overwhelming amount of data out of the way, please feel free
>> to add:
>>
>> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
> 
> I think you deserve it, but it seems a bit late. I have already
> received the tip-bot2 email, so I am not sure whether the tag can
> still be added.

That is fine as long as there is a record on lore :)

> 
> Your email made me realize that I should establish a systematic
> testing method. Could you point me to some useful projects?

We use selective benchmarks from LKP: https://github.com/intel/lkp-tests

Then there are some larger benchmarks we run based on previous
regression reports and debugging. Some of them are:

YCSB: https://github.com/brianfrankcooper/YCSB
netperf: https://github.com/HewlettPackard/netperf
DeathStarBench: https://github.com/delimitrou/DeathStarBench
HammerDB: https://github.com/TPC-Council/HammerDB.git
tbench (part of dbench): https://dbench.samba.org/web/download.html
schbench: https://git.kernel.org/pub/scm/linux/kernel/git/mason/schbench.git
sched-messaging: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/bench/sched-messaging.c?h=v6.14-rc4

Some of them are hard to set up the first time; we internally have some
tools that make it easy to run these benchmarks in a way that stresses
the system, but we also keep an eye out for regression reports to
understand which benchmarks folks are running in the field.

Sorry again for the delay and thank you.

> 
> Thanks!

-- 
Thanks and Regards,
Prateek



Thread overview: 22+ messages
2025-02-08  7:48 [PATCH V3 0/2] sched: Reduce the default slice to avoid tasks getting an extra tick zihan zhou
2025-02-08  7:53 ` [PATCH V3 1/2] " zihan zhou
2025-02-10  1:29   ` Qais Yousef
2025-02-10  6:18     ` zihan zhou
2025-02-10 22:55       ` Qais Yousef
2025-02-10  9:13     ` Peter Zijlstra
2025-02-10 23:05       ` Qais Yousef
2025-02-22  3:19         ` zihan zhou
2025-02-23  0:08           ` Qais Yousef
2025-02-24 14:15       ` Vincent Guittot
2025-02-25  0:25         ` Qais Yousef
2025-02-25  1:29           ` Vincent Guittot
2025-02-25 10:13             ` Vincent Guittot
2025-02-25 13:06               ` Qais Yousef
2025-02-14  3:33   ` K Prateek Nayak
2025-02-22  3:02     ` zihan zhou
2025-03-07  4:10       ` K Prateek Nayak [this message]
2025-03-14  1:49         ` zihan zhou
2025-02-15 10:55   ` [tip: sched/core] " tip-bot2 for zihan zhou
2025-02-08  7:57 ` [PATCH V3 2/2] " zihan zhou
2025-02-08 19:32   ` Vincent Guittot
2025-02-10  6:26     ` zihan zhou
