Re: [RFC PATCH] sched/eevdf: Use tunable knob sysctl_sched_base_slice as explicit time quanta

From: Vishal Chourasia <vishalc@linux.ibm.com>
To: Ze Gao <zegao2021@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ben Segall <bsegall@google.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Steven Rostedt <rostedt@goodmis.org>,
	Valentin Schneider <vschneid@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	linux-kernel@vger.kernel.org, Ze Gao <zegao@tencent.com>
Subject: Re: [RFC PATCH] sched/eevdf: Use tunable knob sysctl_sched_base_slice as explicit time quanta
Date: Fri, 2 Feb 2024 17:20:00 +0530
Message-ID: <ZbzW6EuJ1gFTi80U@linux.ibm.com>
In-Reply-To: <CAD8CoPAJh9ggK8ODYFiUaF2WXPG4d5ERDUdpL532N5kc=-xuSw@mail.gmail.com>

On Wed, Jan 24, 2024 at 10:32:08AM +0800, Ze Gao wrote:
> > Hi, How are you setting custom request values for process A and B?
> 
> I cherry-picked Peter's commit [1] and added a SCHED_QUANTA feature control
> for testing w/o my patch.  You can check out [2] to see how it works.
> 
Thank you for sharing your setup.

I built the kernel from [2], using v6.8.0-rc1 as the base.

// NO_SCHED_QUANTA
# perf script -i perf.data.old  -s perf-latency.py
PID 355045: Average Delta = 87.72726154385964 ms, Max Delta = 110.015044 ms, Count = 57
PID 355044: Average Delta = 92.2655679245283 ms, Max Delta = 110.017182 ms, Count = 53

// SCHED_QUANTA
# perf script -i perf.data  -s perf-latency.py
PID 355065: Average Delta = 10.00 ms, Max Delta = 10.012708 ms, Count = 500
PID 355064: Average Delta = 9.959 ms, Max Delta = 10.023588 ms, Count = 501

#  cat /sys/kernel/debug/sched/base_slice_ns
3000000

The base slice is not being enforced: base_slice_ns is 3 ms, but with
SCHED_QUANTA the average delta is about 10 ms.

Next, looking closely at the perf.data file:

# perf script -i perf.data -C 1 | grep switch
...
 stress-ng-cpu 355064 [001] 776706.003222:       sched:sched_switch: stress-ng-cpu:355064 [120] R ==> stress-ng-cpu:355065 [120]
 stress-ng-cpu 355065 [001] 776706.013218:       sched:sched_switch: stress-ng-cpu:355065 [120] R ==> stress-ng-cpu:355064 [120]
 stress-ng-cpu 355064 [001] 776706.023218:       sched:sched_switch: stress-ng-cpu:355064 [120] R ==> stress-ng-cpu:355065 [120]
 stress-ng-cpu 355065 [001] 776706.033218:       sched:sched_switch: stress-ng-cpu:355065 [120] R ==> stress-ng-cpu:355064 [120]
...

The delta between consecutive switches is approx 0.01 s, i.e. 10 ms.
So the switch is not happening at the base_slice_ns (3 ms) boundary.

But why? Is it possible that base_slice_ns is not properly used on
arch != x86?
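
The perf-latency.py script is not included in this message, so the
following is only a minimal sketch of how such per-PID deltas could be
derived from the sched_switch lines above. The function name and the
regular expression are my own assumptions, and the sketch treats a task
as waiting from the moment it is switched out in state R until it is
switched back in.

```python
import re
from collections import defaultdict

# Matches the `perf script` sched_switch lines shown above (assumed format).
SWITCH_RE = re.compile(
    r"\s(\d+)\.(\d{6}):\s+sched:sched_switch:\s+"
    r"\S+:(\d+)\s+\[\d+\]\s+(\S+)\s+==>\s+\S+:(\d+)\s+\[\d+\]"
)

def wait_delays_us(lines):
    """Return {pid: [wait time in microseconds, ...]} (hypothetical helper)."""
    wait_start = {}             # pid -> timestamp (us) when it was preempted
    delays = defaultdict(list)
    for line in lines:
        m = SWITCH_RE.search(line)
        if not m:
            continue
        sec, usec, prev_pid, prev_state, next_pid = m.groups()
        ts = int(sec) * 1_000_000 + int(usec)   # integer us, no float error
        if int(next_pid) in wait_start:
            # next_pid gets the CPU back: its wait ends here.
            delays[int(next_pid)].append(ts - wait_start.pop(int(next_pid)))
        if prev_state.startswith("R"):
            # prev_pid was preempted while still runnable: its wait starts.
            wait_start[int(prev_pid)] = ts
    return dict(delays)

sample = [
    " stress-ng-cpu 355064 [001] 776706.003222:       sched:sched_switch: stress-ng-cpu:355064 [120] R ==> stress-ng-cpu:355065 [120]",
    " stress-ng-cpu 355065 [001] 776706.013218:       sched:sched_switch: stress-ng-cpu:355065 [120] R ==> stress-ng-cpu:355064 [120]",
    " stress-ng-cpu 355064 [001] 776706.023218:       sched:sched_switch: stress-ng-cpu:355064 [120] R ==> stress-ng-cpu:355065 [120]",
    " stress-ng-cpu 355065 [001] 776706.033218:       sched:sched_switch: stress-ng-cpu:355065 [120] R ==> stress-ng-cpu:355064 [120]",
]

for pid, d in sorted(wait_delays_us(sample).items()):
    print(f"PID {pid}: Average Delta = {sum(d) / len(d) / 1000} ms, "
          f"Max Delta = {max(d) / 1000} ms, Count = {len(d)}")
```

On the four lines quoted above, every wait comes out at roughly 10 ms,
which matches the summary printed by perf-latency.py.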
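
One thing that might be worth ruling out first (a hypothesis only, not
something established in this thread): if slice expiry is checked only
from the periodic scheduler tick, the observed switch interval would be
rounded up to a multiple of the tick period. The sketch below shows that
arithmetic. The CONFIG_HZ values are illustrative assumptions, I have
not stated which one this kernel was built with, and the model ignores
where in the tick period a slice starts.

```python
def observed_slice_ns(slice_ns: int, hz: int) -> int:
    """Round a requested slice up to the next tick boundary (simplified model)."""
    tick_ns = 1_000_000_000 // hz
    ticks = -(-slice_ns // tick_ns)     # ceiling division
    return ticks * tick_ns

base_slice_ns = 3_000_000               # value read from base_slice_ns above

for hz in (100, 250, 1000):             # hypothetical CONFIG_HZ settings
    ms = observed_slice_ns(base_slice_ns, hz) / 1_000_000
    print(f"HZ={hz}: observed slice = {ms} ms")
```

Under this model a 3 ms base slice would show up as 10 ms switches only
with a 10 ms tick, so comparing CONFIG_HZ between the two test machines
would be a cheap first check.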

> 
> echo NO_SCHED_QUANTA > /sys/kernel/debug/sched/features
> test
> sleep 2
> echo SCHED_QUANTA > /sys/kernel/debug/sched/features
> test
> 
> 
> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/kernel/sched?h=sched/eevdf&id=98866150f92f268a2f08eb1d884de9677eb4ec8f
> [2]: https://github.com/zegao96/linux/tree/sched-eevdf
> 
> 
> Regards,
>         -- Ze
> 
> > >
> > >                       stress-ng-cpu:10705     stress-ng-cpu:10706
> > > ---------------------------------------------------------------------
> > > Slices(ms)            100                     0.1
> > > Runtime(ms)           4934.206                5025.048
> > > Switches              58                      67
> > > Average delay(ms)     87.074                  73.863
> > > Maximum delay(ms)     101.998                 101.010
> > >
> > > In contrast, using sysctl_sched_base_slice as the size of a 'quantum'
> > > in this patch gives us a better control of the allocation accuracy and
> > > the avg latency:
> > >
> > >                       stress-ng-cpu:10584     stress-ng-cpu:10583
> > > ---------------------------------------------------------------------
> > > Slices(ms)            100                     0.1
> > > Runtime(ms)           4980.309                4981.356
> > > Switches              1253                    1254
> > > Average delay(ms)     3.990                   3.990
> > > Maximum delay(ms)     5.001                   4.014
> > >
> > > Furthermore, with sysctl_sched_base_slice = 10ms, we might benefit from
> > > fewer switches at the cost of worse delay:
> > >
> > >                       stress-ng-cpu:11208     stress-ng-cpu:11207
> > > ---------------------------------------------------------------------
> > > Slices(ms)            100                     0.1
> > > Runtime(ms)           4983.722                4977.035
> > > Switches              456                     456
> > > Average delay(ms)     10.963                  10.939
> > > Maximum delay(ms)     19.002                  21.001
> > >
> > > By being able to tune sysctl_sched_base_slice knob, we can achieve
> > > the goal to strike a good balance between throughput and latency by
> > > adjusting the frequency of context switches, and the conclusions are
> > > much closer to what's covered in [1] with the explicit definition of
> > > a time quantum. And it also gives more freedom to choose the eligible
> > > request length range(either through nice value or raw value)
> > > without worrying about overscheduling or underscheduling too much.
> > >
> > > Note this change should introduce no obvious regression because all
> > > processes have the same request length as sysctl_sched_base_slice as
> > > in the status quo. And the result of benchmarks proves this as well.
> > >
> > > schbench -m2 -F128 -n10       -r90    w/patch tip/6.7-rc7
> > > Wakeup  (usec): 99.0th:               3028    95
> > > Request (usec): 99.0th:               14992   21984
> > > RPS    (count): 50.0th:               5864    5848
> > >
> > > hackbench -s 512 -l 200 -f 25 -P      w/patch tip/6.7-rc7
> > > -g 10                                 0.212   0.223
> > > -g 20                                 0.415   0.432
> > > -g 30                                 0.625   0.639
> > > -g 40                                 0.852   0.858
> > >
> > > [1]: https://dl.acm.org/doi/10.5555/890606
> > > [2]: https://lore.kernel.org/all/20230420150537.GC4253@hirez.programming.kicks-ass.net/T/#u
> > >
> > > Signed-off-by: Ze Gao <zegao@tencent.com>
> > > ---
> >
>
