Re: [RFC PATCH] sched/eevdf: Use tunable knob sysctl_sched_base_slice as explicit time quanta


From: Vishal Chourasia <vishalc@linux.ibm.com>
To: Ze Gao <zegao2021@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ben Segall <bsegall@google.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Steven Rostedt <rostedt@goodmis.org>,
	Valentin Schneider <vschneid@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] sched/eevdf: Use tunable knob sysctl_sched_base_slice as explicit time quanta
Date: Mon, 5 Feb 2024 13:07:08 +0530
Message-ID: <ZcCQJGNHgY1inVmL@linux.ibm.com>
In-Reply-To: <CAD8CoPD1edPmqgDVS1X6G2z-k4aPnCxm=KyCaf9PteOUm=--QQ@mail.gmail.com>

On Sun, Feb 04, 2024 at 11:05:22AM +0800, Ze Gao wrote:
> On Fri, Feb 2, 2024 at 7:50 PM Vishal Chourasia <vishalc@linux.ibm.com> wrote:
> >
> > On Wed, Jan 24, 2024 at 10:32:08AM +0800, Ze Gao wrote:
> > > > Hi, How are you setting custom request values for process A and B?
> > >
> > > I cherry-picked Peter's commit [1] and added a SCHED_QUANTA feature control
> > > for testing with and without my patch. You can check out [2] to see how it works.
> > >
> > Thank you for sharing your setup.
> >
> > I built the kernel according to [2], using v6.8.0-rc1 as the base.
> >
> > // NO_SCHED_QUANTA
> > # perf script -i perf.data.old  -s perf-latency.py
> > PID 355045: Average Delta = 87.72726154385964 ms, Max Delta = 110.015044 ms, Count = 57
> > PID 355044: Average Delta = 92.2655679245283 ms, Max Delta = 110.017182 ms, Count = 53
> >
> > // SCHED_QUANTA
> > # perf script -i perf.data  -s perf-latency.py
> > PID 355065: Average Delta = 10.00 ms, Max Delta = 10.012708 ms, Count = 500
> > PID 355064: Average Delta = 9.959 ms, Max Delta = 10.023588 ms, Count = 501
> >
> > #  cat /sys/kernel/debug/sched/base_slice_ns
> > 3000000
> >
> > The base slice is not being enforced.
> >
> > Next, looking closely at the perf.data file:
> >
> > # perf script -i perf.data -C 1 | grep switch
> > ...
> >  stress-ng-cpu 355064 [001] 776706.003222:       sched:sched_switch: stress-ng-cpu:355064 [120] R ==> stress-ng-cpu:355065 [120]
> >  stress-ng-cpu 355065 [001] 776706.013218:       sched:sched_switch: stress-ng-cpu:355065 [120] R ==> stress-ng-cpu:355064 [120]
> >  stress-ng-cpu 355064 [001] 776706.023218:       sched:sched_switch: stress-ng-cpu:355064 [120] R ==> stress-ng-cpu:355065 [120]
> >  stress-ng-cpu 355065 [001] 776706.033218:       sched:sched_switch: stress-ng-cpu:355065 [120] R ==> stress-ng-cpu:355064 [120]
> > ...
> >
> > The wait time between switches is approximately 0.01 s, i.e. 10 ms.
> 
> You can check your HZ; my best guess is that it is 100 in your
> configuration. That would explain your results.
Yes. What is it set to in your case, if I may ask?
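
As a cross-check on the numbers above, here is a minimal Python sketch (not part of the original test setup). It takes the four sched_switch timestamps from the trace quoted earlier, computes the gaps between consecutive switches, and compares them with the tick period implied by HZ=100. The helper names are made up for illustration, and HZ=100 is the value guessed in this thread, not something read from a kernel config.

```python
import re

# Four sched_switch lines copied from the perf trace quoted above (CPU 1).
TRACE = """\
 stress-ng-cpu 355064 [001] 776706.003222:       sched:sched_switch: stress-ng-cpu:355064 [120] R ==> stress-ng-cpu:355065 [120]
 stress-ng-cpu 355065 [001] 776706.013218:       sched:sched_switch: stress-ng-cpu:355065 [120] R ==> stress-ng-cpu:355064 [120]
 stress-ng-cpu 355064 [001] 776706.023218:       sched:sched_switch: stress-ng-cpu:355064 [120] R ==> stress-ng-cpu:355065 [120]
 stress-ng-cpu 355065 [001] 776706.033218:       sched:sched_switch: stress-ng-cpu:355065 [120] R ==> stress-ng-cpu:355064 [120]
"""

# Matches "[001] 776706.003222:   sched:sched_switch:" and captures the
# seconds and microseconds parts of the timestamp separately.
TS_RE = re.compile(r"\]\s+(\d+)\.(\d{6}):\s+sched:sched_switch:")


def switch_times_us(text):
    """Return sched_switch timestamps as integer microseconds."""
    return [int(sec) * 1_000_000 + int(usec) for sec, usec in TS_RE.findall(text)]


def deltas_ms(times_us):
    """Return the gaps between consecutive timestamps in milliseconds."""
    return [(b - a) / 1000 for a, b in zip(times_us, times_us[1:])]


HZ = 100             # assumption: the value guessed in this thread
TICK_MS = 1000 / HZ  # tick period in milliseconds

times = switch_times_us(TRACE)
deltas = deltas_ms(times)

for d in deltas:
    print(f"gap = {d:.3f} ms, tick = {TICK_MS:.3f} ms")
```

With these four samples every gap lands within a few microseconds of the 10 ms tick period, which is consistent with switches happening on tick boundaries rather than at the 3 ms base_slice_ns boundary.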
> 
> > So the switch is not happening at the base_slice_ns boundary.
> >
> > But why? Is it possible that base_slice_ns is not used properly on
> > arch != x86?
> 
> The thing is, in my RFC the effective quantum is actually
> 
>    max_t(u64, TICK_NSEC, sysctl_sched_base_slice)
> 
> where sysctl_sched_base_slice is precisely a handy tunable knob
> for users (maybe I should state this more loudly and clearly).
> 
> See what I do in update_entity_lag() and you will understand.
Thanks. I will look into it.
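
To make the expression quoted above concrete, here is a small Python sketch of the same arithmetic. It is only an illustration, not the kernel code: tick_nsec() approximates the kernel's TICK_NSEC as NSEC_PER_SEC / HZ rounded to the nearest integer, and both function names are hypothetical. The HZ values other than 100 are examples, not measurements from this thread.

```python
NSEC_PER_SEC = 1_000_000_000


def tick_nsec(hz):
    """Approximate TICK_NSEC: one tick in nanoseconds, rounded to nearest."""
    return (NSEC_PER_SEC + hz // 2) // hz


def effective_quantum_ns(hz, base_slice_ns):
    """Mirror max_t(u64, TICK_NSEC, sysctl_sched_base_slice) from the RFC."""
    return max(tick_nsec(hz), base_slice_ns)


BASE_SLICE_NS = 3_000_000  # value read from base_slice_ns earlier in the thread

for hz in (100, 250, 1000):  # 100 is the guess in this thread; others are examples
    q = effective_quantum_ns(hz, BASE_SLICE_NS)
    print(f"HZ={hz}: tick={tick_nsec(hz)} ns, effective quantum={q} ns")
```

Under these assumptions, a 3 ms base slice only takes effect once the tick is 3 ms or shorter. At HZ=100 the tick is 10 ms, so the effective quantum is 10 ms, which matches the switch spacing reported above.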
> 
> Note we have 3 time related concepts here:
> 1. TIME TICK: (schedule) accounting time unit.
> 2. TIME QUANTA (not necessarily the effective one): scheduling time unit
> 3. USER SLICE: time slice per request
To double-check my understanding:

USER SLICE is the request size a task submits when competing with other
tasks for the time-shared resource (here, the processor).

TIME QUANTA is the quantum `q` in which the scheduler allocates that
time-shared resource.

TIME TICK is the time period between two scheduler ticks.

Thanks, 
 -- vishal.c
> 
> To implement latency-nice while being as fair as possible, we must
> carefully consider the size relationship between them, and especially
> the value range of USER SLICE, because the lag (unfairness) depends on
> both the time quanta and the user-requested slices.
> 
> 
> Regards,
>         -- Ze
> 
> > >
> > > echo NO_SCHED_QUANTA > /sys/kernel/debug/sched/features
> > > test
> > > sleep 2
> > > echo SCHED_QUANTA > /sys/kernel/debug/sched/features
> > > test
> > >
> > >
> > > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/kernel/sched?h=sched/eevdf&id=98866150f92f268a2f08eb1d884de9677eb4ec8f
> > > [2]: https://github.com/zegao96/linux/tree/sched-eevdf
> > >
> > >
> > > Regards,
> > >         -- Ze
> > >
> > > > >
> > > > >                       stress-ng-cpu:10705     stress-ng-cpu:10706
> > > > > ---------------------------------------------------------------------
> > > > > Slices(ms)            100                     0.1
> > > > > Runtime(ms)           4934.206                5025.048
> > > > > Switches              58                      67
> > > > > Average delay(ms)     87.074                  73.863
> > > > > Maximum delay(ms)     101.998                 101.010
> > > > >
> > > > > > In contrast, using sysctl_sched_base_slice as the size of a 'quantum'
> > > > > > in this patch gives us better control of the allocation accuracy and
> > > > > > the average latency:
> > > > >
> > > > >                       stress-ng-cpu:10584     stress-ng-cpu:10583
> > > > > ---------------------------------------------------------------------
> > > > > Slices(ms)            100                     0.1
> > > > > Runtime(ms)           4980.309                4981.356
> > > > > Switches              1253                    1254
> > > > > Average delay(ms)     3.990                   3.990
> > > > > Maximum delay(ms)     5.001                   4.014
> > > > >
> > > > > > Furthermore, with sysctl_sched_base_slice = 10ms, we might benefit from
> > > > > > fewer switches at the cost of worse delay:
> > > > >
> > > > >                       stress-ng-cpu:11208     stress-ng-cpu:11207
> > > > > ---------------------------------------------------------------------
> > > > > Slices(ms)            100                     0.1
> > > > > Runtime(ms)           4983.722                4977.035
> > > > > Switches              456                     456
> > > > > Average delay(ms)     10.963                  10.939
> > > > > Maximum delay(ms)     19.002                  21.001
> > > > >
> > > > > > By tuning the sysctl_sched_base_slice knob, we can strike a good
> > > > > > balance between throughput and latency by adjusting the frequency
> > > > > > of context switches, and the conclusions are much closer to what is
> > > > > > covered in [1] with the explicit definition of a time quantum. It
> > > > > > also gives more freedom to choose the eligible request length range
> > > > > > (either through nice value or raw value) without worrying too much
> > > > > > about overscheduling or underscheduling.
> > > > >
> > > > > > Note that this change should introduce no obvious regression, because
> > > > > > all processes have the same request length, sysctl_sched_base_slice,
> > > > > > as in the status quo. The benchmark results support this as well.
> > > > >
> > > > > > schbench -m2 -F128 -n10 -r90          w/patch  tip/6.7-rc7
> > > > > > Wakeup  (usec): 99.0th:               3028     95
> > > > > > Request (usec): 99.0th:               14992    21984
> > > > > > RPS    (count): 50.0th:               5864     5848
> > > > > >
> > > > > > hackbench -s 512 -l 200 -f 25 -P      w/patch  tip/6.7-rc7
> > > > > > -g 10                                 0.212    0.223
> > > > > > -g 20                                 0.415    0.432
> > > > > > -g 30                                 0.625    0.639
> > > > > > -g 40                                 0.852    0.858
> > > > >
> > > > > [1]: https://dl.acm.org/doi/10.5555/890606
> > > > > [2]: https://lore.kernel.org/all/20230420150537.GC4253@hirez.programming.kicks-ass.net/T/#u
> > > > >
> > > > > Signed-off-by: Ze Gao <zegao@tencent.com>
> > > > > ---
> > > >
> > >

Thread overview: 9+ messages
2024-01-11 11:57 [RFC PATCH] sched/eevdf: Use tunable knob sysctl_sched_base_slice as explicit time quanta Ze Gao
2024-01-23 12:42 ` Vishal Chourasia
2024-01-24  2:32   ` Ze Gao
2024-02-02 11:50     ` Vishal Chourasia
2024-02-04  3:05       ` Ze Gao
2024-02-05  7:37         ` Vishal Chourasia [this message]
2024-02-06  7:50           ` Ze Gao
2024-02-06 13:09 ` Luis Machado
2024-02-07  3:05   ` Ze Gao
