Re: [RFC PATCH] sched/eevdf: Use tunable knob sysctl_sched_base_slice as explicit time quanta

public inbox for linux-kernel@vger.kernel.org

From: Vishal Chourasia <vishalc@linux.ibm.com>
To: Ze Gao <zegao2021@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ben Segall <bsegall@google.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Steven Rostedt <rostedt@goodmis.org>,
	Valentin Schneider <vschneid@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	linux-kernel@vger.kernel.org, Ze Gao <zegao@tencent.com>
Subject: Re: [RFC PATCH] sched/eevdf: Use tunable knob sysctl_sched_base_slice as explicit time quanta
Date: Tue, 23 Jan 2024 18:12:04 +0530
Message-ID: <Za-0HCP7WG3PIe7h@linux.ibm.com>
In-Reply-To: <20240111115745.62813-2-zegao@tencent.com>

On Thu, Jan 11, 2024 at 06:57:46AM -0500, Ze Gao wrote:
> AFAIS, we've overlooked the role that the concept of time quanta plays
> in EEVDF. According to Theorem 1 in [1], we have
> 
> 	-r_max < lag_k(t) < max(r_max, q)
> 
> Clearly, we don't want either r_max (the maximum user request) or q (the
> time quantum) to be too big.
> 
> To trade for throughput, [2] chooses to do tick preemption at
> per-request boundaries (i.e., once a certain request is fulfilled), which
> means we literally have no concept of time quanta defined anymore.
> Obviously this is no problem if we make
> 
> 	q = r_i = sysctl_sched_base_slice
> 
> exactly as we have now, which actually creates an implicit
> quantum for us and works well.
> 
> However, with custom slices being possible, the lag bound is subject
> only to the distribution of the users' requested slices, given that no
> time quantum is available now, and we would pay the cost of losing
> many scheduling opportunities to maintain fairness and responsiveness
> due to [2]. What's worse, we may suffer unexpected unfairness and
> latency.
> 
> For example, take two CPU-bound processes with the same weight, bind
> them to the same CPU, and let process A request 100ms whereas B
> requests 0.1ms each time (with HZ=1000, sysctl_sched_base_slice=3ms,
> nr_cpu=42). We can clearly see that playing with custom slices can
> actually incur unfair CPU bandwidth allocation: 10706, whose request
> length is 0.1ms, gets more CPU time as well as better latency than
> 10705. Note that you might see it the other way around on different
> machines, but the allocation inaccuracy remains, and even top can show
> the noticeable difference in CPU utilization in its per-second reporting.
> This is obviously not what we want, because it would mess up the nice
> system and fairness would not hold.

Hi, how are you setting the custom request values for processes A and B?

> 
> 			stress-ng-cpu:10705	stress-ng-cpu:10706
> ---------------------------------------------------------------------
> Slices(ms)		100			0.1
> Runtime(ms)		4934.206		5025.048
> Switches		58			67
> Average delay(ms)	87.074			73.863
> Maximum delay(ms)	101.998			101.010
> 
> In contrast, using sysctl_sched_base_slice as the size of a 'quantum'
> in this patch gives us better control over the allocation accuracy and
> the average latency:
> 
> 			stress-ng-cpu:10584	stress-ng-cpu:10583
> ---------------------------------------------------------------------
> Slices(ms)		100			0.1
> Runtime(ms)		4980.309		4981.356
> Switches		1253			1254
> Average delay(ms)	3.990			3.990
> Maximum delay(ms)	5.001			4.014
> 
> Furthermore, with sysctl_sched_base_slice = 10ms, we might benefit from
> fewer switches at the cost of worse delay:
> 
> 			stress-ng-cpu:11208	stress-ng-cpu:11207
> ---------------------------------------------------------------------
> Slices(ms)		100			0.1
> Runtime(ms)		4983.722		4977.035
> Switches		456			456
> Average delay(ms)	10.963			10.939
> Maximum delay(ms)	19.002			21.001
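
As a rough cross-check of the three tables quoted above, here is a small
Python sketch that derives each task's share of CPU time and its mean run
length per switch. The task labels A/B (A requests 100ms, B requests 0.1ms),
the dictionary layout, and the helper name summarize() are mine and not from
the patch, and it assumes "Switches" counts how often each task was
scheduled in.

```python
# Figures copied from the three quoted tables.
# Each entry is (runtime_ms, switches) for task A (100ms) and task B (0.1ms).
# The labels and layout are illustrative assumptions, not part of the patch.
runs = {
    "custom slices, no quantum": {"A": (4934.206, 58), "B": (5025.048, 67)},
    "quantum = 3ms": {"A": (4980.309, 1253), "B": (4981.356, 1254)},
    "quantum = 10ms": {"A": (4983.722, 456), "B": (4977.035, 456)},
}


def summarize(pair):
    """Return CPU share, runtime gap and mean run length for one table."""
    rt_a, sw_a = pair["A"]
    rt_b, sw_b = pair["B"]
    total = rt_a + rt_b
    return {
        "share_a_pct": 100.0 * rt_a / total,
        "share_b_pct": 100.0 * rt_b / total,
        "runtime_gap_ms": abs(rt_a - rt_b),
        # Mean stretch of CPU time a task gets each time it is scheduled in.
        "mean_run_a_ms": rt_a / sw_a,
        "mean_run_b_ms": rt_b / sw_b,
    }


if __name__ == "__main__":
    for name, pair in runs.items():
        s = summarize(pair)
        print(
            f"{name}: A {s['share_a_pct']:.2f}% / B {s['share_b_pct']:.2f}%, "
            f"gap {s['runtime_gap_ms']:.3f} ms, "
            f"mean run A {s['mean_run_a_ms']:.2f} ms, "
            f"B {s['mean_run_b_ms']:.2f} ms"
        )
```

For the first table this gives a runtime gap of about 90.8ms between the two
tasks, versus about 1ms with the 3ms quantum.
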
> 
> By being able to tune the sysctl_sched_base_slice knob, we can
> strike a good balance between throughput and latency by
> adjusting the frequency of context switches, and the conclusions are
> much closer to what's covered in [1] with the explicit definition of
> a time quantum. It also gives more freedom to choose the eligible
> request length range (either through nice value or raw value)
> without worrying too much about overscheduling or underscheduling.
> 
> Note that this change should introduce no obvious regression, because all
> processes have the same request length, sysctl_sched_base_slice, as
> in the status quo. And the benchmark results show this as well.
> 
> schbench -m2 -F128 -n10	-r90	w/patch	tip/6.7-rc7
> Wakeup  (usec): 99.0th:		3028	95
> Request (usec): 99.0th:		14992	21984
> RPS    (count): 50.0th:		5864	5848
> 
> hackbench -s 512 -l 200 -f 25 -P	w/patch	tip/6.7-rc7
> -g 10					0.212	0.223
> -g 20					0.415	0.432
> -g 30					0.625	0.639
> -g 40					0.852	0.858
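
To make the two benchmark columns easier to compare, the sketch below
computes the percent change of the w/patch column relative to the
tip/6.7-rc7 column. It takes the column labels at face value; the helper
name pct_change() and the dictionary keys are my own, not from the patch.

```python
def pct_change(patched, baseline):
    """Percent change of the patched figure relative to the baseline."""
    return 100.0 * (patched - baseline) / baseline


# (w/patch, tip/6.7-rc7) pairs copied from the quoted tables.
# Key names are illustrative assumptions.
schbench = {
    "wakeup_p99_usec": (3028, 95),
    "request_p99_usec": (14992, 21984),
    "rps_p50": (5864, 5848),
}
hackbench = {
    "-g 10": (0.212, 0.223),
    "-g 20": (0.415, 0.432),
    "-g 30": (0.625, 0.639),
    "-g 40": (0.852, 0.858),
}

if __name__ == "__main__":
    for label, table in (("schbench", schbench), ("hackbench", hackbench)):
        for key, (patched, baseline) in table.items():
            delta = pct_change(patched, baseline)
            print(f"{label} {key}: {delta:+.2f}%")
```

Read this way, the hackbench figures differ by less than 5% between the two
columns, while the schbench 99th percentile wakeup figures (3028 vs 95 usec)
differ by far more, so that row may be worth a second look.
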
> 
> [1]: https://dl.acm.org/doi/10.5555/890606
> [2]: https://lore.kernel.org/all/20230420150537.GC4253@hirez.programming.kicks-ass.net/T/#u
> 
> Signed-off-by: Ze Gao <zegao@tencent.com>
> ---

Thread overview: 9+ messages
2024-01-11 11:57 [RFC PATCH] sched/eevdf: Use tunable knob sysctl_sched_base_slice as explicit time quanta Ze Gao
2024-01-23 12:42 ` Vishal Chourasia [this message]
2024-01-24  2:32   ` Ze Gao
2024-02-02 11:50     ` Vishal Chourasia
2024-02-04  3:05       ` Ze Gao
2024-02-05  7:37         ` Vishal Chourasia
2024-02-06  7:50           ` Ze Gao
2024-02-06 13:09 ` Luis Machado
2024-02-07  3:05   ` Ze Gao