Re: [RFC PATCH] sched/eevdf: Use tunable knob sysctl_sched_base_slice as explicit time quanta

From: Vishal Chourasia <vishalc@linux.ibm.com>
To: Ze Gao <zegao2021@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ben Segall <bsegall@google.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Steven Rostedt <rostedt@goodmis.org>,
	Valentin Schneider <vschneid@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	linux-kernel@vger.kernel.org, Ze Gao <zegao@tencent.com>
Subject: Re: [RFC PATCH] sched/eevdf: Use tunable knob sysctl_sched_base_slice as explicit time quanta
Date: Tue, 23 Jan 2024 18:12:04 +0530
Message-ID: <Za-0HCP7WG3PIe7h@linux.ibm.com>
In-Reply-To: <20240111115745.62813-2-zegao@tencent.com>

On Thu, Jan 11, 2024 at 06:57:46AM -0500, Ze Gao wrote:
> AFAIS, we've overlooked the role that the concept of a time quantum plays
> in EEVDF. According to Theorem 1 in [1], we have
> 
> 	-r_max < lag_k(t) < max(r_max, q)
> 
> Clearly we don't want either r_max (the maximum user request) or q (the
> time quantum) to be too large.
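
To make the bound from Theorem 1 concrete, here is a minimal sketch that only evaluates the two sides of the inequality for a given set of request sizes and a quantum. The function name `lag_bounds` and the use of milliseconds are assumptions for illustration; nothing here is taken from the patch.

```python
def lag_bounds(requests_ms, quantum_ms):
    """Return the (lower, upper) exclusive lag bounds: -r_max and max(r_max, q)."""
    r_max = max(requests_ms)
    return (-r_max, max(r_max, quantum_ms))


# Every task requests the base slice, i.e. q = r_i = 3ms.
print(lag_bounds([3.0, 3.0], quantum_ms=3.0))

# Custom slices of 100ms and 0.1ms: r_max dominates the bound regardless of q.
print(lag_bounds([100.0, 0.1], quantum_ms=3.0))
```

With uniform 3ms requests the bound is a few milliseconds wide, while a single 100ms request widens it to roughly 100ms on each side.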
> 
> To gain throughput, [2] chooses to do tick preemption only at request
> boundaries (i.e., once a given request is fulfilled), which means we
> literally have no concept of a time quantum defined anymore.
> Obviously this is no problem if we make
> 
> 	q = r_i = sysctl_sched_base_slice
> 
> which is exactly what we have now; it creates an implicit quantum for
> us and works well.
> 
> However, with custom slices being possible, the lag bound depends only
> on the distribution of user-requested slices, since no time quantum is
> available now, and because of [2] we pay the cost of losing many
> scheduling opportunities that would otherwise maintain fairness and
> responsiveness. What's worse, we may suffer unexpected unfairness and
> latency.
> 
> For example, take two CPU-bound processes with the same weight, bind
> them to the same CPU, and let process A request 100ms whereas B
> requests 0.1ms each time (with HZ=1000, sysctl_sched_base_slice=3ms,
> nr_cpu=42). We can clearly see that playing with custom slices can
> actually cause unfair CPU bandwidth allocation (10706, whose request
> length is 0.1ms, gets more CPU time as well as better latency than
> 10705. Note that you might see the opposite on different machines, but
> the allocation inaccuracy remains, and even top shows a noticeable
> difference in CPU utilization in its per-second reporting), which
> is obviously not what we want because it would mess up the nice system,
> and fairness would not hold.

Hi, how are you setting the custom request values for processes A and B?

> 
> 			stress-ng-cpu:10705	stress-ng-cpu:10706
> ---------------------------------------------------------------------
> Slices(ms)		100			0.1
> Runtime(ms)		4934.206		5025.048
> Switches		58			67
> Average delay(ms)	87.074			73.863
> Maximum delay(ms)	101.998			101.010
> 
> In contrast, using sysctl_sched_base_slice as the size of a 'quantum'
> in this patch gives us better control over the allocation accuracy and
> the average latency:
> 
> 			stress-ng-cpu:10584	stress-ng-cpu:10583
> ---------------------------------------------------------------------
> Slices(ms)		100			0.1
> Runtime(ms)		4980.309		4981.356
> Switches		1253			1254
> Average delay(ms)	3.990			3.990
> Maximum delay(ms)	5.001			4.014
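
The difference between the two tables can be reproduced qualitatively with a toy model. The sketch below is not the kernel code and not the patch: it assumes equal weights, 1ms ticks, eligibility defined as "vruntime not above the average", and a reschedule either only when a request is fulfilled (`quantum_us=None`) or additionally whenever the running task has consumed a quantum. All names (`Task`, `pick`, `simulate`) are hypothetical.

```python
from dataclasses import dataclass

TICK_US = 1000  # one scheduler tick at HZ=1000


@dataclass
class Task:
    slice_us: int         # requested slice r_i
    deadline: int         # virtual deadline of the current request
    vruntime: int = 0     # service received (equal weights, so no scaling)
    runtime_us: int = 0
    switches: int = 0     # number of times the task was switched in
    wait_us: int = 0
    max_wait_us: int = 0


def pick(tasks):
    """Earliest virtual deadline among tasks whose vruntime is not above the average."""
    total = sum(t.vruntime for t in tasks)
    eligible = [t for t in tasks if t.vruntime * len(tasks) <= total]
    return min(eligible, key=lambda t: t.deadline)


def simulate(slices_us, ticks, quantum_us=None):
    tasks = {name: Task(s, deadline=s) for name, s in slices_us.items()}
    curr = prev = None
    ran_us = 0
    for _ in range(ticks):
        if curr is None:
            curr = pick(list(tasks.values()))
            ran_us = 0
            if curr is not prev:
                curr.switches += 1
        for t in tasks.values():
            if t is curr:
                t.wait_us = 0
            else:
                t.wait_us += TICK_US
                t.max_wait_us = max(t.max_wait_us, t.wait_us)
        curr.vruntime += TICK_US
        curr.runtime_us += TICK_US
        ran_us += TICK_US
        resched = False
        if curr.vruntime >= curr.deadline:   # request fulfilled: start a new one
            curr.deadline = curr.vruntime + curr.slice_us
            resched = True
        if quantum_us is not None and ran_us >= quantum_us:
            resched = True                   # quantum expired
        if resched:
            prev, curr = curr, None
    return tasks


slices = {"A": 100_000, "B": 100}  # 100ms and 0.1ms requests
for label, q in (("request boundary only", None), ("quantum = 3ms", 3_000)):
    result = simulate(slices, ticks=10_000, quantum_us=q)
    for name, t in result.items():
        print(label, name, t.runtime_us // 1000, "ms,",
              t.switches, "switches, max wait", t.max_wait_us // 1000, "ms")
```

In this model both variants still split CPU time evenly, so it illustrates only the delay and switch-count effect, not the bandwidth skew seen in the first measurement.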
> 
> Furthermore, with sysctl_sched_base_slice = 10ms, we might benefit from
> fewer switches at the cost of worse delay:
> 
> 			stress-ng-cpu:11208	stress-ng-cpu:11207
> ---------------------------------------------------------------------
> Slices(ms)		100			0.1
> Runtime(ms)		4983.722		4977.035
> Switches		456			456
> Average delay(ms)	10.963			10.939
> Maximum delay(ms)	19.002			21.001
> 
> By tuning the sysctl_sched_base_slice knob, we can strike a good
> balance between throughput and latency by adjusting the frequency of
> context switches, and the conclusions are much closer to what's
> covered in [1] with the explicit definition of a time quantum. It also
> gives more freedom to choose the eligible request length range (either
> through nice value or raw value) without worrying too much about
> overscheduling or underscheduling.
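
The trade-off can be read directly off the three measurements quoted above. The sketch below divides runtime by switch count for the first column of each table, assuming "Switches" counts how often the task was scheduled in; the helper name `avg_stretch_ms` is made up for illustration.

```python
# (runtime_ms, switches) copied from the first column of each table above.
runs = {
    "request boundary only": (4934.206, 58),
    "quantum = 3ms": (4980.309, 1253),
    "quantum = 10ms": (4983.722, 456),
}


def avg_stretch_ms(runtime_ms, switches):
    """Average uninterrupted run length, in milliseconds per switch-in."""
    return runtime_ms / switches


for label, (runtime_ms, switches) in runs.items():
    print(f"{label:>22}: {avg_stretch_ms(runtime_ms, switches):7.3f} ms per switch")
```

With two tasks sharing one CPU, one task's average stretch is roughly the other's average delay, which is consistent with the reported average delays of about 4ms and 11ms for the 3ms and 10ms settings.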
> 
> Note that this change should introduce no obvious regression, because
> in the status quo all processes have the same request length,
> sysctl_sched_base_slice. The benchmark results support this as well.
> 
> schbench -m2 -F128 -n10 -r90            w/patch   tip/6.7-rc7
> Wakeup  (usec): 99.0th:                 3028      95
> Request (usec): 99.0th:                 14992     21984
> RPS    (count): 50.0th:                 5864      5848
> 
> hackbench -s 512 -l 200 -f 25 -P        w/patch   tip/6.7-rc7
> -g 10                                   0.212     0.223
> -g 20                                   0.415     0.432
> -g 30                                   0.625     0.639
> -g 40                                   0.852     0.858
> 
> [1]: https://dl.acm.org/doi/10.5555/890606
> [2]: https://lore.kernel.org/all/20230420150537.GC4253@hirez.programming.kicks-ass.net/T/#u
> 
> Signed-off-by: Ze Gao <zegao@tencent.com>
> ---

Thread overview: 9+ messages
2024-01-11 11:57 [RFC PATCH] sched/eevdf: Use tunable knob sysctl_sched_base_slice as explicit time quanta Ze Gao
2024-01-23 12:42 ` Vishal Chourasia [this message]
2024-01-24  2:32   ` Ze Gao
2024-02-02 11:50     ` Vishal Chourasia
2024-02-04  3:05       ` Ze Gao
2024-02-05  7:37         ` Vishal Chourasia
2024-02-06  7:50           ` Ze Gao
2024-02-06 13:09 ` Luis Machado
2024-02-07  3:05   ` Ze Gao
