* [PATCHv2 1/2] sched: introduce helper function to calculate distribution over sched class
@ 2024-02-22 9:22 zhaoyang.huang
2024-02-22 9:22 ` [PATCHv2 2/2] block: adjust CFS request expire time zhaoyang.huang
2024-02-23 6:41 ` [PATCHv2 1/2] sched: introduce helper function to calculate distribution over sched class Christoph Hellwig
0 siblings, 2 replies; 3+ messages in thread
From: zhaoyang.huang @ 2024-02-22 9:22 UTC (permalink / raw)
To: Vincent Guittot, Jens Axboe, linux-block, linux-kernel,
Zhaoyang Huang, steve.kang
From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Since time spent in RT, DL and IRQ contexts can be regarded as time lost
to a CFS task, some timing measurements need to know approximately how
that time is distributed across the scheduling classes, using the
per-class utilization accounting (nivcsw alone is sometimes not enough).
This commit introduces a helper function for that purpose.
e.g.
Effective part of A = Total_time * cpu_util_cfs / cpu_util

Timing value A
(should be a process that lasts for several ticks, or the statistics of a
repeated process)

Timing start
|
|
preempted by RT, DL or IRQ
|\
| This period is an involuntary CPU give-up; we need to know how long it lasts
|/
sched in again
|
|
|
Timing end
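
A minimal caller-side sketch of how the helper could be used to estimate
the effective CFS share of a measured interval (illustration only, not
part of this patch; estimate_effective_cfs_time() is a hypothetical
wrapper):

static u64 estimate_effective_cfs_time(struct task_struct *tsk, u64 total_ns)
{
	unsigned long se_prop, rq_prop;

	/* Proportions are percentages, valid only when the helper returns 1 */
	if (!cfs_prop_by_util(tsk, &se_prop, &rq_prop))
		return total_ns;	/* not a CFS task, nothing to adjust */

	/* Effective part of A = Total_time * cpu_util_cfs / cpu_util */
	return total_ns * rq_prop / 100;
}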
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
Changes in v2: use two parameters to pass se_prop and rq_prop out to the caller.
---
include/linux/sched.h | 3 +++
kernel/sched/core.c | 35 +++++++++++++++++++++++++++++++++++
2 files changed, 38 insertions(+)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 77f01ac385f7..d6d5914fad10 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2318,6 +2318,9 @@ static inline bool owner_on_cpu(struct task_struct *owner)
/* Returns effective CPU energy utilization, as seen by the scheduler */
unsigned long sched_cpu_util(int cpu);
+/* Returns the task's and the cfs_rq's proportion of the whole core's utilization */
+unsigned long cfs_prop_by_util(struct task_struct *tsk, unsigned long *se_prop,
+ unsigned long *rq_prop);
#endif /* CONFIG_SMP */
#ifdef CONFIG_RSEQ
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 802551e0009b..b8c29dff5d37 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7494,6 +7494,41 @@ unsigned long sched_cpu_util(int cpu)
{
return effective_cpu_util(cpu, cpu_util_cfs(cpu), ENERGY_UTIL, NULL);
}
+
+/*
+ * Calculate the approximate proportion of a timing value consumed by the
+ * specified task and by all CFS tasks on this core.
+ * The caller must be aware that this is based on avg_util, which is tracked
+ * as a geometric series in which the load decays by y^32 = 0.5 (the
+ * accounting period is roughly 1ms). Hence only periods lasting at least
+ * several ticks, or statistics of a repeated timing value, are suitable
+ * for this helper.
+ * Derived from effective_cpu_util(), but without clamping the utilization
+ * to the core's capacity. se_prop and rq_prop are valid only on return 1.
+ */
+unsigned long cfs_prop_by_util(struct task_struct *tsk, unsigned long *se_prop,
+ unsigned long *rq_prop)
+{
+ unsigned int cpu = task_cpu(tsk);
+ struct sched_entity *se = &tsk->se;
+ struct rq *rq = cpu_rq(cpu);
+ unsigned long util, irq, max;
+
+ if (tsk->sched_class != &fair_sched_class)
+ return 0;
+
+ max = arch_scale_cpu_capacity(cpu);
+ irq = cpu_util_irq(rq);
+
+ util = cpu_util_rt(rq) + cpu_util_cfs(cpu) + cpu_util_dl(rq);
+ util = scale_irq_capacity(util, irq, max);
+ util += irq;
+
+ *se_prop = se->avg.util_avg * 100 / util;
+ *rq_prop = cpu_util_cfs(cpu) * 100 / util;
+ return 1;
+}
+
#endif /* CONFIG_SMP */
/**
--
2.25.1
* [PATCHv2 2/2] block: adjust CFS request expire time
2024-02-22 9:22 [PATCHv2 1/2] sched: introduce helper function to calculate distribution over sched class zhaoyang.huang
@ 2024-02-22 9:22 ` zhaoyang.huang
2024-02-23 6:41 ` [PATCHv2 1/2] sched: introduce helper function to calculate distribution over sched class Christoph Hellwig
1 sibling, 0 replies; 3+ messages in thread
From: zhaoyang.huang @ 2024-02-22 9:22 UTC (permalink / raw)
To: Vincent Guittot, Jens Axboe, linux-block, linux-kernel,
Zhaoyang Huang, steve.kang
From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Under the current policy, CFS tasks may suffer involuntary I/O latency
when they are preempted by RT/DL tasks or IRQs, since the latter are
privileged in both the CPU and the I/O scheduler. This commit introduces
an approximate and lightweight method to reduce that effect by adjusting
the request expire time according to the CFS proportion of the whole CPU
active time.
The average utilization of the CPU's run queue reflects the historical
active proportion of each type of task, which makes it valid for this
goal from the following three perspectives:
1. The load (utilization) of every sched class is tracked and calculated
in the same way (using the geometric series known as PELT).
2. The legacy policy is kept: the rq's position in fifo_list is NOT
adjusted, only its expire time is changed.
3. The fixed expire time (hundreds of ms) is in the same range as the CPU
utilization accounting series (the utilization decays to 0.5 in 32ms);
see the sketch after this list.
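
As a rough illustration of point 3, a standalone userspace sketch of the
decay series (not kernel code; the constants simply follow the PELT
description above):

#include <math.h>
#include <stdio.h>

int main(void)
{
	/* per-period decay factor y such that y^32 = 0.5 (~1ms periods) */
	const double y = pow(0.5, 1.0 / 32.0);
	double util = 1024.0;	/* start from full utilization */

	for (int ms = 0; ms <= 128; ms += 32)
		printf("after %3d ms: util ~= %.0f\n", ms, util * pow(y, ms));
	/* prints 1024, 512, 256, 128, 64: an expire time of hundreds of ms
	 * is well within the window the accounting series still reflects */
	return 0;
}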
TaskA
sched in
|
|
|
submit_bio
|
|
|
fifo_time = jiffies + expire
(insert_request)
TaskB
sched in
|
|
vfs_xxx
|
|preempted by RT,DL,IRQ
|\
| This period is unfair to TaskB's I/O request; it should be adjusted for
|/
|
submit_bio
|
|
|
fifo_time = jiffies + expire * CFS_PROPORTION(rq)
(insert_request)
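
A worked example of the adjustment (the numbers are hypothetical;
CFS_PROP_THRESHOLD and the read-only restriction come from the patch
below): suppose dd->fifo_expire[DD_READ] is 500ms and the helper reports
rq_prop = 40, i.e. CFS tasks received only about 40% of the CPU recently.
Since 40 < CFS_PROP_THRESHOLD (60), the expire time becomes
500ms * 40 / 100 = 200ms, so the over-preempted task's read request is
considered expired, and therefore dispatched from the FIFO head, sooner.
With rq_prop = 80 the threshold check fails and the default expire time
is kept.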
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
Changes in v2: introduce a direction (reads only) and a threshold so that
the adjustment only acts as a guard when CFS is over-preempted.
---
block/mq-deadline.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index f958e79277b8..b477ba1bf6d2 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -54,6 +54,7 @@ enum dd_prio {
enum { DD_PRIO_COUNT = 3 };
+#define CFS_PROP_THRESHOLD 60
/*
* I/O statistics per I/O priority. It is fine if these counters overflow.
* What matters is that these counters are at least as wide as
@@ -802,6 +803,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
struct dd_per_prio *per_prio;
enum dd_prio prio;
+ int fifo_expire;
lockdep_assert_held(&dd->lock);
@@ -828,6 +830,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
rq->fifo_time = jiffies;
} else {
struct list_head *insert_before;
+ unsigned long se_prop, rq_prop;
deadline_add_rq_rb(per_prio, rq);
@@ -839,8 +842,21 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
/*
* set expire time and add to fifo list
+ * The expire time is shortened when the current CFS task has been
+ * over-preempted by RT/DL/IRQ, as measured by the proportion of the
+ * cfs_rq's activity within the whole CPU time over the last few
+ * dozen ms. This does NOT affect the rq's position in fifo_list;
+ * it only takes effect when the rq's expire time is checked while
+ * it sits at the head of the list.
*/
- rq->fifo_time = jiffies + dd->fifo_expire[data_dir];
+ fifo_expire = dd->fifo_expire[data_dir];
+ if (data_dir == DD_READ &&
+ cfs_prop_by_util(current, &se_prop, &rq_prop) &&
+ rq_prop < CFS_PROP_THRESHOLD)
+ fifo_expire = dd->fifo_expire[data_dir] * rq_prop / 100;
+
+ rq->fifo_time = jiffies + fifo_expire;
+
insert_before = &per_prio->fifo_list[data_dir];
#ifdef CONFIG_BLK_DEV_ZONED
/*
--
2.25.1
* Re: [PATCHv2 1/2] sched: introduce helper function to calculate distribution over sched class
2024-02-22 9:22 [PATCHv2 1/2] sched: introduce helper function to calculate distribution over sched class zhaoyang.huang
2024-02-22 9:22 ` [PATCHv2 2/2] block: adjust CFS request expire time zhaoyang.huang
@ 2024-02-23 6:41 ` Christoph Hellwig
1 sibling, 0 replies; 3+ messages in thread
From: Christoph Hellwig @ 2024-02-23 6:41 UTC (permalink / raw)
To: zhaoyang.huang
Cc: Vincent Guittot, Jens Axboe, linux-block, linux-kernel,
Zhaoyang Huang, steve.kang
On Thu, Feb 22, 2024 at 05:22:19PM +0800, zhaoyang.huang wrote:
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>
> Since time spent in RT, DL and IRQ contexts can be regarded as time lost
> to a CFS task, some timing measurements need to know approximately how
> that time is distributed across the scheduling classes, using the
> per-class utilization accounting (nivcsw alone is sometimes not enough).
> This commit introduces a helper function for that purpose.
Maybe I'm just thick but this still looks like alphabet soup to me.
Can you try to explain why this matters, or maybe get help from the
scheduler folks with explaining the concepts?