From: "xianhe.zhou(周显赫)" <xianhe.zhou@transsion.com>
To: "mingo@redhat.com" <mingo@redhat.com>,
"peterz@infradead.org" <peterz@infradead.org>,
"juri.lelli@redhat.com" <juri.lelli@redhat.com>,
"incent.guittot@linaro.org" <incent.guittot@linaro.org>,
"vincent.guittot@linaro.org" <vincent.guittot@linaro.org>
Cc: "dietmar.eggemann@arm.com" <dietmar.eggemann@arm.com>,
"rostedt@goodmis.org" <rostedt@goodmis.org>,
"bsegall@google.com" <bsegall@google.com>,
"mgorman@suse.de" <mgorman@suse.de>,
"vschneid@redhat.com" <vschneid@redhat.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: [PATCH RESEND] sched: Fix incorrect runnable time calculation in sched_stat_wait tracepoint
Date: Wed, 5 Mar 2025 08:09:02 +0000 [thread overview]
Message-ID: <88f3d595597443bc96fd7f11b02ccc15@transsion.com> (raw)
From b07df3ed3fd2186a698ef87ad366523ec7c7fccb Mon Sep 17 00:00:00 2001
From: "xianhe.zhou" <xianhe.zhou@transsion.com>
Date: Thu, 20 Feb 2025 22:01:26 +0800
Subject: [PATCH] sched: Fix incorrect runnable time calculation in sched_stat_wait tracepoint
The sched_stat_wait tracepoint is intended to record the runnable time of a task. However, due to incorrect timestamp handling, two major issues arise:
1. The tracepoint may incorrectly record consecutive start timestamps
or consecutive end timestamps, rather than properly pairing them.
This leads to incorrect runnable time calculations.
2. The recorded time may include temporary dequeuing and enqueuing
events, such as when modifying a task's nice value while it is running.
This patch ensures that timestamps are properly paired and excludes temporary dequeuing events to prevent incorrect measurements.
Testing:
This issue was tested on an Android 14 device. The following are the trace results before and after applying the patch:
Before the fix (incorrect values):
sched_stat_wait: comm=OverlayEngine_0 pid=1533 delay=352418315839 [ns]
sched_stat_wait: comm=OverlayEngine_0 pid=1533 delay=352409961840 [ns]
sched_stat_wait: comm=OverlayEngine_0 pid=1533 delay=352393317455 [ns]
sched_stat_wait: comm=binder:1148_3 pid=2102 delay=352380493377 [ns]
sched_stat_wait: comm=OverlayEngine_0 pid=1533 delay=352360033377 [ns]
Clearly, these runnable time values, in the hundreds of seconds, are incorrect.
After the fix (corrected values):
sched_stat_wait: comm=kworker/7:1 pid=13682 delay=63516923 [ns]
sched_stat_wait: comm=SuperA_R[136] pid=15074 delay=51324385 [ns]
sched_stat_wait: comm=SuperA_R[136] pid=15080 delay=50280539 [ns]
sched_stat_wait: comm=SuperA_R[136] pid=15071 delay=48995692 [ns]
sched_stat_wait: comm=SuperA_R[34] pid=14951 delay=41876308 [ns]
After the fix, the recorded runnable times are within expected ranges.
Fixes: 768d0c27226e ("sched: Add wait, sleep and iowait accounting tracepoints")
Signed-off-by: xianhe.zhou <xianhe.zhou@transsion.com>
---
kernel/sched/fair.c | 13 +++++++------
kernel/sched/rt.c | 15 ++++++++++++---
kernel/sched/stats.c | 7 ++++++-
3 files changed, 25 insertions(+), 10 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 1c0ef435a7aa..37a315b3906b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1331,11 +1331,12 @@ update_stats_enqueue_fair(struct cfs_rq *cfs_rq, struct sched_entity *se, int fl
if (!schedstat_enabled())
return;
- /*
- * Are we enqueueing a waiting task? (for current tasks
- * a dequeue/enqueue event is a NOP)
+ /*
+ * Are we enqueueing a waiting task? And it is not
+ * a temporary enqueue?(for current tasks a
+ * dequeue/enqueue event is a NOP)
*/
- if (se != cfs_rq->curr)
+ if (se != cfs_rq->curr && !(flags & ENQUEUE_RESTORE))
update_stats_wait_start_fair(cfs_rq, se);
if (flags & ENQUEUE_WAKEUP)
@@ -1351,9 +1352,9 @@ update_stats_dequeue_fair(struct cfs_rq *cfs_rq, struct sched_entity *se, int fl
/*
* Mark the end of the wait period if dequeueing a
- * waiting task:
+ * waiting task and it is not a temporary dequeue:
*/
- if (se != cfs_rq->curr)
+ if (se != cfs_rq->curr && !(flags & DEQUEUE_SAVE))
update_stats_wait_end_fair(cfs_rq, se);
if ((flags & DEQUEUE_SLEEP) && entity_is_task(se)) { diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 4b8e33c615b1..991251feeaf6 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1310,9 +1310,17 @@ static inline void update_stats_enqueue_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se,
int flags)
{
+ struct task_struct *p = NULL;
+
if (!schedstat_enabled())
return;
+ if (rt_entity_is_task(rt_se))
+ p = rt_task_of(rt_se);
+
+ if (p != rq_of_rt_rq(rt_rq)->curr && !(flags & ENQUEUE_RESTORE))
+ update_stats_wait_start_rt(rt_rq, rt_se);
+
if (flags & ENQUEUE_WAKEUP)
update_stats_enqueue_sleeper_rt(rt_rq, rt_se); } @@ -1348,6 +1356,10 @@ update_stats_dequeue_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se,
if (rt_entity_is_task(rt_se))
p = rt_task_of(rt_se);
+ if (p != rq_of_rt_rq(rt_rq)->curr && !(flags & DEQUEUE_SAVE)) {
+ update_stats_wait_end_rt(rt_rq, rt_se);
+ }
+
if ((flags & DEQUEUE_SLEEP) && p) {
unsigned int state;
@@ -1474,9 +1486,6 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)
if (flags & ENQUEUE_WAKEUP)
rt_se->timeout = 0;
- check_schedstat_required();
- update_stats_wait_start_rt(rt_rq_of_se(rt_se), rt_se);
-
enqueue_rt_entity(rt_se, flags);
if (!task_current(rq, p) && p->nr_cpus_allowed > 1) diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c index 4346fd81c31f..367f4bb1c759 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -20,8 +20,13 @@ void __update_stats_wait_start(struct rq *rq, struct task_struct *p, void __update_stats_wait_end(struct rq *rq, struct task_struct *p,
struct sched_statistics *stats) {
- u64 delta = rq_clock(rq) - schedstat_val(stats->wait_start);
+ u64 delta;
+ u64 wait_start = schedstat_val(stats->wait_start);
+ if (!wait_start)
+ return;
+
+ delta = rq_clock(rq) - wait_start;
if (p) {
if (task_on_rq_migrating(p)) {
/*
--
2.34.1
next reply other threads:[~2025-03-05 8:09 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-05 8:09 xianhe.zhou(周显赫) [this message]
-- strict thread matches above, loose matches on Subject: below --
2025-03-06 2:39 [PATCH RESEND] sched: Fix incorrect runnable time calculation in sched_stat_wait tracepoint xianhe.zhou(周显赫)
2025-03-12 18:07 ` Madadi Vineeth Reddy
2025-03-17 7:03 xianhe.zhou(周显赫)
2025-04-07 3:21 xianhe.zhou(周显赫)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=88f3d595597443bc96fd7f11b02ccc15@transsion.com \
--to=xianhe.zhou@transsion.com \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=incent.guittot@linaro.org \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox