public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
To: "xianhe.zhou(周显赫)" <xianhe.zhou@transsion.com>
Cc: "mingo@redhat.com" <mingo@redhat.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"juri.lelli@redhat.com" <juri.lelli@redhat.com>,
	"incent.guittot@linaro.org" <incent.guittot@linaro.org>,
	"vincent.guittot@linaro.org" <vincent.guittot@linaro.org>,
	"dietmar.eggemann@arm.com" <dietmar.eggemann@arm.com>,
	"rostedt@goodmis.org" <rostedt@goodmis.org>,
	"bsegall@google.com" <bsegall@google.com>,
	"mgorman@suse.de" <mgorman@suse.de>,
	"vschneid@redhat.com" <vschneid@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Subject: Re: [PATCH RESEND] sched: Fix incorrect runnable time calculation in sched_stat_wait tracepoint
Date: Wed, 12 Mar 2025 23:37:03 +0530	[thread overview]
Message-ID: <f3de3cc0-e519-42f7-aa0d-261c5f20ce59@linux.ibm.com> (raw)
In-Reply-To: <98cac8ca355d4aad963deb52a427fa2c@transsion.com>

On 06/03/25 08:09, xianhe.zhou(周显赫) wrote:
> From b07df3ed3fd2186a698ef87ad366523ec7c7fccb Mon Sep 17 00:00:00 2001
> From: "xianhe.zhou" <xianhe.zhou@transsion.com>
> Date: Thu, 20 Feb 2025 22:01:26 +0800
> Subject: [PATCH] sched: Fix incorrect runnable time calculation in
>  sched_stat_wait tracepoint
> 
> The sched_stat_wait tracepoint is intended to record the runnable
> time of a task. However, due to incorrect timestamp handling, two major
> issues arise:
> 
> 1. The tracepoint may incorrectly record consecutive start timestamps
>    or consecutive end timestamps, rather than properly pairing them.
>    This leads to incorrect runnable time calculations.
> 2. The recorded time may include temporary dequeuing and enqueuing
>    events, such as when modifying a task's nice value while it is running.
> 
> This patch ensures that timestamps are properly paired and excludes
> temporary dequeuing events to prevent incorrect measurements.
> 
> Testing:
> This issue was tested on an Android 14 device. The following are the
> trace results before and after applying the patch:
> 
> Before the fix (incorrect values):
> 
> sched_stat_wait: comm=OverlayEngine_0 pid=1533 delay=352418315839 [ns]
> sched_stat_wait: comm=OverlayEngine_0 pid=1533 delay=352409961840 [ns]
> sched_stat_wait: comm=OverlayEngine_0 pid=1533 delay=352393317455 [ns]
> sched_stat_wait: comm=binder:1148_3 pid=2102 delay=352380493377 [ns]
> sched_stat_wait: comm=OverlayEngine_0 pid=1533 delay=352360033377 [ns]
> 
> Clearly, these runnable time values, in the hundreds of seconds,
> are incorrect.
> 
> After the fix (corrected values):
> 
> sched_stat_wait: comm=kworker/7:1 pid=13682 delay=63516923 [ns]
> sched_stat_wait: comm=SuperA_R[136] pid=15074 delay=51324385 [ns]
> sched_stat_wait: comm=SuperA_R[136] pid=15080 delay=50280539 [ns]
> sched_stat_wait: comm=SuperA_R[136] pid=15071 delay=48995692 [ns]
> sched_stat_wait: comm=SuperA_R[34] pid=14951 delay=41876308 [ns]
> 
> After the fix, the recorded runnable times are within expected ranges.

As per my understanding, when a temporary dequeue occurs, the delay value
printed should be smaller compared to when these temporary events are
skipped, as the delay is then computed between the final dequeue and the
original start time. 

I think I am missing something. Can you take an example and explain?

Thanks,
Madadi Vineeth Reddy

> 
> Fixes: 768d0c27226e ("sched: Add wait, sleep and iowait accounting tracepoints")
> Signed-off-by: xianhe.zhou <xianhe.zhou@transsion.com>
> ---
>  kernel/sched/fair.c  | 13 +++++++------
>  kernel/sched/rt.c    | 15 ++++++++++++---
>  kernel/sched/stats.c |  7 ++++++-
>  3 files changed, 25 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 1c0ef435a7aa..37a315b3906b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1331,11 +1331,12 @@ update_stats_enqueue_fair(struct cfs_rq *cfs_rq, struct sched_entity *se, int fl
>  	if (!schedstat_enabled())
>  		return;
>  
> -	/*
> -	 * Are we enqueueing a waiting task? (for current tasks
> -	 * a dequeue/enqueue event is a NOP)
> +    /*
> +     * Are we enqueueing a waiting task? And it is not
> +     * a temporary enqueue?(for current tasks a
> +     * dequeue/enqueue event is a NOP)
>  	 */
> -	if (se != cfs_rq->curr)
> +	if (se != cfs_rq->curr && !(flags & ENQUEUE_RESTORE))
>  		update_stats_wait_start_fair(cfs_rq, se);
>  
>  	if (flags & ENQUEUE_WAKEUP)
> @@ -1351,9 +1352,9 @@ update_stats_dequeue_fair(struct cfs_rq *cfs_rq, struct sched_entity *se, int fl
>  
>  	/*
>  	 * Mark the end of the wait period if dequeueing a
> -	 * waiting task:
> +	 * waiting task and it is not a temporary dequeue:
>  	 */
> -	if (se != cfs_rq->curr)
> +	if (se != cfs_rq->curr && !(flags & DEQUEUE_SAVE))
>  		update_stats_wait_end_fair(cfs_rq, se);
>  
>  	if ((flags & DEQUEUE_SLEEP) && entity_is_task(se)) {
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 4b8e33c615b1..991251feeaf6 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1310,9 +1310,17 @@ static inline void
>  update_stats_enqueue_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se,
>  			int flags)
>  {
> +	struct task_struct *p = NULL;
> +
>  	if (!schedstat_enabled())
>  		return;
>  
> +	if (rt_entity_is_task(rt_se))
> +		p = rt_task_of(rt_se);
> +
> +	if (p != rq_of_rt_rq(rt_rq)->curr && !(flags & ENQUEUE_RESTORE))
> +		update_stats_wait_start_rt(rt_rq, rt_se);
> +
>  	if (flags & ENQUEUE_WAKEUP)
>  		update_stats_enqueue_sleeper_rt(rt_rq, rt_se);
>  }
> @@ -1348,6 +1356,10 @@ update_stats_dequeue_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se,
>  	if (rt_entity_is_task(rt_se))
>  		p = rt_task_of(rt_se);
>  
> +	if (p != rq_of_rt_rq(rt_rq)->curr && !(flags & DEQUEUE_SAVE)) {
> +		update_stats_wait_end_rt(rt_rq, rt_se);
> +	}
> +
>  	if ((flags & DEQUEUE_SLEEP) && p) {
>  		unsigned int state;
>  
> @@ -1474,9 +1486,6 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)
>  	if (flags & ENQUEUE_WAKEUP)
>  		rt_se->timeout = 0;
>  
> -	check_schedstat_required();
> -	update_stats_wait_start_rt(rt_rq_of_se(rt_se), rt_se);
> -
>  	enqueue_rt_entity(rt_se, flags);
>  
>  	if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
> diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
> index 4346fd81c31f..367f4bb1c759 100644
> --- a/kernel/sched/stats.c
> +++ b/kernel/sched/stats.c
> @@ -20,8 +20,13 @@ void __update_stats_wait_start(struct rq *rq, struct task_struct *p,
>  void __update_stats_wait_end(struct rq *rq, struct task_struct *p,
>  			     struct sched_statistics *stats)
>  {
> -	u64 delta = rq_clock(rq) - schedstat_val(stats->wait_start);
> +	u64 delta;
> +	u64 wait_start = schedstat_val(stats->wait_start);
>  
> +	if (!wait_start)
> +		return;
> +
> +	delta = rq_clock(rq) - wait_start;
>  	if (p) {
>  		if (task_on_rq_migrating(p)) {
>  			/*


  reply	other threads:[~2025-03-12 18:07 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-06  2:39 [PATCH RESEND] sched: Fix incorrect runnable time calculation in sched_stat_wait tracepoint xianhe.zhou(周显赫)
2025-03-12 18:07 ` Madadi Vineeth Reddy [this message]
  -- strict thread matches above, loose matches on Subject: below --
2025-04-07  3:21 xianhe.zhou(周显赫)
2025-03-17  7:03 xianhe.zhou(周显赫)
2025-03-05  8:09 xianhe.zhou(周显赫)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f3de3cc0-e519-42f7-aa0d-261c5f20ce59@linux.ibm.com \
    --to=vineethr@linux.ibm.com \
    --cc=98cac8ca355d4aad963deb52a427fa2c@transsion.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=incent.guittot@linaro.org \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=xianhe.zhou@transsion.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox