From: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
To: Fernand Sieber <sieberf@amazon.com>
Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Namhyung Kim <namhyung@kernel.org>,
Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Subject: Re: [PATCH] perf: Timehist account sch delay for scheduled out running
Date: Sun, 16 Jun 2024 02:20:59 +0530 [thread overview]
Message-ID: <1bb99f6f-532f-47cb-963c-c5c0262e383c@linux.ibm.com> (raw)
In-Reply-To: <20240613185906.31082-1-sieberf@amazon.com>
Hi Fernand,
On 14/06/24 00:29, Fernand Sieber wrote:
> When using perf timehist, sch delay is only computed for a waking task,
> not for a pre empted task. This patches addresses this problem.
>
> Example of `perf timehist` report before the patch for `stress` task
> competing with each other.
>
> First column is wait time, second column sch delay, third column
> runtime.
> ```
> 1.492060 [0000] s stress[81] 1.999 0.000 2.000 R next: stress[83]
> 1.494060 [0000] s stress[83] 2.000 0.000 2.000 R next: stress[81]
> 1.496060 [0000] s stress[81] 2.000 0.000 2.000 R next: stress[83]
> 1.498060 [0000] s stress[83] 2.000 0.000 1.999 R next: stress[81]
> ```
>
> After the patch, it looks like this (note that all wait time is now sch
> delay instead of zero):
> ```
> 1.492060 [0000] s stress[81] 1.999 1.999 2.000 R next: stress[83]
> 1.494060 [0000] s stress[83] 2.000 2.000 2.000 R next: stress[81]
> 1.496060 [0000] s stress[81] 2.000 2.000 2.000 R next: stress[83]
> 1.498060 [0000] s stress[83] 2.000 2.000 1.999 R next: stress[81]
> ```
>
> In timehist:
> * wait time represents the duration waiting for any system resource
> * sch delay represents the duration waiting for cpu system resources
According to the documentation(perf-sched man page), scheduling delay is defined
as the "time between wakeup and actually running."
When a task switches out, if its state is 'R' , then the task would switch-in
again without waking and wakeup in between.Therefore, that duration is not
currently calculated as scheduling delay, which aligns with the man page
description.
I agree that the time a task is preempted involuntarily and is waiting to run
could be considered part of the scheduling delay. The definition is somewhat
vague in this regard.
If you decide to proceed with this change, please ensure the man page is updated
accordingly.
Regarding wait time, the man page defines it as the "time between sched-out and
the next sched-in events for the task." This measurement does not depend on the
task state and includes time spent on a voluntary switch.
Thanks and Regards
Madadi Vineeth Reddy
>
> This is based on perf comments (dt_wait = wait time, dt_delay = sch
> delay):
> ```
> /*
> * Explanation of delta-time stats:
> *
> * t = time of current schedule out event
> * tprev = time of previous sched out event
> * also time of schedule-in event for current task
> * last_time = time of last sched change event for current task
> * (i.e, time process was last scheduled out)
> * ready_to_run = time of wakeup for current task
> *
> * -----|------------|------------|------------|------
> * last ready tprev t
> * time to run
> *
> * |-------- dt_wait --------|
> * |- dt_delay -|-- dt_run --|
> *
> * dt_run = run time of current task
> * dt_wait = time between last schedule out event for task and tprev
> * represents time spent off the cpu
> * dt_delay = time between wakeup and schedule-in of task
> */
> ```
>
> The problem with the current logic is that last time is only set when
> waking a task. Therefore it is not set for a pre empted task. To fix
> this, we set last time to the current sample time if a scheduled out
> task (on the switch tracepoint) is in state running.
>
> Signed-off-by: Fernand Sieber <sieberf@amazon.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> ---
> tools/perf/builtin-sched.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> index 5977c49ae2c7..7422c930abaf 100644
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c
> @@ -2659,7 +2659,10 @@ static int timehist_sched_change_event(struct perf_tool *tool,
> tr->last_state = state;
>
> /* sched out event for task so reset ready to run time */
> - tr->ready_to_run = 0;
> + if (state == 'R')
> + tr->ready_to_run = t;
> + else
> + tr->ready_to_run = 0;
> }
>
> evsel__save_time(evsel, sample->time, sample->cpu);
next prev parent reply other threads:[~2024-06-15 20:51 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-13 18:59 [PATCH] perf: Timehist account sch delay for scheduled out running Fernand Sieber
2024-06-15 20:50 ` Madadi Vineeth Reddy [this message]
2024-06-18 8:44 ` Sieber, Fernand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1bb99f6f-532f-47cb-963c-c5c0262e383c@linux.ibm.com \
--to=vineethr@linux.ibm.com \
--cc=20240613185906.31082-1-sieberf@amazon.com \
--cc=acme@kernel.org \
--cc=alexander.shishkin@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=peterz@infradead.org \
--cc=sieberf@amazon.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).