From: Peter Zijlstra <peterz@infradead.org>
To: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>,
linux-kernel@vger.kernel.org, linux-arm-msm@vger.kernel.org,
ohaugan@codeaurora.org
Subject: Re: [PATCH] sched: fix incorrect wait time and wait count statistics
Date: Tue, 27 Oct 2015 13:57:28 +0100 [thread overview]
Message-ID: <20151027125728.GA3201@worktop.amr.corp.intel.com> (raw)
In-Reply-To: <562ED710.7040009@codeaurora.org>
(Excessive quoting for Olav)
On Mon, Oct 26, 2015 at 06:44:48PM -0700, Joonwoo Park wrote:
> On 10/25/2015 03:26 AM, Peter Zijlstra wrote:
> > Also note that on both sites we also set TASK_ON_RQ_MIGRATING -- albeit
> > late. Can't you simply set that earlier (and back to QUEUED later) and
> > test for task_on_rq_migrating() instead of blowing up the fastpath like
> > you did?
> >
>
> Yes it's doable. I also find it's much simpler.
> From 98d615d46211a90482a0f9b7204265c54bba8520 Mon Sep 17 00:00:00 2001
> From: Joonwoo Park <joonwoop@codeaurora.org>
> Date: Mon, 26 Oct 2015 16:37:47 -0700
> Subject: [PATCH v2] sched: fix incorrect wait time and wait count statistics
>
> At present scheduler resets task's wait start timestamp when the task
> migrates to another rq. This misleads scheduler itself into reporting
> less wait time than actual by omitting time spent for waiting prior to
> migration and also more wait count than actual by counting migration as
> wait end event which can be seen by trace or /proc/<pid>/sched with
> CONFIG_SCHEDSTATS=y.
>
> Carry forward migrating task's wait time prior to migration and
> don't count migration as a wait end event to fix such statistics error.
>
> In order to determine whether task is migrating mark task->on_rq with
> TASK_ON_RQ_MIGRATING while dequeuing and enqueuing due to migration.
>
> To: Ingo Molnar <mingo@kernel.org>
> To: Peter Zijlstra <peterz@infradead.org>
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
> ---
So now that you rely on TASK_ON_RQ_MIGRATING; I think you missed one
place that can migrate sched_fair tasks and doesn't set it.
Olav recently did a patch adding TASK_ON_RQ_MIGRATING to _every_
migration path, but that is (still) somewhat overkill. With your changes
we need it for sched_fair though.
So I think you need to change __migrate_swap_task(), which is used by
the NUMA scheduling to swap two running tasks.
Also, it might be prudent to extend the CONFIG_SCHED_DEBUG ifdef in
set_task_cpu() to test for this new requirement:
WARN_ON_ONCE(p->state == TASK_RUNNING &&
p->class == &fair_sched_class &&
p->on_rq != TASK_ON_RQ_MIGRATING);
> kernel/sched/core.c | 4 ++--
> kernel/sched/fair.c | 17 ++++++++++++++---
> 2 files changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index bcd214e..d9e4ad5 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1069,8 +1069,8 @@ static struct rq *move_queued_task(struct rq *rq, struct task_struct *p, int new
> {
> lockdep_assert_held(&rq->lock);
>
> - dequeue_task(rq, p, 0);
> p->on_rq = TASK_ON_RQ_MIGRATING;
> + dequeue_task(rq, p, 0);
> set_task_cpu(p, new_cpu);
> raw_spin_unlock(&rq->lock);
>
> @@ -1078,8 +1078,8 @@ static struct rq *move_queued_task(struct rq *rq, struct task_struct *p, int new
>
> raw_spin_lock(&rq->lock);
> BUG_ON(task_cpu(p) != new_cpu);
> - p->on_rq = TASK_ON_RQ_QUEUED;
> enqueue_task(rq, p, 0);
> + p->on_rq = TASK_ON_RQ_QUEUED;
> check_preempt_curr(rq, p, 0);
>
> return rq;
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9a5e60f..7609576 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -740,7 +740,11 @@ static void update_curr_fair(struct rq *rq)
> static inline void
> update_stats_wait_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
> {
> - schedstat_set(se->statistics.wait_start, rq_clock(rq_of(cfs_rq)));
> + schedstat_set(se->statistics.wait_start,
> + task_on_rq_migrating(task_of(se)) &&
> + likely(rq_clock(rq_of(cfs_rq)) > se->statistics.wait_start) ?
> + rq_clock(rq_of(cfs_rq)) - se->statistics.wait_start :
> + rq_clock(rq_of(cfs_rq)));
So I get that you want to avoid emitting code for !SCHEDSTATS; but that
is rather unreadable. Either use the GCC stmt-expr or wrap the lot in
#ifdef.
> }
>
> /*
> @@ -759,6 +763,13 @@ static void update_stats_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se)
> static void
> update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
> {
> + if (task_on_rq_migrating(task_of(se))) {
Maybe add a comment that we adjust the wait_start time-base and will
rebase on the new rq_clock in update_stats_wait_start().
> + schedstat_set(se->statistics.wait_start,
> + rq_clock(rq_of(cfs_rq)) -
> + se->statistics.wait_start);
> + return;
> + }
> +
> schedstat_set(se->statistics.wait_max, max(se->statistics.wait_max,
> rq_clock(rq_of(cfs_rq)) - se->statistics.wait_start));
> schedstat_set(se->statistics.wait_count, se->statistics.wait_count + 1);
> @@ -5656,8 +5667,8 @@ static void detach_task(struct task_struct *p, struct lb_env *env)
> {
> lockdep_assert_held(&env->src_rq->lock);
>
> - deactivate_task(env->src_rq, p, 0);
> p->on_rq = TASK_ON_RQ_MIGRATING;
> + deactivate_task(env->src_rq, p, 0);
> set_task_cpu(p, env->dst_cpu);
> }
>
> @@ -5790,8 +5801,8 @@ static void attach_task(struct rq *rq, struct task_struct *p)
> lockdep_assert_held(&rq->lock);
>
> BUG_ON(task_rq(p) != rq);
> - p->on_rq = TASK_ON_RQ_QUEUED;
> activate_task(rq, p, 0);
> + p->on_rq = TASK_ON_RQ_QUEUED;
> check_preempt_curr(rq, p, 0);
> }
next prev parent reply other threads:[~2015-10-28 0:10 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-10-25 5:23 [PATCH] sched: fix incorrect wait time and wait count statistics Joonwoo Park
2015-10-25 10:08 ` Peter Zijlstra
2015-10-25 10:26 ` Peter Zijlstra
2015-10-27 1:44 ` Joonwoo Park
2015-10-27 12:57 ` Peter Zijlstra [this message]
2015-10-28 2:40 ` Joonwoo Park
2015-10-28 4:46 ` [PATCH v4] " Joonwoo Park
2015-11-06 13:57 ` Peter Zijlstra
2015-11-07 2:41 ` Joonwoo Park
2015-11-09 10:32 ` Peter Zijlstra
2015-11-13 3:38 ` [PATCH v5] " Joonwoo Park
2015-11-23 16:20 ` [tip:sched/core] sched/core: Fix " tip-bot for Joonwoo Park
2015-10-27 19:17 ` [PATCH v3] sched: fix " Joonwoo Park
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20151027125728.GA3201@worktop.amr.corp.intel.com \
--to=peterz@infradead.org \
--cc=joonwoop@codeaurora.org \
--cc=linux-arm-msm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=ohaugan@codeaurora.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.