From: Yong Zhang <yong.zhang0@gmail.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: mingo@redhat.com, hpa@zytor.com, linux-kernel@vger.kernel.org,
tglx@linutronix.de, venki@google.com, mingo@elte.hu,
linux-tip-commits@vger.kernel.org
Subject: Re: [tip:sched/core] sched: Do not account irq time to current task
Date: Mon, 29 Nov 2010 22:22:13 +0800
Message-ID: <20101129142213.GA2573@zhy>
In-Reply-To: <1291031990.32004.24.camel@laptop>

On Mon, Nov 29, 2010 at 12:59:50PM +0100, Peter Zijlstra wrote:
> On Mon, 2010-11-29 at 16:45 +0800, Yong Zhang wrote:
> > On Tue, Oct 19, 2010 at 3:26 AM, tip-bot for Venkatesh Pallipadi
> > <venki@google.com> wrote:
> > > Commit-ID: 305e6835e05513406fa12820e40e4a8ecb63743c
> > > Gitweb: http://git.kernel.org/tip/305e6835e05513406fa12820e40e4a8ecb63743c
> > > Author: Venkatesh Pallipadi <venki@google.com>
> > > AuthorDate: Mon, 4 Oct 2010 17:03:21 -0700
> > > Committer: Ingo Molnar <mingo@elte.hu>
> > > CommitDate: Mon, 18 Oct 2010 20:52:26 +0200
> > >
> > > sched: Do not account irq time to current task
> > >
> > > Scheduler accounts both softirq and interrupt processing times to the
> > > currently running task. This means, if the interrupt processing was
> > > for some other task in the system, then the current task ends up being
> > > penalized as it gets shorter runtime than otherwise.
> > >
> > > Change sched task accounting to acoount only actual task time from
> > > currently running task. Now update_curr(), modifies the delta_exec to
> > > depend on rq->clock_task.
> > >
> > > Note that this change only handles CONFIG_IRQ_TIME_ACCOUNTING case. We can
> > > extend this to CONFIG_VIRT_CPU_ACCOUNTING with minimal effort. But, thats
> > > for later.
> > >
> > > This change will impact scheduling behavior in interrupt heavy conditions.
> > >
> > > Tested on a 4-way system with eth0 handled by CPU 2 and a network heavy
> > > task (nc) running on CPU 3 (and no RSS/RFS). With that I have CPU 2
> > > spending 75%+ of its time in irq processing. CPU 3 spending around 35%
> > > time running nc task.
> > >
> > > Now, if I run another CPU intensive task on CPU 2, without this change
> > > /proc/<pid>/schedstat shows 100% of time accounted to this task. With this
> > > change, it rightly shows less than 25% accounted to this task as remaining
> > > time is actually spent on irq processing.
> > >
> > > Signed-off-by: Venkatesh Pallipadi <venki@google.com>
> > > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > LKML-Reference: <1286237003-12406-7-git-send-email-venki@google.com>
> > > Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > > ---
> > > kernel/sched.c | 43 ++++++++++++++++++++++++++++++++++++++++---
> > > kernel/sched_fair.c | 6 +++---
> > > kernel/sched_rt.c | 8 ++++----
> > > 3 files changed, 47 insertions(+), 10 deletions(-)
> > >
> >
> > [snip]
> >
> > > diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
> > > index ab77aa0..bea7d79 100644
> > > --- a/kernel/sched_rt.c
> > > +++ b/kernel/sched_rt.c
> > > @@ -609,7 +609,7 @@ static void update_curr_rt(struct rq *rq)
> > > if (!task_has_rt_policy(curr))
> > > return;
> > >
> > > - delta_exec = rq->clock - curr->se.exec_start;
> > > + delta_exec = rq->clock_task - curr->se.exec_start;
> > > if (unlikely((s64)delta_exec < 0))
> > > delta_exec = 0;
> > >
> > > @@ -618,7 +618,7 @@ static void update_curr_rt(struct rq *rq)
> > > curr->se.sum_exec_runtime += delta_exec;
> > > account_group_exec_runtime(curr, delta_exec);
> > >
> > > - curr->se.exec_start = rq->clock;
> > > + curr->se.exec_start = rq->clock_task;
> > > cpuacct_charge(curr, delta_exec);
> > >
> > > sched_rt_avg_update(rq, delta_exec);
> >
> > Seems the above changes to update_curr_rt() have some false positive
> > to rt_bandwidth control.
> > For example:
> > rt_period=1000000;
> > rt_runtime=950000;
> > then if in that period the irq_time is not zero (such as 50000), according to
> > the throttle mechanism, rt is not throttled. In the end we leave no
> > time to others.
> > It seems that this breaks the semantics of throttle.
> >
> > Maybe we can revert the change to update_curr_rt()?
>
> No, that's totally correct.
>
> Its the correct and desired behaviour, IRQ time is not time spend
> running the RT tasks, hence they should not be accounted for it.
Right.
>
> If you still want to throttle RT tasks simply ensure their bandwidth
> constraint is lower than the available time.

But the available time is now harder to calculate than before.
IRQs arrive at random, and so irq_time is random too.
But the unthrottle (do_sched_rt_period_timer()) runs on a fixed
period that is based on the hard clock.
Is that what we want?

Thanks,
Yong