From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>,
"Spencer Candland" <spencer@bluehost.com>,
"Américo Wang" <xiyou.wangcong@gmail.com>,
linux-kernel@vger.kernel.org, "Ingo Molnar" <mingo@elte.hu>,
"Oleg Nesterov" <oleg@redhat.com>,
"Balbir Singh" <balbir@in.ibm.com>
Subject: Re: [PATCH] fix granularity of task_u/stime(), v2
Date: Fri, 20 Nov 2009 11:00:21 +0900
Message-ID: <4B05F835.10401@jp.fujitsu.com>
In-Reply-To: <20091119181744.GA3743@dhcp-lab-161.englab.brq.redhat.com>
Stanislaw Gruszka wrote:
> On Tue, Nov 17, 2009 at 02:24:48PM +0100, Peter Zijlstra wrote:
>>> Seems issue reported then was exactly the same as reported now by
>>> you. Looks like commit 49048622eae698e5c4ae61f7e71200f265ccc529 just
>>> make probability of bug smaller and you did not note it until now.
>>>
>>> Could you please test this patch, if it solve all utime decrease
>>> problems for you:
>>>
>>> http://patchwork.kernel.org/patch/59795/
>>>
>>> If you confirm it work, I think we should apply it. Otherwise
>>> we need to go to propagate task_{u,s}time everywhere, which is not
>>> (my) preferred solution.
>> That patch will create another issue, it will allow a process to hide
>> from top by arranging to never run when the tick hits.
>
Yes. With many threads running on fast hardware nowadays, such
processes can exist anywhere, and more easily than before.
E.g. assume that there are 2 tasks:
Task A: interrupted by the timer only a few times
(utime, stime, se.sum_exec_runtime) = (50, 50, 1000000000)
=> total runtime is 1 sec, but utime + stime is 100 ms
Task B: interrupted by the timer many times
(utime, stime, se.sum_exec_runtime) = (50, 50, 10000000)
=> total runtime is 10 ms, but utime + stime is 100 ms
You can see that task_[su]time() works well for these tasks.
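To make the arithmetic for these two tasks concrete, here is a minimal sketch of the scaling, not the kernel code itself. It assumes everything is expressed in milliseconds (the kernel works in cputime_t and nanoseconds), and the helper name scaled_utime_ms() is made up for illustration.

```c
#include <stdint.h>

/*
 * Simplified model of the scaling done in task_utime(): split the
 * precise runtime in proportion to the tick-sampled utime and stime.
 * All values are in milliseconds in this sketch.
 */
static uint64_t scaled_utime_ms(uint64_t utime, uint64_t stime,
				uint64_t runtime_ms)
{
	uint64_t total = utime + stime;

	/* With no tick samples at all, the runtime is left unscaled. */
	if (total == 0)
		return runtime_ms;

	return runtime_ms * utime / total;
}
```

With (50, 50) ticks, Task A's 1000 ms of runtime gives 500 ms of utime and Task B's 10 ms gives 5 ms, although both tasks have the same raw tick counts.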
> What about that?
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 1f8d028..9db1cbc 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -5194,7 +5194,7 @@ cputime_t task_utime(struct task_struct *p)
> }
> utime = (cputime_t)temp;
>
> - p->prev_utime = max(p->prev_utime, utime);
> + p->prev_utime = max(p->prev_utime, max(p->utime, utime));
> return p->prev_utime;
> }
I think this makes things worse.
without this patch:
Task A prev_utime: 500 ms (= accurate)
Task B prev_utime: 5 ms (= accurate)
with this patch:
Task A prev_utime: 500 ms (= accurate)
Task B prev_utime: 50 ms (= not accurate)
Note that task_stime() calculates prev_stime using this prev_utime:
without this patch:
Task A prev_stime: 500 ms (= accurate)
Task B prev_stime: 5 ms (= accurate)
with this patch:
Task A prev_stime: 500 ms (= accurate)
Task B prev_stime: 0 ms (= not accurate)
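The numbers above can be reproduced with a small model. This is only a sketch under assumptions: times are in milliseconds, struct task_model and the model_* helpers are hypothetical names, and task_stime() is assumed to take the remainder of the precise runtime after task_utime() and to update prev_stime only when that remainder is not negative.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant task_struct fields, in ms. */
struct task_model {
	uint64_t utime, stime;		/* tick-sampled values */
	uint64_t runtime;		/* precise runtime */
	uint64_t prev_utime, prev_stime;
};

static uint64_t max_u64(uint64_t a, uint64_t b)
{
	return a > b ? a : b;
}

/* with_patch != 0 models max(p->prev_utime, max(p->utime, utime)). */
static uint64_t model_task_utime(struct task_model *p, int with_patch)
{
	uint64_t total = p->utime + p->stime;
	uint64_t scaled = p->runtime;

	if (total)
		scaled = p->runtime * p->utime / total;
	if (with_patch)
		scaled = max_u64(p->utime, scaled);

	p->prev_utime = max_u64(p->prev_utime, scaled);
	return p->prev_utime;
}

/* Assumed behaviour: stime is what remains of runtime after utime. */
static uint64_t model_task_stime(struct task_model *p, int with_patch)
{
	int64_t rest = (int64_t)p->runtime -
		       (int64_t)model_task_utime(p, with_patch);

	if (rest >= 0)
		p->prev_stime = max_u64(p->prev_stime, (uint64_t)rest);
	return p->prev_stime;
}
```

For Task B with the patch, prev_utime becomes 50 ms, the remainder 10 - 50 is negative, and prev_stime stays at 0 ms.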
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index ce17760..8be5b75 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -914,8 +914,8 @@ void do_sys_times(struct tms *tms)
> struct task_cputime cputime;
> cputime_t cutime, cstime;
>
> - thread_group_cputime(current, &cputime);
> spin_lock_irq(&current->sighand->siglock);
> + thread_group_cputime(current, &cputime);
> cutime = current->signal->cutime;
> cstime = current->signal->cstime;
> spin_unlock_irq(&current->sighand->siglock);
>
> It's on top of Hidetoshi patch and fix utime decrease problem
> on my system.
What about the stime decrease problem, which can be caused by the same
logic?
By my labeling, there are 2 unresolved problems: [1]
"thread_group_cputime() vs exit" and [2] "use of task_s/utime()".
I still believe the real fix is a combination of the above fix for
do_sys_times() (for problem [1]) and (I know it is not preferred,
but for [2]) the following:
>> diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
>> index 5c9dc22..e065b8a 100644
>> --- a/kernel/posix-cpu-timers.c
>> +++ b/kernel/posix-cpu-timers.c
>> @@ -248,8 +248,8 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>>
>> t = tsk;
>> do {
>> - times->utime = cputime_add(times->utime, t->utime);
>> - times->stime = cputime_add(times->stime, t->stime);
>> + times->utime = cputime_add(times->utime, task_utime(t));
>> + times->stime = cputime_add(times->stime, task_stime(t));
>> times->sum_exec_runtime += t->se.sum_exec_runtime;
>>
>> t = next_thread(t);
Think about this diff, assuming task C is in the same thread group as tasks A and B:
sys_times() on C while A and B are alive returns:
(utime, stime)
= task_[su]time(C) + ([su]time(A)+[su]time(B)+...) + in_signal(exited)
= task_[su]time(C) + ( (50,50) + (50,50) +...) + in_signal(exited)
If A exited, it increases:
(utime, stime)
= task_[su]time(C) + ([su]time(B)+...) + in_signal(exited)+task_[su]time(A)
= task_[su]time(C) + ( (50,50) +...) + in_signal(exited)+(500,500)
Otherwise if B exited, it decreases:
(utime, stime)
= task_[su]time(C) + ([su]time(A)+...) + in_signal(exited)+task_[su]time(B)
= task_[su]time(C) + ( (50,50) +...) + in_signal(exited)+(5,5)
With this fix, sys_times() returns:
(utime, stime)
= task_[su]time(C) + (task_[su]time(A)+task_[su]time(B)+...) + in_signal(exited)
= task_[su]time(C) + ( (500,500) + (5,5) +...) + in_signal(exited)
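The before/after sums can be checked with a small model of the group total. This is a sketch, not kernel code: times are in milliseconds, struct thread_model and both helpers are hypothetical names, and task C is left out because it contributes the same amount in every case.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread model; all times in ms. */
struct thread_model {
	uint64_t utime;		/* tick-sampled user time */
	uint64_t stime;		/* tick-sampled system time */
	uint64_t runtime;	/* precise runtime */
	int exited;		/* nonzero once accumulated in signal */
};

/* Simplified task_utime(): scale the runtime by the tick ratio. */
static uint64_t scaled_utime(const struct thread_model *t)
{
	uint64_t total = t->utime + t->stime;

	return total ? t->runtime * t->utime / total : t->runtime;
}

/*
 * Group utime as reported in this model. Exited threads always
 * contribute their scaled value. Live threads contribute raw ticks
 * when scale_live is 0, or scaled values when it is nonzero.
 */
static uint64_t group_utime(const struct thread_model *t, size_t n,
			    int scale_live)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (t[i].exited || scale_live)
			sum += scaled_utime(&t[i]);
		else
			sum += t[i].utime;
	}
	return sum;
}
```

With raw ticks for live threads, A and B together report 100 ms; this becomes 550 ms if A exits, or drops to 55 ms if B exits. With scaled values for live threads, the total is 505 ms whether or not either thread has exited.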
> Are we not doing something nasty here?
>
> cputime_t utime = p->utime, total = utime + p->stime;
> u64 temp;
>
> /*
> * Use CFS's precise accounting:
> */
> temp = (u64)nsecs_to_cputime(p->se.sum_exec_runtime);
>
> if (total) {
> temp *= utime;
> do_div(temp, total);
> }
> utime = (cputime_t)temp;
Not here, but calling do_div() for each thread could be called nasty.
I mean
__task_[su]time(sum(A, B, ...))
would be better than:
sum(task_[su]time(A)+task_[su]time(B)+...)
However, it would bring another issue, because
__task_[su]time(sum(A, B, ...))
might not be equal to
__task_[su]time(sum(B, ...)) + task_[su]time(A)
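A small sketch shows why the two forms can differ. The thread values in the usage note are made up for illustration (the earlier A and B both have a 50/50 tick split, which happens to hide the effect), times are in milliseconds, and all names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread model; all times in ms. */
struct thread_model {
	uint64_t utime, stime, runtime;
};

/* Simplified scaling: split runtime by the tick-sampled ratio. */
static uint64_t scale(uint64_t utime, uint64_t stime, uint64_t runtime)
{
	uint64_t total = utime + stime;

	return total ? runtime * utime / total : runtime;
}

/* sum(task_utime(A) + task_utime(B) + ...): one division per thread. */
static uint64_t sum_of_scaled(const struct thread_model *t, size_t n)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += scale(t[i].utime, t[i].stime, t[i].runtime);
	return sum;
}

/* __task_utime(sum(A, B, ...)): add up first, then divide once. */
static uint64_t scaled_of_sum(const struct thread_model *t, size_t n)
{
	uint64_t utime = 0, stime = 0, runtime = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		utime += t[i].utime;
		stime += t[i].stime;
		runtime += t[i].runtime;
	}
	return scale(utime, stime, runtime);
}
```

Take A = (50, 50, 1000) and a different B = (90, 10, 10). Scaling the sum gives 1010 * 140 / 200 = 707 ms, while scaling B alone and adding A's own scaled value gives 9 + 500 = 509 ms, so the reported total would change when A exits.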
Thanks,
H.Seto