From: Peter Zijlstra <peterz@infradead.org>
To: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Spencer Candland <spencer@bluehost.com>,
linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
Oleg Nesterov <oleg@redhat.com>
Subject: Re: utime/stime decreasing on thread exit
Date: Mon, 09 Nov 2009 15:49:14 +0100
Message-ID: <1257778154.4108.341.camel@laptop>
In-Reply-To: <4AF26176.4080307@jp.fujitsu.com>
On Thu, 2009-11-05 at 14:24 +0900, Hidetoshi Seto wrote:
> Problem [1]:
> thread_group_cputime() vs exit
>
> +void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
> +{
> +	struct sighand_struct *sighand;
> +	struct signal_struct *sig;
> +	struct task_struct *t;
> +
> +	*times = INIT_CPUTIME;
> +
> +	rcu_read_lock();
> +	sighand = rcu_dereference(tsk->sighand);
> +	if (!sighand)
> +		goto out;
> +
> +	sig = tsk->signal;
> +
> +	t = tsk;
> +	do {
> +		times->utime = cputime_add(times->utime, t->utime);
> +		times->stime = cputime_add(times->stime, t->stime);
> +		times->sum_exec_runtime += t->se.sum_exec_runtime;
> +
> +		t = next_thread(t);
> +	} while (t != tsk);
> +
> +	times->utime = cputime_add(times->utime, sig->utime);
> +	times->stime = cputime_add(times->stime, sig->stime);
> +	times->sum_exec_runtime += sig->sum_sched_runtime;
> +out:
> +	rcu_read_unlock();
> +}
>
> If one of the (thousands of) threads exits while another thread is in
> the do-while loop above, the s/utime of the exited thread can be
> accounted twice: once in the do-while loop (before exit) and once in the
> final cputime_add() (after exit).
>
> I suppose this is hard to fix: taking a lock on the signal struct would
> solve this problem, but it could block all other threads for a long time
> and cause serious performance issues and so on...
I just checked 2.6.22, and there we seem to hold p->sighand->siglock over
the full task iteration. So we might as well go back to that if people
really mind counting things twice :-)

FWIW, getrusage() also takes siglock over the task iteration.

Alternatively, we could try reading sig->[us]time before doing the
loop, but I guess that's still racy in that we can then miss someone
altogether.
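
To make the three orderings concrete, here is a small userspace toy model,
not kernel code. All names (toy_group, toy_exit() and so on) are made up
for illustration, and the "exit" is placed by hand at the worst possible
moment instead of being a real race. It only shows the arithmetic: loop
first counts the exiting thread twice, sig first misses it, and a read that
cannot interleave with exit (which is what holding siglock over the
iteration would give) counts it once.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a thread group; hypothetical names, not kernel structures. */
struct toy_thread {
	unsigned long utime;	/* ticks charged to this thread */
	int alive;		/* 0 once the thread has exited */
};

struct toy_group {
	struct toy_thread threads[4];
	size_t nr_threads;
	unsigned long sig_utime;	/* ticks of already-exited threads */
};

/* Sum utime over the live threads (stands in for the do-while loop). */
static unsigned long sum_live(const struct toy_group *g)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < g->nr_threads; i++)
		if (g->threads[i].alive)
			sum += g->threads[i].utime;
	return sum;
}

/* Thread exit: fold its time into the group total and unlink it. */
static void toy_exit(struct toy_group *g, size_t i)
{
	assert(i < g->nr_threads && g->threads[i].alive);
	g->sig_utime += g->threads[i].utime;
	g->threads[i].alive = 0;
}

/* Three live threads with 10, 20 and 30 ticks; true total is 60. */
static struct toy_group toy_group_init(void)
{
	struct toy_group g = {
		.threads = { {10, 1}, {20, 1}, {30, 1} },
		.nr_threads = 3,
		.sig_utime = 0,
	};
	return g;
}

/* Loop first, then sig: an exit in between is counted twice. */
static unsigned long read_loop_then_sig(void)
{
	struct toy_group g = toy_group_init();
	unsigned long total = sum_live(&g);

	toy_exit(&g, 1);		/* exit lands mid-read */
	return total + g.sig_utime;
}

/* Sig first, then loop: an exit in between is missed altogether. */
static unsigned long read_sig_then_loop(void)
{
	struct toy_group g = toy_group_init();
	unsigned long total = g.sig_utime;

	toy_exit(&g, 1);		/* exit lands mid-read */
	return total + sum_live(&g);
}

/* Exit cannot interleave with the read (models holding siglock). */
static unsigned long read_atomic(void)
{
	struct toy_group g = toy_group_init();
	unsigned long total = sum_live(&g) + g.sig_utime;

	toy_exit(&g, 1);		/* exit only after the read is done */
	return total;
}
```

With the values above, the true total is 60 ticks; the first reader
returns 80, the second 40, and only the third returns 60.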
> Problem [2]:
> use of task_s/utime()
>
> I modified the test program further, to call times() 6 times and print
> the results if utime decreased between the 3rd and 4th calls.
> If problem [1] were the root cause, I would expect a single inflated
> value at one point (like (v)(v)(V)(v)(v)(v)), but I cannot explain why
> the results instead show a value that stays decreased.
>
> :
> times decreased : (104 984) (104 984) (104 984) (105 983) (105 983) (105 983)
> times decreased : (115 981) (116 980) (116 978) (117 977) (117 977) (119 979)
> times decreased : (116 980) (117 980) (117 980) (117 977) (118 979) (118 977)
> :
>
> And it seems that the more threads exit, the more utime decreases.
>
> Soon I found:
>
> [kernel/exit.c]
> + sig->utime = cputime_add(sig->utime, task_utime(tsk));
> + sig->stime = cputime_add(sig->stime, task_stime(tsk));
>
> While thread_group_cputime() accumulates raw s/utime in the do-while
> loop, the signal struct accumulates the adjusted s/utime of exited threads.
>
> I'm not sure how this adjustment works, but applying the following patch
> makes the result a little bit better:
>
> :
> times decreased : (436 741) (436 741) (437 744) (436 742) (436 742) (436 742)
> times decreased : (454 792) (454 792) (455 794) (454 792) (454 792) (454 792)
> times decreased : (503 941) (503 941) (504 943) (503 941) (503 941) (503 941)
> :
>
> But the decreasing (or increasing) still continues, because problem [1]
> at least remains.
>
> I don't think I can take this problem any further... Can anybody help?
Stick in a few trace_printk()s and see what happens?
> Subject: [PATCH] thread_group_cputime() should use task_s/utime()
>
> The signal struct accumulates adjusted cputime of exited threads,
> so thread_group_cputime() should use task_s/utime() instead of raw
> task->s/utime, to accumulate adjusted cputime of live threads.
>
> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
> ---
> kernel/posix-cpu-timers.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
> index 5c9dc22..e065b8a 100644
> --- a/kernel/posix-cpu-timers.c
> +++ b/kernel/posix-cpu-timers.c
> @@ -248,8 +248,8 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>
>  	t = tsk;
>  	do {
> -		times->utime = cputime_add(times->utime, t->utime);
> -		times->stime = cputime_add(times->stime, t->stime);
> +		times->utime = cputime_add(times->utime, task_utime(t));
> +		times->stime = cputime_add(times->stime, task_stime(t));
>  		times->sum_exec_runtime += t->se.sum_exec_runtime;
>
>  		t = next_thread(t);
So what you're trying to say is that because __exit_signal() uses
task_[usg]time() to accumulate sig->[usg]time, we should use it too in
the loop over the live threads?
I'm thinking it's the task_[usg]time() usage in __exit_signal() that's
the issue.
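
A rough userspace sketch of why mixing the two kinds of value can make the
group total go backwards. How the adjustment works is not spelled out in
this thread, so toy_adjusted_utime() below is a hypothetical stand-in that
assumes the tick-sampled utime gets rescaled against a more precise
runtime. The struct, the function names and the numbers are all made up
for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-task counters, all in the same unit (ticks). */
struct toy_task {
	unsigned long utime;	/* raw tick-sampled user time */
	unsigned long stime;	/* raw tick-sampled system time */
	unsigned long rtime;	/* precise runtime from the scheduler */
};

/*
 * Assumed adjustment (not taken from this thread): split the precise
 * runtime in the raw utime:stime ratio.
 */
static unsigned long toy_adjusted_utime(const struct toy_task *t)
{
	unsigned long total;

	assert(t != NULL);
	total = t->utime + t->stime;
	if (total == 0)
		return t->rtime;
	return t->rtime * t->utime / total;
}

/* Group utime while the task is alive: the loop adds the raw value. */
static unsigned long group_utime_before_exit(const struct toy_task *t,
					     unsigned long sig_utime)
{
	return sig_utime + t->utime;
}

/* Group utime after exit: the adjusted value was folded into sig. */
static unsigned long group_utime_after_exit(const struct toy_task *t,
					    unsigned long sig_utime)
{
	return sig_utime + toy_adjusted_utime(t);
}
```

For a task with utime = 10, stime = 10 and rtime = 18, the adjusted utime
is 9. With 100 ticks already in sig, a reader sees 110 before the exit and
109 after it, even though no time was lost. Using the same kind of value
on both sides (raw in both places, or adjusted in both places) removes
this particular mismatch.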
I tried running the modified test.c on a current -tip kernel but could
not observe the problem (dual-core opteron).
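
For anyone else who wants to try: the modified test.c is not included in
this message, so the following is only a guess at the shape of its check,
with hypothetical function names. It takes several back-to-back times()
samples of utime and reports the first sample that is lower than the one
before it. The thread create/exit workload that triggers the problem is
left out.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/times.h>

/*
 * Return the index of the first sample that is lower than its
 * predecessor, or -1 if the sequence never decreases.
 */
static int first_decrease(const clock_t *samples, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (samples[i] < samples[i - 1])
			return (int)i;
	return -1;
}

/* Take n back-to-back times() samples of this process's utime. */
static void sample_utime(clock_t *out, size_t n)
{
	struct tms t;

	assert(out != NULL);
	for (size_t i = 0; i < n; i++) {
		times(&t);
		out[i] = t.tms_utime;
	}
}
```

Fed the utime values from the quoted output above (436 436 437 436 436
436), first_decrease() returns 3, i.e. the drop between the 3rd and 4th
calls.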