From: Peter Zijlstra <peterz@infradead.org>
To: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Spencer Candland <spencer@bluehost.com>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
	Oleg Nesterov <oleg@redhat.com>
Subject: Re: utime/stime decreasing on thread exit
Date: Mon, 09 Nov 2009 15:49:14 +0100
Message-ID: <1257778154.4108.341.camel@laptop>
In-Reply-To: <4AF26176.4080307@jp.fujitsu.com>

On Thu, 2009-11-05 at 14:24 +0900, Hidetoshi Seto wrote:

> Problem [1]:
>   thread_group_cputime() vs exit
> 
> +void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
> +{
> +       struct sighand_struct *sighand;
> +       struct signal_struct *sig;
> +       struct task_struct *t;
> +
> +       *times = INIT_CPUTIME;
> +
> +       rcu_read_lock();
> +       sighand = rcu_dereference(tsk->sighand);
> +       if (!sighand)
> +               goto out;
> +
> +       sig = tsk->signal;
> +
> +       t = tsk;
> +       do {
> +               times->utime = cputime_add(times->utime, t->utime);
> +               times->stime = cputime_add(times->stime, t->stime);
> +               times->sum_exec_runtime += t->se.sum_exec_runtime;
> +
> +               t = next_thread(t);
> +       } while (t != tsk);
> +
> +       times->utime = cputime_add(times->utime, sig->utime);
> +       times->stime = cputime_add(times->stime, sig->stime);
> +       times->sum_exec_runtime += sig->sum_sched_runtime;
> +out:
> +       rcu_read_unlock();
> +}
> 
> If one of the (thousands of) threads exits while another thread is in
> the do-while loop above, the s/utime of the exited thread can be counted
> twice: once in the do-while loop (before exit) and once in the final
> cputime_add() (after exit).
> 
> I suppose this is hard to fix: taking a lock on the signal struct would
> solve this problem, but it could block all other threads for a long time
> and cause serious performance issues.

I just checked 2.6.22 and there we seem to hold p->sighand->siglock over
the full task iteration. So we might as well revert to that if
people really mind counting things twice :-)

FWIW getrusage() also takes siglock over the task iteration.

Alternatively, we could try reading the sig->[us]time before doing the
loop, but I guess that's still racy in that we can then miss someone
altogether.
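
To make the two orderings concrete, here is a rough user-space model of
the race. It is not kernel code: struct model_group, model_exit() and
the two sum_*() helpers are names made up for this sketch, dead_utime
stands in for sig->utime, and the "exit" is injected at a fixed point
rather than happening on another CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of a thread group; not kernel code. */
struct model_thread {
	unsigned long utime;
	int alive;
};

struct model_group {
	struct model_thread threads[4];
	size_t nr;
	unsigned long dead_utime;	/* stands in for sig->utime */
};

/* Models __exit_signal(): fold the thread's time into the group total. */
static void model_exit(struct model_group *g, size_t i)
{
	assert(i < g->nr && g->threads[i].alive);
	g->dead_utime += g->threads[i].utime;
	g->threads[i].alive = 0;
}

/*
 * Walk the live threads first and add the dead total last (the quoted order).
 * Thread 'exit_at' exits right after being visited; pass g->nr for no exit.
 */
static unsigned long sum_live_then_dead(struct model_group *g, size_t exit_at)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < g->nr; i++) {
		if (!g->threads[i].alive)
			continue;
		total += g->threads[i].utime;
		if (i == exit_at)
			model_exit(g, i);
	}
	return total + g->dead_utime;
}

/*
 * Read the dead total first, then walk the live threads.
 * Thread 'exit_at' exits between the two steps; pass g->nr for no exit.
 */
static unsigned long sum_dead_then_live(struct model_group *g, size_t exit_at)
{
	unsigned long total = g->dead_utime;
	size_t i;

	if (exit_at < g->nr)
		model_exit(g, exit_at);
	for (i = 0; i < g->nr; i++)
		if (g->threads[i].alive)
			total += g->threads[i].utime;
	return total;
}
```

With three threads at 10, 20 and 30 ticks and the middle one exiting
mid-read, the first order reports 80 (counted twice) and the second
reports 40 (missed), against a true total of 60. Holding a lock over the
whole read corresponds to passing g->nr, i.e. no exit in between.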

> Problem [2]:
>   use of task_s/utime()
> 
> I modified the test program further, to call times() 6 times and print
> the results if utime decreased between the 3rd and 4th calls.
> What I cannot explain is this: if problem [1] were the root cause, I
> would expect a single inflated value at one point (like
> (v)(v)(V)(v)(v)(v)), yet the results show values that keep decreasing.
> 
>  :
> times decreased : (104 984) (104 984) (104 984) (105 983) (105 983) (105 983)
> times decreased : (115 981) (116 980) (116 978) (117 977) (117 977) (119 979)
> times decreased : (116 980) (117 980) (117 980) (117 977) (118 979) (118 977)
>  :
> 
> And it seems that the more threads exit, the more utime decreases.
> 
> Soon I found:
> 
> [kernel/exit.c]
> +               sig->utime = cputime_add(sig->utime, task_utime(tsk));
> +               sig->stime = cputime_add(sig->stime, task_stime(tsk));
> 
> While the thread_group_cputime() accumulates raw s/utime in do-while loop,
> the signal struct accumulates adjusted s/utime of exited threads.
> 
> I'm not sure how this adjustment works, but applying the following patch
> makes the result a little better:
> 
>  :
> times decreased : (436 741) (436 741) (437 744) (436 742) (436 742) (436 742)
> times decreased : (454 792) (454 792) (455 794) (454 792) (454 792) (454 792)
> times decreased : (503 941) (503 941) (504 943) (503 941) (503 941) (503 941)
>  :
> 
> But the decreasing (or increasing) still continues, because problem [1]
> remains at least.
> 
> I don't think I can take this problem any further... Can anybody help?

Stick in a few trace_printk()s and see what happens?

> Subject: [PATCH] thread_group_cputime() should use task_s/utime()
> 
> The signal struct accumulates adjusted cputime of exited threads,
> so thread_group_cputime() should use task_s/utime() instead of raw
> task->s/utime, to accumulate adjusted cputime of live threads.
> 
> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
> ---
>  kernel/posix-cpu-timers.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
> index 5c9dc22..e065b8a 100644
> --- a/kernel/posix-cpu-timers.c
> +++ b/kernel/posix-cpu-timers.c
> @@ -248,8 +248,8 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>  
>  	t = tsk;
>  	do {
> -		times->utime = cputime_add(times->utime, t->utime);
> -		times->stime = cputime_add(times->stime, t->stime);
> +		times->utime = cputime_add(times->utime, task_utime(t));
> +		times->stime = cputime_add(times->stime, task_stime(t));
>  		times->sum_exec_runtime += t->se.sum_exec_runtime;
>  
>  		t = next_thread(t);

So what you're trying to say is that because __exit_signal() uses
task_[usg]time() to accumulate sig->[usg]time, we should use it too in
the loop over the live threads?

I'm thinking it's the task_[usg]time() usage in __exit_signal() that's
the issue.
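
A rough user-space model of why mixing the two can make the group total
go down. Everything here is an assumption made for illustration:
model_task_utime() only approximates the adjustment (precise runtime
split by the raw utime:stime ratio, clamped so it never decreases for a
single task), all values are in ticks, and none of the names are real
kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one task's accounting, all values in ticks. */
struct model_task {
	unsigned long utime;		/* raw, tick-sampled user time */
	unsigned long stime;		/* raw, tick-sampled system time */
	unsigned long rtime;		/* precise runtime (sum_exec_runtime) */
	unsigned long prev_utime;	/* last adjusted value handed out */
};

/* Assumed shape of the adjustment: split rtime by the raw utime:stime ratio. */
static unsigned long model_task_utime(struct model_task *t)
{
	unsigned long total;
	unsigned long long scaled;

	assert(t != NULL);
	total = t->utime + t->stime;
	scaled = t->rtime;
	if (total)
		scaled = scaled * t->utime / total;
	if (scaled > t->prev_utime)
		t->prev_utime = (unsigned long)scaled;
	return t->prev_utime;	/* never decreases for a single task */
}

/* Group utime as the quoted loop computes it: dead total plus raw live values. */
static unsigned long model_group_utime(unsigned long sig_utime,
				       const struct model_task *live, size_t nr)
{
	unsigned long total = sig_utime;
	size_t i;

	for (i = 0; i < nr; i++)
		total += live[i].utime;
	return total;
}
```

Take a thread with raw utime 3, stime 1 and a precise runtime of 2 ticks.
While it lives, the loop counts 3 for it. On exit its adjusted value,
2 * 3 / 4 = 1, is what gets added to the dead total, so the group utime
drops from 3 to 1 even though no single task's adjusted value ever went
backwards.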

I tried running the modified test.c on a current -tip kernel but could
not observe the problem (dual-core opteron).


