All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Spencer Candland <spencer@bluehost.com>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
	Oleg Nesterov <oleg@redhat.com>,
	sgruszka@redhat.com
Subject: Re: utime/stime decreasing on thread exit
Date: Tue, 10 Nov 2009 14:42:20 +0900	[thread overview]
Message-ID: <4AF8FD3C.2090008@jp.fujitsu.com> (raw)
In-Reply-To: <1257778154.4108.341.camel@laptop>

Peter Zijlstra wrote:
> On Thu, 2009-11-05 at 14:24 +0900, Hidetoshi Seto wrote:
> 
>> Problem [1]:
>>   thread_group_cputime() vs exit
(snip)
> 
> I just checked .22 and there we seem to hold p->sighand->siglock over
> the full task iteration. So we might as well revert back to that if
> people really mind counting things twice :-)
> 
> FWIW getrusage() also takes siglock over the task iteration.
> 
> Alternatively, we could try reading the sig->[us]time before doing the
> loop, but I guess that's still racy in that we can then miss someone
> altogether.

Right.  So "before or after" will not be any help.

As you pointed there are some other functions taking siglock over the
task iteration, so making do_sys_times() do same will be acceptable.
In other words using many threads have risks, which might be solved in
future.  

I agree that the following patch is right fix for this.

 [PATCH 1/2] posix-cpu-timers: avoid do_sys_times() races with __exit_signal()
 http://marc.info/?l=linux-kernel&m=124505545131145



>> Problem [2]:
>>   use of task_s/utime()
>>
(snip)
>> [kernel/exit.c]
>> +               sig->utime = cputime_add(sig->utime, task_utime(tsk));
>> +               sig->stime = cputime_add(sig->stime, task_stime(tsk));
>>
>> While the thread_group_cputime() accumulates raw s/utime in do-while loop,
>> the signal struct accumulates adjusted s/utime of exited threads.
>>
>> I'm not sure how this adjustment works but applying the following patch
>> makes the result little bit better:
>>
>>  :
>> times decreased : (436 741) (436 741) (437 744) (436 742) (436 742) (436 742)
>> times decreased : (454 792) (454 792) (455 794) (454 792) (454 792) (454 792)
>> times decreased : (503 941) (503 941) (504 943) (503 941) (503 941) (503 941)
>>  :
>>
>> But still decreasing(or increasing) continues, because there is a problem [1]
>> at least.
>>
>> I think I couldn't handle this problem any more... Anybody can help?
> 
> Stick in a few trace_printk()s and see what happens?

Nice idea.
I tried it and show the result in later of this mail.

>> Subject: [PATCH] thread_group_cputime() should use task_s/utime()
>>
>> The signal struct accumulates adjusted cputime of exited threads,
>> so thread_group_cputime() should use task_s/utime() instead of raw
>> task->s/utime, to accumulate adjusted cputime of live threads.
>>
>> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
>> ---
>>  kernel/posix-cpu-timers.c |    4 ++--
>>  1 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
>> index 5c9dc22..e065b8a 100644
>> --- a/kernel/posix-cpu-timers.c
>> +++ b/kernel/posix-cpu-timers.c
>> @@ -248,8 +248,8 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>>  
>>  	t = tsk;
>>  	do {
>> -		times->utime = cputime_add(times->utime, t->utime);
>> -		times->stime = cputime_add(times->stime, t->stime);
>> +		times->utime = cputime_add(times->utime, task_utime(t));
>> +		times->stime = cputime_add(times->stime, task_stime(t));
>>  		times->sum_exec_runtime += t->se.sum_exec_runtime;
>>  
>>  		t = next_thread(t);
> 
> So what you're trying to say is that because __exit_signal() uses
> task_[usg]time() to accumulate sig->[usg]time, we should use it too in
> the loop over the live threads?

Right.  Thank you for trying to understand.

> 
> I'm thinking its the task_[usg]time() usage in __exit_signal() that's
> the issue.

It likely means reverting:

  commit 49048622eae698e5c4ae61f7e71200f265ccc529
  Author: Balbir Singh <balbir@linux.vnet.ibm.com>
  Date:   Fri Sep 5 18:12:23 2008 +0200
    sched: fix process time monotonicity

I'm not sure the reason why the task_[usg]time() have introduced, but
removing them would be one of solutions.

> I tried running the modified test.c on a current -tip kernel but could
> not observe the problem (dual-core opteron).

It might not happen if your box is quite fast (otherwise slow).
Changing loop in test.c (i.e. cycles in user-land) might help the
reproductivity...



Here is the result of additional test: 

I put a trace_printk() in the __exit_signal(), to print tsk->s/utime,
task_s/utime() and tsk->se.sum_exec_rumtime.
(And tsk->prev_s/utime before calling task_s/utime())

           <...>-2857  [006]   112.731732: release_task: (37 22)to(40 20), sxr 57480477, (0 0)
           <...>-5077  [009]   526.272338: release_task: (0 27)to(10 20), sxr 27997019, (0 0)
           <...>-4999  [009]   526.272396: release_task: (1 27)to(10 20), sxr 27967513, (0 0)
           <...>-4992  [006]   526.328591: release_task: (2 34)to(10 30), sxr 35823013, (0 0)
           <...>-5022  [012]   526.329183: release_task: (0 27)to(10 20), sxr 27761854, (0 0)
           <...>-4996  [010]   526.329203: release_task: (3 38)to(10 30), sxr 39200207, (0 0)

... It clearly probes that there is the 3rd problem.

Problem [3]:
   granularity of task_s/utime()

According to the git log, originally task_s/utime() were designed to
return clock_t but later changed to return cputime_t by following
commit:

  commit efe567fc8281661524ffa75477a7c4ca9b466c63
  Author: Christian Borntraeger <borntraeger@de.ibm.com>
  Date:   Thu Aug 23 15:18:02 2007 +0200

It only changes the type of return value, but not the implementation.
As the result the granularity of task_s/utime() is still that of clock_t,
not that of cputime_t. 

So using task_s/utime() in __exit_signal() makes values accumulated to the
signal struct to be rounded and coarse grained. 

After applying a fix (will follow to this mail), return values are changed
to be fine grained:

           <...>-5438  [006]   135.212289: release_task: (0 28)to(0 26), sxr 26648558, (0 0)
           <...>-5402  [015]   135.213193: release_task: (0 27)to(0 26), sxr 26725886, (0 0)
           <...>-5408  [011]   135.214172: release_task: (0 28)to(0 26), sxr 26607882, (0 0)
           <...>-5419  [005]   135.214410: release_task: (1 27)to(1 25), sxr 26612615, (0 0)
           <...>-5350  [009]   135.214937: release_task: (0 28)to(0 27), sxr 27028388, (0 0)
           <...>-5443  [008]   135.216748: release_task: (0 28)to(0 26), sxr 26372691, (0 0)

But it cannot stop adjusted values become smaller than tsk->s/utime.
So some approach to solve problem [2] is required.


Thanks,
H.Seto



  parent reply	other threads:[~2009-11-10  5:42 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-11-04  0:23 utime/stime decreasing on thread exit Spencer Candland
2009-11-04  6:49 ` Hidetoshi Seto
2009-11-05  5:24   ` Hidetoshi Seto
2009-11-09 14:49     ` Peter Zijlstra
2009-11-09 17:20       ` Oleg Nesterov
2009-11-09 17:27         ` Oleg Nesterov
2009-11-09 17:31         ` Peter Zijlstra
2009-11-09 19:23           ` Oleg Nesterov
2009-11-09 19:32             ` Peter Zijlstra
2009-11-10 10:44             ` Stanislaw Gruszka
2009-11-10 17:40               ` Oleg Nesterov
2009-11-10 18:24                 ` Stanislaw Gruszka
2009-11-10 19:23                   ` Oleg Nesterov
2009-11-17 12:48                     ` Stanislaw Gruszka
2009-11-17 12:57                       ` [PATCH] posix-cpu-timers: reset expire cache when no timer is running Stanislaw Gruszka
2009-11-10  5:42       ` Hidetoshi Seto [this message]
2009-11-10  5:47         ` [PATCH] fix granularity of task_u/stime() Hidetoshi Seto
2009-11-11 12:11           ` Stanislaw Gruszka
2009-11-12  0:00             ` Hidetoshi Seto
2009-11-12  2:49               ` Hidetoshi Seto
2009-11-12  2:55                 ` Américo Wang
2009-11-12  4:16                   ` Hidetoshi Seto
2009-11-12  4:33                     ` [PATCH] fix granularity of task_u/stime(), v2 Hidetoshi Seto
2009-11-12 14:15                       ` Peter Zijlstra
2009-11-12 14:49                       ` Stanislaw Gruszka
2009-11-12 15:00                         ` Peter Zijlstra
2009-11-12 15:40                           ` Stanislaw Gruszka
2009-11-13 12:42                             ` [PATCH] sys_times: fix utime/stime decreasing on thread exit Stanislaw Gruszka
2009-11-13 13:16                               ` Peter Zijlstra
2009-11-13 14:12                                 ` Balbir Singh
2009-11-13 15:36                                 ` Stanislaw Gruszka
2009-11-13 17:05                                   ` Peter Zijlstra
2009-11-16 19:32                             ` [PATCH] fix granularity of task_u/stime(), v2 Spencer Candland
2009-11-17 13:08                               ` Stanislaw Gruszka
2009-11-17 13:24                                 ` Peter Zijlstra
2009-11-19 18:17                                   ` Stanislaw Gruszka
2009-11-20  2:00                                     ` Hidetoshi Seto
2009-11-23 10:09                                       ` Stanislaw Gruszka
2009-11-23 10:16                                         ` [PATCH] cputime: avoid do_sys_times() races with __exit_signal() Stanislaw Gruszka
2009-11-30  9:20                                           ` [PATCH 1/2] cputime: remove prev_{u,s}time if VIRT_CPU_ACCOUNTING Hidetoshi Seto
2009-11-30  9:21                                           ` [PATCH 2/2] cputime: introduce thread_group_times() Hidetoshi Seto
2009-11-30 14:54                                             ` Stanislaw Gruszka
2009-12-01  1:02                                               ` Hidetoshi Seto
2009-12-02  8:26                                           ` [PATCH -v2 1/2] sched, cputime: cleanups related to task_times() Hidetoshi Seto
2009-12-02 15:17                                             ` Peter Zijlstra
2009-12-02 15:29                                               ` Balbir Singh
2009-12-03  0:21                                                 ` Hidetoshi Seto
2009-12-02 15:57                                             ` Peter Zijlstra
2009-12-02 17:33                                             ` [tip:sched/core] sched, cputime: Cleanups " tip-bot for Hidetoshi Seto
2009-12-02  8:28                                           ` [PATCH -v2 2/2] sched, cputime: introduce thread_group_times() Hidetoshi Seto
2009-12-02 15:58                                             ` Peter Zijlstra
2009-12-02 17:33                                             ` [tip:sched/core] sched, cputime: Introduce thread_group_times() tip-bot for Hidetoshi Seto
2009-12-02  8:29                                           ` reproducer: utime decreasing Hidetoshi Seto
2009-12-02  8:32                                           ` reproducer: invisible utime Hidetoshi Seto
2009-11-23 10:25                                         ` [PATCH] fix granularity of task_u/stime(), v2 Balbir Singh
2009-11-23 10:46                                           ` Stanislaw Gruszka
2009-11-24  5:33                                         ` Hidetoshi Seto
2009-11-18 22:38                                 ` Spencer Candland
2009-11-23  9:52                         ` Stanislaw Gruszka
2009-11-12 18:12                       ` [tip:sched/core] sched: Fix granularity of task_u/stime() tip-bot for Hidetoshi Seto
2009-11-13  9:40                         ` Stanislaw Gruszka
2009-11-13 23:09                         ` Ingo Molnar
2009-11-16  2:44                           ` Hidetoshi Seto

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4AF8FD3C.2090008@jp.fujitsu.com \
    --to=seto.hidetoshi@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=sgruszka@redhat.com \
    --cc=spencer@bluehost.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.