* [RFC,PATCH 2/2] cputimers/proc: do_task_stat()->thread_group_times() is racy and O(n) under ->siglock
@ 2010-03-24 20:45 Oleg Nesterov
2010-03-24 20:54 ` Peter Zijlstra
0 siblings, 1 reply; 5+ messages in thread
From: Oleg Nesterov @ 2010-03-24 20:45 UTC (permalink / raw)
To: Andrew Morton, Americo Wang, Balbir Singh, Eric W. Biederman,
Hidetoshi Seto, Ingo Molnar, Peter Zijlstra, Roland McGrath,
Spencer Candland, Stanislaw Gruszka
Cc: linux-kernel
Nowadays ->siglock is overloaded, it would be really nice to change
do_task_stat() to walk through the list of threads lockless. And note
that we are doing while_each_thread() twice!
while_each_thread() is rcu-safe, but thread_group_times() also needs
->siglock to serialize the modifications of signal_struct->prev_Xtime
members.
(however, please note that currently do_task_stat() can race with
wait_task_zombie() which calls thread_group_times() without siglock).
This patch changes the code back to use thread_group_cputime(), as we
did before 0cf55e1e "sched, cputime: Introduce thread_group_times()".
Of course, this makes the output from /proc/pid/stat less accurate, but
otoh this allows us to make do_task_stat() (apart from ->tty bits).
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
fs/proc/array.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- 34-rc1/fs/proc/array.c~PROC_5_DTS_TGTS_IS_PITA 2010-03-24 20:47:19.000000000 +0100
+++ 34-rc1/fs/proc/array.c 2010-03-24 20:47:51.000000000 +0100
@@ -403,6 +403,7 @@ static int do_task_stat(struct seq_file
if (lock_task_sighand(task, &flags)) {
struct signal_struct *sig = task->signal;
+ struct task_cputime cputime;
if (sig->tty) {
struct pid *pgrp = tty_get_pgrp(sig->tty);
@@ -433,8 +434,11 @@ static int do_task_stat(struct seq_file
min_flt += sig->min_flt;
maj_flt += sig->maj_flt;
- thread_group_times(task, &utime, &stime);
+
gtime = cputime_add(gtime, sig->gtime);
+ thread_group_cputime(task, &cputime);
+ utime = cputime.utime;
+ stime = cputime.stime;
}
sid = task_session_nr_ns(task, ns);
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC,PATCH 2/2] cputimers/proc: do_task_stat()->thread_group_times() is racy and O(n) under ->siglock
2010-03-24 20:45 [RFC,PATCH 2/2] cputimers/proc: do_task_stat()->thread_group_times() is racy and O(n) under ->siglock Oleg Nesterov
@ 2010-03-24 20:54 ` Peter Zijlstra
2010-03-25 12:12 ` Oleg Nesterov
0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2010-03-24 20:54 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Americo Wang, Balbir Singh, Eric W. Biederman,
Hidetoshi Seto, Ingo Molnar, Roland McGrath, Spencer Candland,
Stanislaw Gruszka, linux-kernel
On Wed, 2010-03-24 at 21:45 +0100, Oleg Nesterov wrote:
> Nowadays ->siglock is overloaded, it would be really nice to change
> do_task_stat() to walk through the list of threads lockless. And note
> that we are doing while_each_thread() twice!
>
> while_each_thread() is rcu-safe, but thread_group_times() also needs
> ->siglock to serialize the modifications of signal_struct->prev_Xtime
> members.
>
>
Right, so from what I remember the issue is that, yes top et al rely on
that monotonicity, but more importantly I think
clock_gettime(CLOCK_PROCESS_CPUTIME_ID) should indeed use ->siglock to
ensure it serializes against do_exit() so that either we iterate the
thread or get the accumulated runtime from signal_struct but not both
(or neither).
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC,PATCH 2/2] cputimers/proc: do_task_stat()->thread_group_times() is racy and O(n) under ->siglock
2010-03-24 20:54 ` Peter Zijlstra
@ 2010-03-25 12:12 ` Oleg Nesterov
2010-03-25 12:19 ` Peter Zijlstra
2010-03-26 7:59 ` Stanislaw Gruszka
0 siblings, 2 replies; 5+ messages in thread
From: Oleg Nesterov @ 2010-03-25 12:12 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrew Morton, Americo Wang, Balbir Singh, Eric W. Biederman,
Hidetoshi Seto, Ingo Molnar, Roland McGrath, Spencer Candland,
Stanislaw Gruszka, linux-kernel
On 03/24, Peter Zijlstra wrote:
>
> On Wed, 2010-03-24 at 21:45 +0100, Oleg Nesterov wrote:
> > Nowadays ->siglock is overloaded, it would be really nice to change
> > do_task_stat() to walk through the list of threads lockless. And note
> > that we are doing while_each_thread() twice!
> >
> > while_each_thread() is rcu-safe, but thread_group_times() also needs
> > ->siglock to serialize the modifications of signal_struct->prev_Xtime
> > members.
First of all, let me reply to myself. I see that I wasn't clear at all.
This patch does the first step to remove one reason for ->siglock
(modification of ->prev_Xtime). But this is very minor, I guess we
could change thread_group_times() to take signal->cputimer->lock.
The goal was to call thread_group_cputime() lockless under rcu lock
(either directly, or via thread_group_times(), this doesn't matter)
to avoid while_each_thread() under ->siglock.
And in this case /proc/pid/stat can't report utime/stime atomically.
Whatever we do we can race with exit, so it doesn't make sense to
play with ->prev_Xtime.
> Right, so from what I remember the issue is that, yes top et al rely on
> that monotonicity,
Really? So, do you think the change above will break user-space?
How sad :/
> but more importantly I think
> clock_gettime(CLOCK_PROCESS_CPUTIME_ID) should indeed use ->siglock to
> ensure it serializes against do_exit() so that either we iterate the
> thread or get the accumulated runtime from signal_struct but not both
> (or neither).
Oh. I forgot everything I knew about posix-cpu-timers... But, it seems,
posix_cpu_clock_get() calls thread_group_cputime() under tasklist and
thus can't race with exit.
Oleg.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC,PATCH 2/2] cputimers/proc: do_task_stat()->thread_group_times() is racy and O(n) under ->siglock
2010-03-25 12:12 ` Oleg Nesterov
@ 2010-03-25 12:19 ` Peter Zijlstra
2010-03-26 7:59 ` Stanislaw Gruszka
1 sibling, 0 replies; 5+ messages in thread
From: Peter Zijlstra @ 2010-03-25 12:19 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Americo Wang, Balbir Singh, Eric W. Biederman,
Hidetoshi Seto, Ingo Molnar, Roland McGrath, Spencer Candland,
Stanislaw Gruszka, linux-kernel
On Thu, 2010-03-25 at 13:12 +0100, Oleg Nesterov wrote:
> > Right, so from what I remember the issue is that, yes top et al rely on
> > that monotonicity,
>
> Really? So, do you think the change above will break user-space?
>
> How sad :/
IIRC top can give very funny results if you break it hard enough, it
likes to give 9999% cputime if the thing goes backwards over the sample
interval.
But I'm not sure your race is large enough to ever show up like that, so
it might all just work out, but in general it does tend to require
monotonic times.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC,PATCH 2/2] cputimers/proc: do_task_stat()->thread_group_times() is racy and O(n) under ->siglock
2010-03-25 12:12 ` Oleg Nesterov
2010-03-25 12:19 ` Peter Zijlstra
@ 2010-03-26 7:59 ` Stanislaw Gruszka
1 sibling, 0 replies; 5+ messages in thread
From: Stanislaw Gruszka @ 2010-03-26 7:59 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Peter Zijlstra, Andrew Morton, Americo Wang, Balbir Singh,
Eric W. Biederman, Hidetoshi Seto, Ingo Molnar, Roland McGrath,
Spencer Candland, linux-kernel
On Thu, Mar 25, 2010 at 01:12:50PM +0100, Oleg Nesterov wrote:
> > but more importantly I think
> > clock_gettime(CLOCK_PROCESS_CPUTIME_ID) should indeed use ->siglock to
> > ensure it serializes against do_exit() so that either we iterate the
> > thread or get the accumulated runtime from signal_struct but not both
> > (or neither).
>
> Oh. I forgot everything I knew about posix-cpu-timers... But, it seems,
> posix_cpu_clock_get() calls thread_group_cputime() under tasklist and
> thus can't race with exit.
We assure thread_group_cputime() is called with one of: tasklist_lock
or ->siglock to avoid races with __exit_signal. Except oom killer and
elf core-dump code where is no lock, where we assume exit is not called
or we don't care of inaccurate results.
Stanislaw
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2010-03-26 8:02 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-03-24 20:45 [RFC,PATCH 2/2] cputimers/proc: do_task_stat()->thread_group_times() is racy and O(n) under ->siglock Oleg Nesterov
2010-03-24 20:54 ` Peter Zijlstra
2010-03-25 12:12 ` Oleg Nesterov
2010-03-25 12:19 ` Peter Zijlstra
2010-03-26 7:59 ` Stanislaw Gruszka
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.