* [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check
@ 2010-06-10 23:09 Oleg Nesterov
2010-06-11 14:35 ` Stanislaw Gruszka
2010-06-18 10:20 ` [tip:sched/core] sched: thread_group_cputime: Simplify, " tip-bot for Oleg Nesterov
0 siblings, 2 replies; 6+ messages in thread
From: Oleg Nesterov @ 2010-06-10 23:09 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Stanislaw Gruszka, Thomas Gleixner, linux-kernel
thread_group_cputime() looks as if it is rcu-safe, but in fact this
was wrong until ea6d290c which pins task->signal to task_struct.
It checks ->sighand != NULL under rcu, but this can't help if ->signal
can go away. Fortunately the caller either holds ->siglock, or it is
fastpath_timer_check() which uses current and checks exit_state == 0.
- Since ea6d290c commit tsk->signal is stable, we can read it first
and avoid the initialization from INIT_CPUTIME.
- Even if tsk->signal is always valid, we still have to check it
is safe to use next_thread() under rcu_read_lock(). Currently
the code checks ->sighand != NULL, change it to use pid_alive()
which is commonly used to ensure the task wasn't unhashed before
we take rcu_read_lock().
Add the comment to explain this check.
- Change the main loop to use the while_each_thread() helper.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/posix-cpu-timers.c | 21 +++++++--------------
1 file changed, 7 insertions(+), 14 deletions(-)
--- 35-rc2/kernel/posix-cpu-timers.c~4_TG_CPUTIME 2010-06-11 00:47:33.000000000 +0200
+++ 35-rc2/kernel/posix-cpu-timers.c 2010-06-11 01:07:48.000000000 +0200
@@ -232,31 +232,24 @@ static int cpu_clock_sample(const clocki
void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
{
- struct sighand_struct *sighand;
- struct signal_struct *sig;
+ struct signal_struct *sig = tsk->signal;
struct task_struct *t;
- *times = INIT_CPUTIME;
+ times->utime = sig->utime;
+ times->stime = sig->stime;
+ times->sum_exec_runtime = sig->sum_sched_runtime;
rcu_read_lock();
- sighand = rcu_dereference(tsk->sighand);
- if (!sighand)
+ /* make sure we can trust tsk->thread_group list */
+ if (!likely(pid_alive(tsk)))
goto out;
- sig = tsk->signal;
-
t = tsk;
do {
times->utime = cputime_add(times->utime, t->utime);
times->stime = cputime_add(times->stime, t->stime);
times->sum_exec_runtime += t->se.sum_exec_runtime;
-
- t = next_thread(t);
- } while (t != tsk);
-
- times->utime = cputime_add(times->utime, sig->utime);
- times->stime = cputime_add(times->stime, sig->stime);
- times->sum_exec_runtime += sig->sum_sched_runtime;
+ } while_each_thread(tsk, t);
out:
rcu_read_unlock();
}
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check 2010-06-10 23:09 [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check Oleg Nesterov @ 2010-06-11 14:35 ` Stanislaw Gruszka 2010-06-11 15:15 ` Oleg Nesterov 2010-06-18 10:20 ` [tip:sched/core] sched: thread_group_cputime: Simplify, " tip-bot for Oleg Nesterov 1 sibling, 1 reply; 6+ messages in thread From: Stanislaw Gruszka @ 2010-06-11 14:35 UTC (permalink / raw) To: Oleg Nesterov; +Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel On Fri, Jun 11, 2010 at 01:09:56AM +0200, Oleg Nesterov wrote: > thread_group_cputime() looks as if it is rcu-safe, but in fact this > was wrong until ea6d290c which pins task->signal to task_struct. > It checks ->sighand != NULL under rcu, but this can't help if ->signal > can go away. Fortunately the caller either holds ->siglock, or it is > fastpath_timer_check() which uses current and checks exit_state == 0. Hmm, I thought we avoided calling thread_group_cputime() from fastpatch_timer_check(), but seems it is still possible when we call run_posix_cpu_timers() on two different cpus simultaneously ... > - Since ea6d290c commit tsk->signal is stable, we can read it first > and avoid the initialization from INIT_CPUTIME. > > - Even if tsk->signal is always valid, we still have to check it > is safe to use next_thread() under rcu_read_lock(). Currently > the code checks ->sighand != NULL, change it to use pid_alive() > which is commonly used to ensure the task wasn't unhashed before > we take rcu_read_lock(). I'm not sure how important are values of almost dead task, but perhaps would be better to return times form all threads using as base sig->curr_target in loop. Stanislaw ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check 2010-06-11 14:35 ` Stanislaw Gruszka @ 2010-06-11 15:15 ` Oleg Nesterov 2010-06-11 16:40 ` Stanislaw Gruszka 0 siblings, 1 reply; 6+ messages in thread From: Oleg Nesterov @ 2010-06-11 15:15 UTC (permalink / raw) To: Stanislaw Gruszka Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel On 06/11, Stanislaw Gruszka wrote: > > On Fri, Jun 11, 2010 at 01:09:56AM +0200, Oleg Nesterov wrote: > > thread_group_cputime() looks as if it is rcu-safe, but in fact this > > was wrong until ea6d290c which pins task->signal to task_struct. > > It checks ->sighand != NULL under rcu, but this can't help if ->signal > > can go away. Fortunately the caller either holds ->siglock, or it is > > fastpath_timer_check() which uses current and checks exit_state == 0. > > Hmm, I thought we avoided calling thread_group_cputime() from > fastpatch_timer_check(), but seems it is still possible when we > call run_posix_cpu_timers() on two different cpus simultaneously ... No, we can't. thread_group_cputimer() does test-and-set ->running under cputimer->lock. But when I sent these patches, I realized we have another race here (with or without these patches). I am already doing the fix. > > - Since ea6d290c commit tsk->signal is stable, we can read it first > > and avoid the initialization from INIT_CPUTIME. > > > > - Even if tsk->signal is always valid, we still have to check it > > is safe to use next_thread() under rcu_read_lock(). Currently > > the code checks ->sighand != NULL, change it to use pid_alive() > > which is commonly used to ensure the task wasn't unhashed before > > we take rcu_read_lock(). > > I'm not sure how important are values of almost dead task, but > perhaps would be better to return times form all threads > using as base sig->curr_target in loop. Could you clarify? Oleg. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check 2010-06-11 15:15 ` Oleg Nesterov @ 2010-06-11 16:40 ` Stanislaw Gruszka 2010-06-11 16:57 ` Oleg Nesterov 0 siblings, 1 reply; 6+ messages in thread From: Stanislaw Gruszka @ 2010-06-11 16:40 UTC (permalink / raw) To: Oleg Nesterov; +Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel On Fri, Jun 11, 2010 at 05:15:33PM +0200, Oleg Nesterov wrote: > On 06/11, Stanislaw Gruszka wrote: > > > > On Fri, Jun 11, 2010 at 01:09:56AM +0200, Oleg Nesterov wrote: > > > thread_group_cputime() looks as if it is rcu-safe, but in fact this > > > was wrong until ea6d290c which pins task->signal to task_struct. > > > It checks ->sighand != NULL under rcu, but this can't help if ->signal > > > can go away. Fortunately the caller either holds ->siglock, or it is > > > fastpath_timer_check() which uses current and checks exit_state == 0. > > > > Hmm, I thought we avoided calling thread_group_cputime() from > > fastpatch_timer_check(), but seems it is still possible when we > > call run_posix_cpu_timers() on two different cpus simultaneously ... > > No, we can't. thread_group_cputimer() does test-and-set ->running > under cputimer->lock. > > But when I sent these patches, I realized we have another race here > (with or without these patches). I am already doing the fix. Don't know what you catch, I was thinking about: cpu0 cpu1 fastpath_timer_check(): if (sig->cputimer.running) { struct task_cputime group_sample; stop_process_timers(): spin_lock_irqsave(&cputimer->lock, flags); cputimer->running = 0; spin_unlock_irqrestore(&cputimer->lock, flags); thread_group_cputimer(tsk, &group_sample); > > > - Since ea6d290c commit tsk->signal is stable, we can read it first > > > and avoid the initialization from INIT_CPUTIME. > > > > > > - Even if tsk->signal is always valid, we still have to check it > > > is safe to use next_thread() under rcu_read_lock(). Currently > > > the code checks ->sighand != NULL, change it to use pid_alive() > > > which is commonly used to ensure the task wasn't unhashed before > > > we take rcu_read_lock(). > > > > I'm not sure how important are values of almost dead task, but > > perhaps would be better to return times form all threads > > using as base sig->curr_target in loop. > > Could you clarify? Avoid pid_alive check and loop starting from sig->curr_target: t = tsk = sig->curr_target; do { times->utime = cputime_add(times->utime, t->utime); times->stime = cputime_add(times->stime, t->stime); times->sum_exec_runtime += t->se.sum_exec_runtime; } while_each_thread(tsk, t); I don't know what are rules regarding accessing sig->curr_target, but if this is done under sighand->siglock we should be safe. Question if if we always have lock taken, we tried to assure that in the past, but if we really do? Stanislaw ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check 2010-06-11 16:40 ` Stanislaw Gruszka @ 2010-06-11 16:57 ` Oleg Nesterov 0 siblings, 0 replies; 6+ messages in thread From: Oleg Nesterov @ 2010-06-11 16:57 UTC (permalink / raw) To: Stanislaw Gruszka Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel On 06/11, Stanislaw Gruszka wrote: > > On Fri, Jun 11, 2010 at 05:15:33PM +0200, Oleg Nesterov wrote: > > On 06/11, Stanislaw Gruszka wrote: > > > > > > On Fri, Jun 11, 2010 at 01:09:56AM +0200, Oleg Nesterov wrote: > > > > thread_group_cputime() looks as if it is rcu-safe, but in fact this > > > > was wrong until ea6d290c which pins task->signal to task_struct. > > > > It checks ->sighand != NULL under rcu, but this can't help if ->signal > > > > can go away. Fortunately the caller either holds ->siglock, or it is > > > > fastpath_timer_check() which uses current and checks exit_state == 0. > > > > > > Hmm, I thought we avoided calling thread_group_cputime() from > > > fastpatch_timer_check(), but seems it is still possible when we > > > call run_posix_cpu_timers() on two different cpus simultaneously ... > > > > No, we can't. thread_group_cputimer() does test-and-set ->running > > under cputimer->lock. > > > > But when I sent these patches, I realized we have another race here > > (with or without these patches). I am already doing the fix. > > Don't know what you catch, I was thinking about: > > cpu0 cpu1 > > fastpath_timer_check(): > > if (sig->cputimer.running) { > struct task_cputime group_sample; > stop_process_timers(): > > spin_lock_irqsave(&cputimer->lock, flags); > cputimer->running = 0; > spin_unlock_irqrestore(&cputimer->lock, flags); > > thread_group_cputimer(tsk, &group_sample); Yes, I was thinking about this race too. Please wait a bit, I'll send the patch. In short: it is safe to call thread_group_cputime() lockless, but thread_group_cputimer() must not be called without siglock/tasklist (oh, and imho we should rename them somehow, their names are almost identical). And in fact fastpath_timer_check() does not need thread_group_cputimer(). > > > > - Since ea6d290c commit tsk->signal is stable, we can read it first > > > > and avoid the initialization from INIT_CPUTIME. > > > > > > > > - Even if tsk->signal is always valid, we still have to check it > > > > is safe to use next_thread() under rcu_read_lock(). Currently > > > > the code checks ->sighand != NULL, change it to use pid_alive() > > > > which is commonly used to ensure the task wasn't unhashed before > > > > we take rcu_read_lock(). > > > > > > I'm not sure how important are values of almost dead task, but > > > perhaps would be better to return times form all threads > > > using as base sig->curr_target in loop. > > > > Could you clarify? > > Avoid pid_alive check and loop starting from sig->curr_target: > > t = tsk = sig->curr_target; > do { > times->utime = cputime_add(times->utime, t->utime); > times->stime = cputime_add(times->stime, t->stime); > times->sum_exec_runtime += t->se.sum_exec_runtime; > } while_each_thread(tsk, t); > > I don't know what are rules regarding accessing sig->curr_target, but > if this is done under sighand->siglock we should be safe. Question > if if we always have lock taken, we tried to assure that in the past, > but if we really do? Ah, you are talking about thread_group_cputime(). Without ->siglock this is not safe. We can change __exit_signal() to nullify ->curr_target in the group_dead case, then the code above could check sig->curr_target != NULL. But this is too subtle imho, and not needed. Instead we should move group_leader into ->signal (and kill signal->leader_pid). I am going to do more cleanups in this area "later". Anyway. This all has nothing to do with this patch. The 4/5 change in thread_group_cputime() is cleanup, and it ccan help to make /proc/pid/stat /proc/pid/status lockless. With or without 5/5 thread_group_cputime() can be called lockless and race with exit/fork. This is fine by itself, but this is wrong because the caller sets ->running. Oleg. ^ permalink raw reply [flat|nested] 6+ messages in thread
* [tip:sched/core] sched: thread_group_cputime: Simplify, document the "alive" check 2010-06-10 23:09 [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check Oleg Nesterov 2010-06-11 14:35 ` Stanislaw Gruszka @ 2010-06-18 10:20 ` tip-bot for Oleg Nesterov 1 sibling, 0 replies; 6+ messages in thread From: tip-bot for Oleg Nesterov @ 2010-06-18 10:20 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, hpa, mingo, a.p.zijlstra, oleg, tglx, mingo Commit-ID: bfac7009180901f57f20a73c53c3e57b1ce75a1b Gitweb: http://git.kernel.org/tip/bfac7009180901f57f20a73c53c3e57b1ce75a1b Author: Oleg Nesterov <oleg@redhat.com> AuthorDate: Fri, 11 Jun 2010 01:09:56 +0200 Committer: Ingo Molnar <mingo@elte.hu> CommitDate: Fri, 18 Jun 2010 10:46:56 +0200 sched: thread_group_cputime: Simplify, document the "alive" check thread_group_cputime() looks as if it is rcu-safe, but in fact this was wrong until ea6d290c which pins task->signal to task_struct. It checks ->sighand != NULL under rcu, but this can't help if ->signal can go away. Fortunately the caller either holds ->siglock, or it is fastpath_timer_check() which uses current and checks exit_state == 0. - Since ea6d290c commit tsk->signal is stable, we can read it first and avoid the initialization from INIT_CPUTIME. - Even if tsk->signal is always valid, we still have to check it is safe to use next_thread() under rcu_read_lock(). Currently the code checks ->sighand != NULL, change it to use pid_alive() which is commonly used to ensure the task wasn't unhashed before we take rcu_read_lock(). Add the comment to explain this check. - Change the main loop to use the while_each_thread() helper. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100610230956.GA25921@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> --- kernel/posix-cpu-timers.c | 21 +++++++-------------- 1 files changed, 7 insertions(+), 14 deletions(-) diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c index 9829646..bf2a650 100644 --- a/kernel/posix-cpu-timers.c +++ b/kernel/posix-cpu-timers.c @@ -232,31 +232,24 @@ static int cpu_clock_sample(const clockid_t which_clock, struct task_struct *p, void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times) { - struct sighand_struct *sighand; - struct signal_struct *sig; + struct signal_struct *sig = tsk->signal; struct task_struct *t; - *times = INIT_CPUTIME; + times->utime = sig->utime; + times->stime = sig->stime; + times->sum_exec_runtime = sig->sum_sched_runtime; rcu_read_lock(); - sighand = rcu_dereference(tsk->sighand); - if (!sighand) + /* make sure we can trust tsk->thread_group list */ + if (!likely(pid_alive(tsk))) goto out; - sig = tsk->signal; - t = tsk; do { times->utime = cputime_add(times->utime, t->utime); times->stime = cputime_add(times->stime, t->stime); times->sum_exec_runtime += t->se.sum_exec_runtime; - - t = next_thread(t); - } while (t != tsk); - - times->utime = cputime_add(times->utime, sig->utime); - times->stime = cputime_add(times->stime, sig->stime); - times->sum_exec_runtime += sig->sum_sched_runtime; + } while_each_thread(tsk, t); out: rcu_read_unlock(); } ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2010-06-18 10:21 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-06-10 23:09 [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check Oleg Nesterov 2010-06-11 14:35 ` Stanislaw Gruszka 2010-06-11 15:15 ` Oleg Nesterov 2010-06-11 16:40 ` Stanislaw Gruszka 2010-06-11 16:57 ` Oleg Nesterov 2010-06-18 10:20 ` [tip:sched/core] sched: thread_group_cputime: Simplify, " tip-bot for Oleg Nesterov
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.