* [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check
@ 2010-06-10 23:09 Oleg Nesterov
2010-06-11 14:35 ` Stanislaw Gruszka
2010-06-18 10:20 ` [tip:sched/core] sched: thread_group_cputime: Simplify, " tip-bot for Oleg Nesterov
0 siblings, 2 replies; 6+ messages in thread
From: Oleg Nesterov @ 2010-06-10 23:09 UTC
To: Ingo Molnar
Cc: Peter Zijlstra, Stanislaw Gruszka, Thomas Gleixner, linux-kernel
thread_group_cputime() looks as if it is rcu-safe, but in fact this
was wrong until ea6d290c which pins task->signal to task_struct.
It checks ->sighand != NULL under rcu, but this can't help if ->signal
can go away. Fortunately the caller either holds ->siglock, or it is
fastpath_timer_check() which uses current and checks exit_state == 0.
- Since ea6d290c commit tsk->signal is stable, we can read it first
  and avoid the initialization from INIT_CPUTIME.

- Even if tsk->signal is always valid, we still have to check it
  is safe to use next_thread() under rcu_read_lock(). Currently
  the code checks ->sighand != NULL, change it to use pid_alive()
  which is commonly used to ensure the task wasn't unhashed before
  we take rcu_read_lock().

  Add the comment to explain this check.

- Change the main loop to use the while_each_thread() helper.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/posix-cpu-timers.c | 21 +++++++--------------
1 file changed, 7 insertions(+), 14 deletions(-)
--- 35-rc2/kernel/posix-cpu-timers.c~4_TG_CPUTIME 2010-06-11 00:47:33.000000000 +0200
+++ 35-rc2/kernel/posix-cpu-timers.c 2010-06-11 01:07:48.000000000 +0200
@@ -232,31 +232,24 @@ static int cpu_clock_sample(const clocki
 
 void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 {
-	struct sighand_struct *sighand;
-	struct signal_struct *sig;
+	struct signal_struct *sig = tsk->signal;
 	struct task_struct *t;
 
-	*times = INIT_CPUTIME;
+	times->utime = sig->utime;
+	times->stime = sig->stime;
+	times->sum_exec_runtime = sig->sum_sched_runtime;
 
 	rcu_read_lock();
-	sighand = rcu_dereference(tsk->sighand);
-	if (!sighand)
+	/* make sure we can trust tsk->thread_group list */
+	if (!likely(pid_alive(tsk)))
 		goto out;
 
-	sig = tsk->signal;
-
 	t = tsk;
 	do {
 		times->utime = cputime_add(times->utime, t->utime);
 		times->stime = cputime_add(times->stime, t->stime);
 		times->sum_exec_runtime += t->se.sum_exec_runtime;
-
-		t = next_thread(t);
-	} while (t != tsk);
-
-	times->utime = cputime_add(times->utime, sig->utime);
-	times->stime = cputime_add(times->stime, sig->stime);
-	times->sum_exec_runtime += sig->sum_sched_runtime;
+	} while_each_thread(tsk, t);
 out:
 	rcu_read_unlock();
 }
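
The shape of the patched function can be modelled in user space. The sketch below is only an illustration, not kernel code: the `model_*` types, the `alive` flag (standing in for `pid_alive()`), and `demo_group_cputime()` are hypothetical names, and RCU and locking are not modelled at all. It shows the two ideas from the changelog: seed the result from the per-group totals left by exited threads instead of zero-initializing it, and walk the circular thread list only if the starting task is still linked into it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct task_cputime. */
struct cputime_totals {
	unsigned long utime;
	unsigned long stime;
	unsigned long long sum_exec_runtime;
};

/* Stand-in for signal_struct: totals folded in from exited threads. */
struct model_signal {
	struct cputime_totals dead;
};

/* Stand-in for task_struct. */
struct model_task {
	struct model_signal *signal;	/* assumed always valid, as after ea6d290c */
	struct model_task *next;	/* circular list of live threads */
	bool alive;			/* stands in for pid_alive() */
	struct cputime_totals self;
};

/* Same shape as the patched function: seed, check, walk. */
void model_group_cputime(const struct model_task *tsk,
			 struct cputime_totals *times)
{
	const struct model_signal *sig = tsk->signal;
	const struct model_task *t;

	assert(times != NULL);

	/* Seed with the exited threads' totals; no zero-init needed. */
	*times = sig->dead;

	/* Walk the list only if tsk is still linked into it. */
	if (!tsk->alive)
		return;

	t = tsk;
	do {
		times->utime += t->self.utime;
		times->stime += t->self.stime;
		times->sum_exec_runtime += t->self.sum_exec_runtime;
		t = t->next;
	} while (t != tsk);
}

/* Demo: two threads in the group plus totals from one exited thread. */
struct cputime_totals demo_group_cputime(bool first_alive)
{
	struct model_signal sig = { .dead = { 5, 7, 11 } };
	struct model_task a = { .signal = &sig, .alive = first_alive,
				.self = { 1, 2, 3 } };
	struct model_task b = { .signal = &sig, .alive = true,
				.self = { 10, 20, 30 } };
	struct cputime_totals out;

	a.next = &b;
	b.next = &a;
	model_group_cputime(&a, &out);
	return out;
}
```

In this model, calling `demo_group_cputime(false)` returns only the seeded totals, which corresponds to the `goto out` path in the patch: the caller still gets the times of already-exited threads, but the list is never touched.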
* Re: [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check
2010-06-10 23:09 [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check Oleg Nesterov
@ 2010-06-11 14:35 ` Stanislaw Gruszka
2010-06-11 15:15 ` Oleg Nesterov
2010-06-18 10:20 ` [tip:sched/core] sched: thread_group_cputime: Simplify, " tip-bot for Oleg Nesterov
1 sibling, 1 reply; 6+ messages in thread
From: Stanislaw Gruszka @ 2010-06-11 14:35 UTC
To: Oleg Nesterov; +Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel

On Fri, Jun 11, 2010 at 01:09:56AM +0200, Oleg Nesterov wrote:
> thread_group_cputime() looks as if it is rcu-safe, but in fact this
> was wrong until ea6d290c which pins task->signal to task_struct.
> It checks ->sighand != NULL under rcu, but this can't help if ->signal
> can go away. Fortunately the caller either holds ->siglock, or it is
> fastpath_timer_check() which uses current and checks exit_state == 0.

Hmm, I thought we avoided calling thread_group_cputime() from
fastpath_timer_check(), but it seems it is still possible when we
call run_posix_cpu_timers() on two different cpus simultaneously ...

> - Since ea6d290c commit tsk->signal is stable, we can read it first
>   and avoid the initialization from INIT_CPUTIME.
>
> - Even if tsk->signal is always valid, we still have to check it
>   is safe to use next_thread() under rcu_read_lock(). Currently
>   the code checks ->sighand != NULL, change it to use pid_alive()
>   which is commonly used to ensure the task wasn't unhashed before
>   we take rcu_read_lock().

I'm not sure how important the values of an almost dead task are, but
perhaps it would be better to return times from all threads, using
sig->curr_target as the base in the loop.

Stanislaw

* Re: [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check
2010-06-11 14:35 ` Stanislaw Gruszka
@ 2010-06-11 15:15 ` Oleg Nesterov
2010-06-11 16:40 ` Stanislaw Gruszka
0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2010-06-11 15:15 UTC
To: Stanislaw Gruszka
Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel

On 06/11, Stanislaw Gruszka wrote:
>
> On Fri, Jun 11, 2010 at 01:09:56AM +0200, Oleg Nesterov wrote:
> > thread_group_cputime() looks as if it is rcu-safe, but in fact this
> > was wrong until ea6d290c which pins task->signal to task_struct.
> > It checks ->sighand != NULL under rcu, but this can't help if ->signal
> > can go away. Fortunately the caller either holds ->siglock, or it is
> > fastpath_timer_check() which uses current and checks exit_state == 0.
>
> Hmm, I thought we avoided calling thread_group_cputime() from
> fastpath_timer_check(), but it seems it is still possible when we
> call run_posix_cpu_timers() on two different cpus simultaneously ...

No, we can't. thread_group_cputimer() does test-and-set ->running
under cputimer->lock.

But when I sent these patches, I realized we have another race here
(with or without these patches). I am already doing the fix.

> > - Since ea6d290c commit tsk->signal is stable, we can read it first
> >   and avoid the initialization from INIT_CPUTIME.
> >
> > - Even if tsk->signal is always valid, we still have to check it
> >   is safe to use next_thread() under rcu_read_lock(). Currently
> >   the code checks ->sighand != NULL, change it to use pid_alive()
> >   which is commonly used to ensure the task wasn't unhashed before
> >   we take rcu_read_lock().
>
> I'm not sure how important the values of an almost dead task are, but
> perhaps it would be better to return times from all threads, using
> sig->curr_target as the base in the loop.

Could you clarify?

Oleg.

* Re: [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check
2010-06-11 15:15 ` Oleg Nesterov
@ 2010-06-11 16:40 ` Stanislaw Gruszka
2010-06-11 16:57 ` Oleg Nesterov
0 siblings, 1 reply; 6+ messages in thread
From: Stanislaw Gruszka @ 2010-06-11 16:40 UTC
To: Oleg Nesterov; +Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel

On Fri, Jun 11, 2010 at 05:15:33PM +0200, Oleg Nesterov wrote:
> On 06/11, Stanislaw Gruszka wrote:
> >
> > On Fri, Jun 11, 2010 at 01:09:56AM +0200, Oleg Nesterov wrote:
> > > thread_group_cputime() looks as if it is rcu-safe, but in fact this
> > > was wrong until ea6d290c which pins task->signal to task_struct.
> > > It checks ->sighand != NULL under rcu, but this can't help if ->signal
> > > can go away. Fortunately the caller either holds ->siglock, or it is
> > > fastpath_timer_check() which uses current and checks exit_state == 0.
> >
> > Hmm, I thought we avoided calling thread_group_cputime() from
> > fastpath_timer_check(), but it seems it is still possible when we
> > call run_posix_cpu_timers() on two different cpus simultaneously ...
>
> No, we can't. thread_group_cputimer() does test-and-set ->running
> under cputimer->lock.
>
> But when I sent these patches, I realized we have another race here
> (with or without these patches). I am already doing the fix.

I don't know what you caught; I was thinking about:

cpu0                                        cpu1

fastpath_timer_check():

if (sig->cputimer.running) {
    struct task_cputime group_sample;
                                            stop_process_timers():

                                            spin_lock_irqsave(&cputimer->lock, flags);
                                            cputimer->running = 0;
                                            spin_unlock_irqrestore(&cputimer->lock, flags);

    thread_group_cputimer(tsk, &group_sample);

> > > - Since ea6d290c commit tsk->signal is stable, we can read it first
> > >   and avoid the initialization from INIT_CPUTIME.
> > >
> > > - Even if tsk->signal is always valid, we still have to check it
> > >   is safe to use next_thread() under rcu_read_lock(). Currently
> > >   the code checks ->sighand != NULL, change it to use pid_alive()
> > >   which is commonly used to ensure the task wasn't unhashed before
> > >   we take rcu_read_lock().
> >
> > I'm not sure how important the values of an almost dead task are, but
> > perhaps it would be better to return times from all threads, using
> > sig->curr_target as the base in the loop.
>
> Could you clarify?

Avoid the pid_alive() check and loop starting from sig->curr_target:

	t = tsk = sig->curr_target;
	do {
		times->utime = cputime_add(times->utime, t->utime);
		times->stime = cputime_add(times->stime, t->stime);
		times->sum_exec_runtime += t->se.sum_exec_runtime;
	} while_each_thread(tsk, t);

I don't know what the rules are for accessing sig->curr_target, but
if this is done under sighand->siglock we should be safe. The question
is whether we always have the lock taken; we tried to ensure that in
the past, but do we really?

Stanislaw

* Re: [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check
2010-06-11 16:40 ` Stanislaw Gruszka
@ 2010-06-11 16:57 ` Oleg Nesterov
0 siblings, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2010-06-11 16:57 UTC
To: Stanislaw Gruszka
Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel

On 06/11, Stanislaw Gruszka wrote:
>
> On Fri, Jun 11, 2010 at 05:15:33PM +0200, Oleg Nesterov wrote:
> > On 06/11, Stanislaw Gruszka wrote:
> > >
> > > On Fri, Jun 11, 2010 at 01:09:56AM +0200, Oleg Nesterov wrote:
> > > > thread_group_cputime() looks as if it is rcu-safe, but in fact this
> > > > was wrong until ea6d290c which pins task->signal to task_struct.
> > > > It checks ->sighand != NULL under rcu, but this can't help if ->signal
> > > > can go away. Fortunately the caller either holds ->siglock, or it is
> > > > fastpath_timer_check() which uses current and checks exit_state == 0.
> > >
> > > Hmm, I thought we avoided calling thread_group_cputime() from
> > > fastpath_timer_check(), but it seems it is still possible when we
> > > call run_posix_cpu_timers() on two different cpus simultaneously ...
> >
> > No, we can't. thread_group_cputimer() does test-and-set ->running
> > under cputimer->lock.
> >
> > But when I sent these patches, I realized we have another race here
> > (with or without these patches). I am already doing the fix.
>
> I don't know what you caught; I was thinking about:
>
> cpu0                                        cpu1
>
> fastpath_timer_check():
>
> if (sig->cputimer.running) {
>     struct task_cputime group_sample;
>                                             stop_process_timers():
>
>                                             spin_lock_irqsave(&cputimer->lock, flags);
>                                             cputimer->running = 0;
>                                             spin_unlock_irqrestore(&cputimer->lock, flags);
>
>     thread_group_cputimer(tsk, &group_sample);

Yes, I was thinking about this race too. Please wait a bit, I'll send
the patch.

In short: it is safe to call thread_group_cputime() lockless, but
thread_group_cputimer() must not be called without siglock/tasklist
(oh, and imho we should rename them somehow, their names are almost
identical).

And in fact fastpath_timer_check() does not need thread_group_cputimer().

> > > > - Since ea6d290c commit tsk->signal is stable, we can read it first
> > > >   and avoid the initialization from INIT_CPUTIME.
> > > >
> > > > - Even if tsk->signal is always valid, we still have to check it
> > > >   is safe to use next_thread() under rcu_read_lock(). Currently
> > > >   the code checks ->sighand != NULL, change it to use pid_alive()
> > > >   which is commonly used to ensure the task wasn't unhashed before
> > > >   we take rcu_read_lock().
> > >
> > > I'm not sure how important the values of an almost dead task are, but
> > > perhaps it would be better to return times from all threads, using
> > > sig->curr_target as the base in the loop.
> >
> > Could you clarify?
>
> Avoid the pid_alive() check and loop starting from sig->curr_target:
>
> 	t = tsk = sig->curr_target;
> 	do {
> 		times->utime = cputime_add(times->utime, t->utime);
> 		times->stime = cputime_add(times->stime, t->stime);
> 		times->sum_exec_runtime += t->se.sum_exec_runtime;
> 	} while_each_thread(tsk, t);
>
> I don't know what the rules are for accessing sig->curr_target, but
> if this is done under sighand->siglock we should be safe. The question
> is whether we always have the lock taken; we tried to ensure that in
> the past, but do we really?

Ah, you are talking about thread_group_cputime().

Without ->siglock this is not safe. We can change __exit_signal() to
nullify ->curr_target in the group_dead case, then the code above could
check sig->curr_target != NULL. But this is too subtle imho, and not
needed. Instead we should move group_leader into ->signal (and kill
signal->leader_pid). I am going to do more cleanups in this area "later".

Anyway. This all has nothing to do with this patch. The 4/5 change in
thread_group_cputime() is cleanup, and it can help to make
/proc/pid/stat and /proc/pid/status lockless.

With or without 5/5 thread_group_cputime() can be called lockless and
race with exit/fork. This is fine by itself, but this is wrong because
the caller sets ->running.

Oleg.

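
The cpu0/cpu1 interleaving discussed above can be replayed step by step in a single-threaded user-space model. This is a hedged sketch rather than the kernel's code: `model_cputimer`, the `restarts` counter, and `demo_fastpath_race()` are hypothetical names, `cputimer->lock` is represented only by comments, and the test-and-set body is a simplification of what the thread describes. The point it illustrates is that the test-and-set itself is consistent, but a caller acting on a stale lockless read of `->running` can switch the timer back on after it was stopped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the group cputimer state. */
struct model_cputimer {
	bool running;
	unsigned long sample;	/* placeholder for the accumulated times */
	int restarts;		/* how often the timer was switched on here */
};

/*
 * Test-and-set of ->running. In the kernel this is described as being
 * done under cputimer->lock; the lock is not modelled in this sketch.
 */
void model_thread_group_cputimer(struct model_cputimer *ct,
				 unsigned long *sample)
{
	assert(ct != NULL && sample != NULL);

	if (!ct->running) {
		ct->running = true;
		ct->restarts++;
	}
	*sample = ct->sample;
}

/* Clears ->running, again under the lock in the real code. */
void model_stop_process_timers(struct model_cputimer *ct)
{
	ct->running = false;
}

/*
 * Replays the interleaving from the thread in program order.
 * Returns true if the timer is left running after it was stopped.
 */
bool demo_fastpath_race(void)
{
	struct model_cputimer ct = { .running = true, .sample = 42 };
	unsigned long group_sample = 0;

	/* cpu0: lockless check of ->running in the fast path. */
	bool saw_running = ct.running;

	/* cpu1: stops the process timers in between. */
	model_stop_process_timers(&ct);

	/* cpu0: acts on the stale observation and samples the timer. */
	if (saw_running)
		model_thread_group_cputimer(&ct, &group_sample);

	return ct.running && ct.restarts == 1;
}
```

In this replay `demo_fastpath_race()` reports that the timer ends up running again, which matches Oleg's remark that the lockless call is a problem because the caller sets `->running`, not because of the time accumulation itself.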
* [tip:sched/core] sched: thread_group_cputime: Simplify, document the "alive" check
2010-06-10 23:09 [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check Oleg Nesterov
2010-06-11 14:35 ` Stanislaw Gruszka
@ 2010-06-18 10:20 ` tip-bot for Oleg Nesterov
1 sibling, 0 replies; 6+ messages in thread
From: tip-bot for Oleg Nesterov @ 2010-06-18 10:20 UTC
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, a.p.zijlstra, oleg, tglx, mingo

Commit-ID:  bfac7009180901f57f20a73c53c3e57b1ce75a1b
Gitweb:     http://git.kernel.org/tip/bfac7009180901f57f20a73c53c3e57b1ce75a1b
Author:     Oleg Nesterov <oleg@redhat.com>
AuthorDate: Fri, 11 Jun 2010 01:09:56 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 18 Jun 2010 10:46:56 +0200

sched: thread_group_cputime: Simplify, document the "alive" check

thread_group_cputime() looks as if it is rcu-safe, but in fact this
was wrong until ea6d290c which pins task->signal to task_struct.
It checks ->sighand != NULL under rcu, but this can't help if ->signal
can go away. Fortunately the caller either holds ->siglock, or it is
fastpath_timer_check() which uses current and checks exit_state == 0.

- Since ea6d290c commit tsk->signal is stable, we can read it first
  and avoid the initialization from INIT_CPUTIME.

- Even if tsk->signal is always valid, we still have to check it
  is safe to use next_thread() under rcu_read_lock(). Currently
  the code checks ->sighand != NULL, change it to use pid_alive()
  which is commonly used to ensure the task wasn't unhashed before
  we take rcu_read_lock().

  Add the comment to explain this check.

- Change the main loop to use the while_each_thread() helper.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100610230956.GA25921@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/posix-cpu-timers.c | 21 +++++++--------------
1 files changed, 7 insertions(+), 14 deletions(-)
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 9829646..bf2a650 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -232,31 +232,24 @@ static int cpu_clock_sample(const clockid_t which_clock, struct task_struct *p,
 
 void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 {
-	struct sighand_struct *sighand;
-	struct signal_struct *sig;
+	struct signal_struct *sig = tsk->signal;
 	struct task_struct *t;
 
-	*times = INIT_CPUTIME;
+	times->utime = sig->utime;
+	times->stime = sig->stime;
+	times->sum_exec_runtime = sig->sum_sched_runtime;
 
 	rcu_read_lock();
-	sighand = rcu_dereference(tsk->sighand);
-	if (!sighand)
+	/* make sure we can trust tsk->thread_group list */
+	if (!likely(pid_alive(tsk)))
 		goto out;
 
-	sig = tsk->signal;
-
 	t = tsk;
 	do {
 		times->utime = cputime_add(times->utime, t->utime);
 		times->stime = cputime_add(times->stime, t->stime);
 		times->sum_exec_runtime += t->se.sum_exec_runtime;
-
-		t = next_thread(t);
-	} while (t != tsk);
-
-	times->utime = cputime_add(times->utime, sig->utime);
-	times->stime = cputime_add(times->stime, sig->stime);
-	times->sum_exec_runtime += sig->sum_sched_runtime;
+	} while_each_thread(tsk, t);
 out:
 	rcu_read_unlock();
 }