[PATCH 4/5] thread_group_cputime: simplify, document the "alive" check

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check
@ 2010-06-10 23:09 Oleg Nesterov
  2010-06-11 14:35 ` Stanislaw Gruszka
  2010-06-18 10:20 ` [tip:sched/core] sched: thread_group_cputime: Simplify, " tip-bot for Oleg Nesterov
  0 siblings, 2 replies; 6+ messages in thread
From: Oleg Nesterov @ 2010-06-10 23:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Stanislaw Gruszka, Thomas Gleixner, linux-kernel

thread_group_cputime() looks as if it is rcu-safe, but in fact this
was wrong until ea6d290c which pins task->signal to task_struct.
It checks ->sighand != NULL under rcu, but this can't help if ->signal
can go away. Fortunately the caller either holds ->siglock, or it is
fastpath_timer_check() which uses current and checks exit_state == 0.

- Since ea6d290c commit tsk->signal is stable, we can read it first
  and avoid the initialization from INIT_CPUTIME.

- Even if tsk->signal is always valid, we still have to check it
  is safe to use next_thread() under rcu_read_lock(). Currently
  the code checks ->sighand != NULL, change it to use pid_alive()
  which is commonly used to ensure the task wasn't unhashed before
  we take rcu_read_lock().

  Add the comment to explain this check.

- Change the main loop to use the while_each_thread() helper.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---

 kernel/posix-cpu-timers.c |   21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

--- 35-rc2/kernel/posix-cpu-timers.c~4_TG_CPUTIME	2010-06-11 00:47:33.000000000 +0200
+++ 35-rc2/kernel/posix-cpu-timers.c	2010-06-11 01:07:48.000000000 +0200
@@ -232,31 +232,24 @@ static int cpu_clock_sample(const clocki
 
 void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 {
-	struct sighand_struct *sighand;
-	struct signal_struct *sig;
+	struct signal_struct *sig = tsk->signal;
 	struct task_struct *t;
 
-	*times = INIT_CPUTIME;
+	times->utime = sig->utime;
+	times->stime = sig->stime;
+	times->sum_exec_runtime = sig->sum_sched_runtime;
 
 	rcu_read_lock();
-	sighand = rcu_dereference(tsk->sighand);
-	if (!sighand)
+	/* make sure we can trust tsk->thread_group list */
+	if (!likely(pid_alive(tsk)))
 		goto out;
 
-	sig = tsk->signal;
-
 	t = tsk;
 	do {
 		times->utime = cputime_add(times->utime, t->utime);
 		times->stime = cputime_add(times->stime, t->stime);
 		times->sum_exec_runtime += t->se.sum_exec_runtime;
-
-		t = next_thread(t);
-	} while (t != tsk);
-
-	times->utime = cputime_add(times->utime, sig->utime);
-	times->stime = cputime_add(times->stime, sig->stime);
-	times->sum_exec_runtime += sig->sum_sched_runtime;
+	} while_each_thread(tsk, t);
 out:
 	rcu_read_unlock();
 }


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check
  2010-06-10 23:09 [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check Oleg Nesterov
@ 2010-06-11 14:35 ` Stanislaw Gruszka
  2010-06-11 15:15   ` Oleg Nesterov
  2010-06-18 10:20 ` [tip:sched/core] sched: thread_group_cputime: Simplify, " tip-bot for Oleg Nesterov
  1 sibling, 1 reply; 6+ messages in thread
From: Stanislaw Gruszka @ 2010-06-11 14:35 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel

On Fri, Jun 11, 2010 at 01:09:56AM +0200, Oleg Nesterov wrote:
> thread_group_cputime() looks as if it is rcu-safe, but in fact this
> was wrong until ea6d290c which pins task->signal to task_struct.
> It checks ->sighand != NULL under rcu, but this can't help if ->signal
> can go away. Fortunately the caller either holds ->siglock, or it is
> fastpath_timer_check() which uses current and checks exit_state == 0.

Hmm, I thought we avoided calling thread_group_cputime() from
fastpatch_timer_check(), but seems it is still possible when we
call run_posix_cpu_timers() on two different cpus simultaneously ...

> - Since ea6d290c commit tsk->signal is stable, we can read it first
>   and avoid the initialization from INIT_CPUTIME.
> 
> - Even if tsk->signal is always valid, we still have to check it
>   is safe to use next_thread() under rcu_read_lock(). Currently
>   the code checks ->sighand != NULL, change it to use pid_alive()
>   which is commonly used to ensure the task wasn't unhashed before
>   we take rcu_read_lock().

I'm not sure how important are values of almost dead task, but
perhaps would be better to return times form all threads
using as base sig->curr_target in loop.

Stanislaw

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check
  2010-06-11 14:35 ` Stanislaw Gruszka
@ 2010-06-11 15:15   ` Oleg Nesterov
  2010-06-11 16:40     ` Stanislaw Gruszka
  0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2010-06-11 15:15 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel

On 06/11, Stanislaw Gruszka wrote:
>
> On Fri, Jun 11, 2010 at 01:09:56AM +0200, Oleg Nesterov wrote:
> > thread_group_cputime() looks as if it is rcu-safe, but in fact this
> > was wrong until ea6d290c which pins task->signal to task_struct.
> > It checks ->sighand != NULL under rcu, but this can't help if ->signal
> > can go away. Fortunately the caller either holds ->siglock, or it is
> > fastpath_timer_check() which uses current and checks exit_state == 0.
>
> Hmm, I thought we avoided calling thread_group_cputime() from
> fastpatch_timer_check(), but seems it is still possible when we
> call run_posix_cpu_timers() on two different cpus simultaneously ...

No, we can't. thread_group_cputimer() does test-and-set ->running
under cputimer->lock.

But when I sent these patches, I realized we have another race here
(with or without these patches). I am already doing the fix.

> > - Since ea6d290c commit tsk->signal is stable, we can read it first
> >   and avoid the initialization from INIT_CPUTIME.
> >
> > - Even if tsk->signal is always valid, we still have to check it
> >   is safe to use next_thread() under rcu_read_lock(). Currently
> >   the code checks ->sighand != NULL, change it to use pid_alive()
> >   which is commonly used to ensure the task wasn't unhashed before
> >   we take rcu_read_lock().
>
> I'm not sure how important are values of almost dead task, but
> perhaps would be better to return times form all threads
> using as base sig->curr_target in loop.

Could you clarify?

Oleg.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check
  2010-06-11 15:15   ` Oleg Nesterov
@ 2010-06-11 16:40     ` Stanislaw Gruszka
  2010-06-11 16:57       ` Oleg Nesterov
  0 siblings, 1 reply; 6+ messages in thread
From: Stanislaw Gruszka @ 2010-06-11 16:40 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel

On Fri, Jun 11, 2010 at 05:15:33PM +0200, Oleg Nesterov wrote:
> On 06/11, Stanislaw Gruszka wrote:
> >
> > On Fri, Jun 11, 2010 at 01:09:56AM +0200, Oleg Nesterov wrote:
> > > thread_group_cputime() looks as if it is rcu-safe, but in fact this
> > > was wrong until ea6d290c which pins task->signal to task_struct.
> > > It checks ->sighand != NULL under rcu, but this can't help if ->signal
> > > can go away. Fortunately the caller either holds ->siglock, or it is
> > > fastpath_timer_check() which uses current and checks exit_state == 0.
> >
> > Hmm, I thought we avoided calling thread_group_cputime() from
> > fastpatch_timer_check(), but seems it is still possible when we
> > call run_posix_cpu_timers() on two different cpus simultaneously ...
> 
> No, we can't. thread_group_cputimer() does test-and-set ->running
> under cputimer->lock.
> 
> But when I sent these patches, I realized we have another race here
> (with or without these patches). I am already doing the fix.

Don't know what you catch, I was thinking about:

cpu0							cpu1

fastpath_timer_check():

if (sig->cputimer.running) {
  struct task_cputime group_sample;
							stop_process_timers():	
        						
							spin_lock_irqsave(&cputimer->lock, flags);
        						cputimer->running = 0;
        						spin_unlock_irqrestore(&cputimer->lock, flags);
				
  thread_group_cputimer(tsk, &group_sample);
					
> > > - Since ea6d290c commit tsk->signal is stable, we can read it first
> > >   and avoid the initialization from INIT_CPUTIME.
> > >
> > > - Even if tsk->signal is always valid, we still have to check it
> > >   is safe to use next_thread() under rcu_read_lock(). Currently
> > >   the code checks ->sighand != NULL, change it to use pid_alive()
> > >   which is commonly used to ensure the task wasn't unhashed before
> > >   we take rcu_read_lock().
> >
> > I'm not sure how important are values of almost dead task, but
> > perhaps would be better to return times form all threads
> > using as base sig->curr_target in loop.
> 
> Could you clarify?

Avoid pid_alive check and loop starting from sig->curr_target:
 
       t = tsk = sig->curr_target;
       do {
                times->utime = cputime_add(times->utime, t->utime);
                times->stime = cputime_add(times->stime, t->stime);
                times->sum_exec_runtime += t->se.sum_exec_runtime;
       } while_each_thread(tsk, t);

I don't know what are rules regarding accessing sig->curr_target, but
if this is done under sighand->siglock we should be safe. Question
if if we always have lock taken, we tried to assure that in the past,
but if we really do?
 
Stanislaw

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check
  2010-06-11 16:40     ` Stanislaw Gruszka
@ 2010-06-11 16:57       ` Oleg Nesterov
  0 siblings, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2010-06-11 16:57 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel

On 06/11, Stanislaw Gruszka wrote:
>
> On Fri, Jun 11, 2010 at 05:15:33PM +0200, Oleg Nesterov wrote:
> > On 06/11, Stanislaw Gruszka wrote:
> > >
> > > On Fri, Jun 11, 2010 at 01:09:56AM +0200, Oleg Nesterov wrote:
> > > > thread_group_cputime() looks as if it is rcu-safe, but in fact this
> > > > was wrong until ea6d290c which pins task->signal to task_struct.
> > > > It checks ->sighand != NULL under rcu, but this can't help if ->signal
> > > > can go away. Fortunately the caller either holds ->siglock, or it is
> > > > fastpath_timer_check() which uses current and checks exit_state == 0.
> > >
> > > Hmm, I thought we avoided calling thread_group_cputime() from
> > > fastpatch_timer_check(), but seems it is still possible when we
> > > call run_posix_cpu_timers() on two different cpus simultaneously ...
> >
> > No, we can't. thread_group_cputimer() does test-and-set ->running
> > under cputimer->lock.
> >
> > But when I sent these patches, I realized we have another race here
> > (with or without these patches). I am already doing the fix.
>
> Don't know what you catch, I was thinking about:
>
> cpu0							cpu1
>
> fastpath_timer_check():
>
> if (sig->cputimer.running) {
>   struct task_cputime group_sample;
> 							stop_process_timers():	
>         						
> 							spin_lock_irqsave(&cputimer->lock, flags);
>         						cputimer->running = 0;
>         						spin_unlock_irqrestore(&cputimer->lock, flags);
> 				
>   thread_group_cputimer(tsk, &group_sample);

Yes, I was thinking about this race too. Please wait a bit, I'll send
the patch.

In short: it is safe to call thread_group_cputime() lockless, but
thread_group_cputimer() must not be called without siglock/tasklist
(oh, and imho we should rename them somehow, their names are almost
 identical). And in fact fastpath_timer_check() does not need
thread_group_cputimer().

> > > > - Since ea6d290c commit tsk->signal is stable, we can read it first
> > > >   and avoid the initialization from INIT_CPUTIME.
> > > >
> > > > - Even if tsk->signal is always valid, we still have to check it
> > > >   is safe to use next_thread() under rcu_read_lock(). Currently
> > > >   the code checks ->sighand != NULL, change it to use pid_alive()
> > > >   which is commonly used to ensure the task wasn't unhashed before
> > > >   we take rcu_read_lock().
> > >
> > > I'm not sure how important are values of almost dead task, but
> > > perhaps would be better to return times form all threads
> > > using as base sig->curr_target in loop.
> >
> > Could you clarify?
>
> Avoid pid_alive check and loop starting from sig->curr_target:
>
>        t = tsk = sig->curr_target;
>        do {
>                 times->utime = cputime_add(times->utime, t->utime);
>                 times->stime = cputime_add(times->stime, t->stime);
>                 times->sum_exec_runtime += t->se.sum_exec_runtime;
>        } while_each_thread(tsk, t);
>
> I don't know what are rules regarding accessing sig->curr_target, but
> if this is done under sighand->siglock we should be safe. Question
> if if we always have lock taken, we tried to assure that in the past,
> but if we really do?

Ah, you are talking about thread_group_cputime().

Without ->siglock this is not safe. We can change __exit_signal() to
nullify ->curr_target in the group_dead case, then the code above
could check sig->curr_target != NULL.

But this is too subtle imho, and not needed. Instead we should move
group_leader into ->signal (and kill signal->leader_pid). I am going
to do more cleanups in this area "later".


Anyway. This all has nothing to do with this patch. The 4/5 change
in thread_group_cputime() is cleanup, and it ccan help to make
/proc/pid/stat /proc/pid/status lockless.

With or without 5/5 thread_group_cputime() can be called lockless
and race with exit/fork. This is fine by itself, but this is wrong
because the caller sets ->running.

Oleg.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [tip:sched/core] sched: thread_group_cputime: Simplify, document the "alive" check
  2010-06-10 23:09 [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check Oleg Nesterov
  2010-06-11 14:35 ` Stanislaw Gruszka
@ 2010-06-18 10:20 ` tip-bot for Oleg Nesterov
  1 sibling, 0 replies; 6+ messages in thread
From: tip-bot for Oleg Nesterov @ 2010-06-18 10:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, oleg, tglx, mingo

Commit-ID:  bfac7009180901f57f20a73c53c3e57b1ce75a1b
Gitweb:     http://git.kernel.org/tip/bfac7009180901f57f20a73c53c3e57b1ce75a1b
Author:     Oleg Nesterov <oleg@redhat.com>
AuthorDate: Fri, 11 Jun 2010 01:09:56 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 18 Jun 2010 10:46:56 +0200

sched: thread_group_cputime: Simplify, document the "alive" check

thread_group_cputime() looks as if it is rcu-safe, but in fact this
was wrong until ea6d290c which pins task->signal to task_struct.
It checks ->sighand != NULL under rcu, but this can't help if ->signal
can go away. Fortunately the caller either holds ->siglock, or it is
fastpath_timer_check() which uses current and checks exit_state == 0.

- Since ea6d290c commit tsk->signal is stable, we can read it first
  and avoid the initialization from INIT_CPUTIME.

- Even if tsk->signal is always valid, we still have to check it
  is safe to use next_thread() under rcu_read_lock(). Currently
  the code checks ->sighand != NULL, change it to use pid_alive()
  which is commonly used to ensure the task wasn't unhashed before
  we take rcu_read_lock().

  Add the comment to explain this check.

- Change the main loop to use the while_each_thread() helper.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100610230956.GA25921@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/posix-cpu-timers.c |   21 +++++++--------------
 1 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 9829646..bf2a650 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -232,31 +232,24 @@ static int cpu_clock_sample(const clockid_t which_clock, struct task_struct *p,
 
 void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 {
-	struct sighand_struct *sighand;
-	struct signal_struct *sig;
+	struct signal_struct *sig = tsk->signal;
 	struct task_struct *t;
 
-	*times = INIT_CPUTIME;
+	times->utime = sig->utime;
+	times->stime = sig->stime;
+	times->sum_exec_runtime = sig->sum_sched_runtime;
 
 	rcu_read_lock();
-	sighand = rcu_dereference(tsk->sighand);
-	if (!sighand)
+	/* make sure we can trust tsk->thread_group list */
+	if (!likely(pid_alive(tsk)))
 		goto out;
 
-	sig = tsk->signal;
-
 	t = tsk;
 	do {
 		times->utime = cputime_add(times->utime, t->utime);
 		times->stime = cputime_add(times->stime, t->stime);
 		times->sum_exec_runtime += t->se.sum_exec_runtime;
-
-		t = next_thread(t);
-	} while (t != tsk);
-
-	times->utime = cputime_add(times->utime, sig->utime);
-	times->stime = cputime_add(times->stime, sig->stime);
-	times->sum_exec_runtime += sig->sum_sched_runtime;
+	} while_each_thread(tsk, t);
 out:
 	rcu_read_unlock();
 }

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2010-06-18 10:21 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-06-10 23:09 [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check Oleg Nesterov
2010-06-11 14:35 ` Stanislaw Gruszka
2010-06-11 15:15   ` Oleg Nesterov
2010-06-11 16:40     ` Stanislaw Gruszka
2010-06-11 16:57       ` Oleg Nesterov
2010-06-18 10:20 ` [tip:sched/core] sched: thread_group_cputime: Simplify, " tip-bot for Oleg Nesterov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.