public inbox for linux-kernel@vger.kernel.org
* [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check
@ 2010-06-10 23:09 Oleg Nesterov
  2010-06-11 14:35 ` Stanislaw Gruszka
  2010-06-18 10:20 ` [tip:sched/core] sched: thread_group_cputime: Simplify, " tip-bot for Oleg Nesterov
  0 siblings, 2 replies; 6+ messages in thread
From: Oleg Nesterov @ 2010-06-10 23:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Stanislaw Gruszka, Thomas Gleixner, linux-kernel

thread_group_cputime() looks as if it is rcu-safe, but in fact this
was wrong until ea6d290c which pins task->signal to task_struct.
It checks ->sighand != NULL under rcu, but this can't help if ->signal
can go away. Fortunately the caller either holds ->siglock, or it is
fastpath_timer_check() which uses current and checks exit_state == 0.

- Since ea6d290c commit tsk->signal is stable, we can read it first
  and avoid the initialization from INIT_CPUTIME.

- Even if tsk->signal is always valid, we still have to check it
  is safe to use next_thread() under rcu_read_lock(). Currently
  the code checks ->sighand != NULL, change it to use pid_alive()
  which is commonly used to ensure the task wasn't unhashed before
  we take rcu_read_lock().

  Add the comment to explain this check.

- Change the main loop to use the while_each_thread() helper.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---

 kernel/posix-cpu-timers.c |   21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

--- 35-rc2/kernel/posix-cpu-timers.c~4_TG_CPUTIME	2010-06-11 00:47:33.000000000 +0200
+++ 35-rc2/kernel/posix-cpu-timers.c	2010-06-11 01:07:48.000000000 +0200
@@ -232,31 +232,24 @@ static int cpu_clock_sample(const clocki
 
 void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 {
-	struct sighand_struct *sighand;
-	struct signal_struct *sig;
+	struct signal_struct *sig = tsk->signal;
 	struct task_struct *t;
 
-	*times = INIT_CPUTIME;
+	times->utime = sig->utime;
+	times->stime = sig->stime;
+	times->sum_exec_runtime = sig->sum_sched_runtime;
 
 	rcu_read_lock();
-	sighand = rcu_dereference(tsk->sighand);
-	if (!sighand)
+	/* make sure we can trust tsk->thread_group list */
+	if (!likely(pid_alive(tsk)))
 		goto out;
 
-	sig = tsk->signal;
-
 	t = tsk;
 	do {
 		times->utime = cputime_add(times->utime, t->utime);
 		times->stime = cputime_add(times->stime, t->stime);
 		times->sum_exec_runtime += t->se.sum_exec_runtime;
-
-		t = next_thread(t);
-	} while (t != tsk);
-
-	times->utime = cputime_add(times->utime, sig->utime);
-	times->stime = cputime_add(times->stime, sig->stime);
-	times->sum_exec_runtime += sig->sum_sched_runtime;
+	} while_each_thread(tsk, t);
 out:
 	rcu_read_unlock();
 }



* Re: [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check
  2010-06-10 23:09 [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check Oleg Nesterov
@ 2010-06-11 14:35 ` Stanislaw Gruszka
  2010-06-11 15:15   ` Oleg Nesterov
  2010-06-18 10:20 ` [tip:sched/core] sched: thread_group_cputime: Simplify, " tip-bot for Oleg Nesterov
  1 sibling, 1 reply; 6+ messages in thread
From: Stanislaw Gruszka @ 2010-06-11 14:35 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel

On Fri, Jun 11, 2010 at 01:09:56AM +0200, Oleg Nesterov wrote:
> thread_group_cputime() looks as if it is rcu-safe, but in fact this
> was wrong until ea6d290c which pins task->signal to task_struct.
> It checks ->sighand != NULL under rcu, but this can't help if ->signal
> can go away. Fortunately the caller either holds ->siglock, or it is
> fastpath_timer_check() which uses current and checks exit_state == 0.

Hmm, I thought we avoided calling thread_group_cputime() from
fastpath_timer_check(), but it seems it is still possible when we
call run_posix_cpu_timers() on two different cpus simultaneously ...

> - Since ea6d290c commit tsk->signal is stable, we can read it first
>   and avoid the initialization from INIT_CPUTIME.
> 
> - Even if tsk->signal is always valid, we still have to check it
>   is safe to use next_thread() under rcu_read_lock(). Currently
>   the code checks ->sighand != NULL, change it to use pid_alive()
>   which is commonly used to ensure the task wasn't unhashed before
>   we take rcu_read_lock().

I'm not sure how important the values of an almost dead task are, but
perhaps it would be better to return times from all threads,
using sig->curr_target as the base in the loop.

Stanislaw


* Re: [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check
  2010-06-11 14:35 ` Stanislaw Gruszka
@ 2010-06-11 15:15   ` Oleg Nesterov
  2010-06-11 16:40     ` Stanislaw Gruszka
  0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2010-06-11 15:15 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel

On 06/11, Stanislaw Gruszka wrote:
>
> On Fri, Jun 11, 2010 at 01:09:56AM +0200, Oleg Nesterov wrote:
> > thread_group_cputime() looks as if it is rcu-safe, but in fact this
> > was wrong until ea6d290c which pins task->signal to task_struct.
> > It checks ->sighand != NULL under rcu, but this can't help if ->signal
> > can go away. Fortunately the caller either holds ->siglock, or it is
> > fastpath_timer_check() which uses current and checks exit_state == 0.
>
> Hmm, I thought we avoided calling thread_group_cputime() from
> fastpath_timer_check(), but it seems it is still possible when we
> call run_posix_cpu_timers() on two different cpus simultaneously ...

No, we can't. thread_group_cputimer() does test-and-set ->running
under cputimer->lock.

But when I sent these patches, I realized we have another race here
(with or without these patches). I am already doing the fix.

> > - Since ea6d290c commit tsk->signal is stable, we can read it first
> >   and avoid the initialization from INIT_CPUTIME.
> >
> > - Even if tsk->signal is always valid, we still have to check it
> >   is safe to use next_thread() under rcu_read_lock(). Currently
> >   the code checks ->sighand != NULL, change it to use pid_alive()
> >   which is commonly used to ensure the task wasn't unhashed before
> >   we take rcu_read_lock().
>
> I'm not sure how important the values of an almost dead task are, but
> perhaps it would be better to return times from all threads,
> using sig->curr_target as the base in the loop.

Could you clarify?

Oleg.



* Re: [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check
  2010-06-11 15:15   ` Oleg Nesterov
@ 2010-06-11 16:40     ` Stanislaw Gruszka
  2010-06-11 16:57       ` Oleg Nesterov
  0 siblings, 1 reply; 6+ messages in thread
From: Stanislaw Gruszka @ 2010-06-11 16:40 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel

On Fri, Jun 11, 2010 at 05:15:33PM +0200, Oleg Nesterov wrote:
> On 06/11, Stanislaw Gruszka wrote:
> >
> > On Fri, Jun 11, 2010 at 01:09:56AM +0200, Oleg Nesterov wrote:
> > > thread_group_cputime() looks as if it is rcu-safe, but in fact this
> > > was wrong until ea6d290c which pins task->signal to task_struct.
> > > It checks ->sighand != NULL under rcu, but this can't help if ->signal
> > > can go away. Fortunately the caller either holds ->siglock, or it is
> > > fastpath_timer_check() which uses current and checks exit_state == 0.
> >
> > Hmm, I thought we avoided calling thread_group_cputime() from
> > fastpath_timer_check(), but it seems it is still possible when we
> > call run_posix_cpu_timers() on two different cpus simultaneously ...
> 
> No, we can't. thread_group_cputimer() does test-and-set ->running
> under cputimer->lock.
> 
> But when I sent these patches, I realized we have another race here
> (with or without these patches). I am already doing the fix.

Don't know what you caught; I was thinking about:

cpu0						cpu1

fastpath_timer_check():

if (sig->cputimer.running) {
	struct task_cputime group_sample;
						stop_process_timers():
						  spin_lock_irqsave(&cputimer->lock, flags);
						  cputimer->running = 0;
						  spin_unlock_irqrestore(&cputimer->lock, flags);
	thread_group_cputimer(tsk, &group_sample);
					
> > > - Since ea6d290c commit tsk->signal is stable, we can read it first
> > >   and avoid the initialization from INIT_CPUTIME.
> > >
> > > - Even if tsk->signal is always valid, we still have to check it
> > >   is safe to use next_thread() under rcu_read_lock(). Currently
> > >   the code checks ->sighand != NULL, change it to use pid_alive()
> > >   which is commonly used to ensure the task wasn't unhashed before
> > >   we take rcu_read_lock().
> >
> > I'm not sure how important the values of an almost dead task are, but
> > perhaps it would be better to return times from all threads,
> > using sig->curr_target as the base in the loop.
> 
> Could you clarify?

Avoid pid_alive check and loop starting from sig->curr_target:
 
       t = tsk = sig->curr_target;
       do {
                times->utime = cputime_add(times->utime, t->utime);
                times->stime = cputime_add(times->stime, t->stime);
                times->sum_exec_runtime += t->se.sum_exec_runtime;
       } while_each_thread(tsk, t);

I don't know what the rules are regarding accessing sig->curr_target, but
if this is done under sighand->siglock we should be safe. The question
is whether we always have the lock taken; we tried to assure that in the
past, but do we really?
 
Stanislaw


* Re: [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check
  2010-06-11 16:40     ` Stanislaw Gruszka
@ 2010-06-11 16:57       ` Oleg Nesterov
  0 siblings, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2010-06-11 16:57 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel

On 06/11, Stanislaw Gruszka wrote:
>
> On Fri, Jun 11, 2010 at 05:15:33PM +0200, Oleg Nesterov wrote:
> > On 06/11, Stanislaw Gruszka wrote:
> > >
> > > On Fri, Jun 11, 2010 at 01:09:56AM +0200, Oleg Nesterov wrote:
> > > > thread_group_cputime() looks as if it is rcu-safe, but in fact this
> > > > was wrong until ea6d290c which pins task->signal to task_struct.
> > > > It checks ->sighand != NULL under rcu, but this can't help if ->signal
> > > > can go away. Fortunately the caller either holds ->siglock, or it is
> > > > fastpath_timer_check() which uses current and checks exit_state == 0.
> > >
> > > Hmm, I thought we avoided calling thread_group_cputime() from
> > > fastpath_timer_check(), but it seems it is still possible when we
> > > call run_posix_cpu_timers() on two different cpus simultaneously ...
> >
> > No, we can't. thread_group_cputimer() does test-and-set ->running
> > under cputimer->lock.
> >
> > But when I sent these patches, I realized we have another race here
> > (with or without these patches). I am already doing the fix.
>
> Don't know what you caught; I was thinking about:
>
> cpu0						cpu1
>
> fastpath_timer_check():
>
> if (sig->cputimer.running) {
> 	struct task_cputime group_sample;
> 						stop_process_timers():
> 						  spin_lock_irqsave(&cputimer->lock, flags);
> 						  cputimer->running = 0;
> 						  spin_unlock_irqrestore(&cputimer->lock, flags);
> 	thread_group_cputimer(tsk, &group_sample);

Yes, I was thinking about this race too. Please wait a bit, I'll send
the patch.

In short: it is safe to call thread_group_cputime() lockless, but
thread_group_cputimer() must not be called without siglock/tasklist
(oh, and imho we should rename them somehow, their names are almost
 identical). And in fact fastpath_timer_check() does not need
thread_group_cputimer().

> > > > - Since ea6d290c commit tsk->signal is stable, we can read it first
> > > >   and avoid the initialization from INIT_CPUTIME.
> > > >
> > > > - Even if tsk->signal is always valid, we still have to check it
> > > >   is safe to use next_thread() under rcu_read_lock(). Currently
> > > >   the code checks ->sighand != NULL, change it to use pid_alive()
> > > >   which is commonly used to ensure the task wasn't unhashed before
> > > >   we take rcu_read_lock().
> > >
> > > I'm not sure how important the values of an almost dead task are, but
> > > perhaps it would be better to return times from all threads,
> > > using sig->curr_target as the base in the loop.
> >
> > Could you clarify?
>
> Avoid pid_alive check and loop starting from sig->curr_target:
>
>        t = tsk = sig->curr_target;
>        do {
>                 times->utime = cputime_add(times->utime, t->utime);
>                 times->stime = cputime_add(times->stime, t->stime);
>                 times->sum_exec_runtime += t->se.sum_exec_runtime;
>        } while_each_thread(tsk, t);
>
> I don't know what the rules are regarding accessing sig->curr_target, but
> if this is done under sighand->siglock we should be safe. The question
> is whether we always have the lock taken; we tried to assure that in the
> past, but do we really?

Ah, you are talking about thread_group_cputime().

Without ->siglock this is not safe. We can change __exit_signal() to
nullify ->curr_target in the group_dead case, then the code above
could check sig->curr_target != NULL.

But this is too subtle imho, and not needed. Instead we should move
group_leader into ->signal (and kill signal->leader_pid). I am going
to do more cleanups in this area "later".


Anyway. This all has nothing to do with this patch. The 4/5 change
in thread_group_cputime() is a cleanup, and it can help to make
/proc/pid/stat and /proc/pid/status lockless.

With or without 5/5 thread_group_cputime() can be called lockless
and race with exit/fork. This is fine by itself, but this is wrong
because the caller sets ->running.

Oleg.



* [tip:sched/core] sched: thread_group_cputime: Simplify, document the "alive" check
  2010-06-10 23:09 [PATCH 4/5] thread_group_cputime: simplify, document the "alive" check Oleg Nesterov
  2010-06-11 14:35 ` Stanislaw Gruszka
@ 2010-06-18 10:20 ` tip-bot for Oleg Nesterov
  1 sibling, 0 replies; 6+ messages in thread
From: tip-bot for Oleg Nesterov @ 2010-06-18 10:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, oleg, tglx, mingo

Commit-ID:  bfac7009180901f57f20a73c53c3e57b1ce75a1b
Gitweb:     http://git.kernel.org/tip/bfac7009180901f57f20a73c53c3e57b1ce75a1b
Author:     Oleg Nesterov <oleg@redhat.com>
AuthorDate: Fri, 11 Jun 2010 01:09:56 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 18 Jun 2010 10:46:56 +0200

sched: thread_group_cputime: Simplify, document the "alive" check

thread_group_cputime() looks as if it is rcu-safe, but in fact this
was wrong until ea6d290c which pins task->signal to task_struct.
It checks ->sighand != NULL under rcu, but this can't help if ->signal
can go away. Fortunately the caller either holds ->siglock, or it is
fastpath_timer_check() which uses current and checks exit_state == 0.

- Since ea6d290c commit tsk->signal is stable, we can read it first
  and avoid the initialization from INIT_CPUTIME.

- Even if tsk->signal is always valid, we still have to check it
  is safe to use next_thread() under rcu_read_lock(). Currently
  the code checks ->sighand != NULL, change it to use pid_alive()
  which is commonly used to ensure the task wasn't unhashed before
  we take rcu_read_lock().

  Add the comment to explain this check.

- Change the main loop to use the while_each_thread() helper.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100610230956.GA25921@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/posix-cpu-timers.c |   21 +++++++--------------
 1 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 9829646..bf2a650 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -232,31 +232,24 @@ static int cpu_clock_sample(const clockid_t which_clock, struct task_struct *p,
 
 void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 {
-	struct sighand_struct *sighand;
-	struct signal_struct *sig;
+	struct signal_struct *sig = tsk->signal;
 	struct task_struct *t;
 
-	*times = INIT_CPUTIME;
+	times->utime = sig->utime;
+	times->stime = sig->stime;
+	times->sum_exec_runtime = sig->sum_sched_runtime;
 
 	rcu_read_lock();
-	sighand = rcu_dereference(tsk->sighand);
-	if (!sighand)
+	/* make sure we can trust tsk->thread_group list */
+	if (!likely(pid_alive(tsk)))
 		goto out;
 
-	sig = tsk->signal;
-
 	t = tsk;
 	do {
 		times->utime = cputime_add(times->utime, t->utime);
 		times->stime = cputime_add(times->stime, t->stime);
 		times->sum_exec_runtime += t->se.sum_exec_runtime;
-
-		t = next_thread(t);
-	} while (t != tsk);
-
-	times->utime = cputime_add(times->utime, sig->utime);
-	times->stime = cputime_add(times->stime, sig->stime);
-	times->sum_exec_runtime += sig->sum_sched_runtime;
+	} while_each_thread(tsk, t);
 out:
 	rcu_read_unlock();
 }

