public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH] Avoid race w/ posix-cpu-timer and exiting tasks
@ 2006-06-13  0:25 john stultz
  0 siblings, 0 replies; 4+ messages in thread
From: john stultz @ 2006-06-13  0:25 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, Steven Rostedt, lkml

Hey Ingo,
	We've occasionally come across OOPSes in posix-cpu-timer thread (as
well as tripping over the BUG_ON(tsk->exit_state there) where it appears
the task we're processing exits out on us while we're using it. 

Thus this fix tries to avoid running the posix-cpu-timers on a task that
is exiting.

I'm not sure if it is the proper fix, so I wanted some extra eyes to
look it over. We're testing it to see if we can still trigger any of the
OOPSes (the BUG_ON is removed, so that won't catch us anymore), but if
you have any thoughts I'd be interested in them.

thanks
-john

--- 2.6-rt/kernel/posix-cpu-timers.c	2006-06-11 15:38:58.000000000 -0500
+++ devrt/kernel/posix-cpu-timers.c	2006-06-12 10:52:20.000000000 -0500
@@ -1290,12 +1290,15 @@
 
 #undef	UNEXPIRED
 
-	BUG_ON(tsk->exit_state);
-
 	/*
 	 * Double-check with locks held.
 	 */
 	read_lock(&tasklist_lock);
+	/* Make sure the task doesn't exit under us. */
+	if(unlikely(tsk->exit_state)) {
+		read_unlock(&tasklist_lock);
+		return;
+	}
 	spin_lock(&tsk->sighand->siglock);
 
 	/*




^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC][PATCH] Avoid race w/ posix-cpu-timer and exiting tasks
  2006-06-14  2:49 [RFC][PATCH] Avoid race w/ posix-cpu-timer and exiting tasks Oleg Nesterov
@ 2006-06-13 23:05 ` john stultz
  2006-06-15  1:24   ` Oleg Nesterov
  0 siblings, 1 reply; 4+ messages in thread
From: john stultz @ 2006-06-13 23:05 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Thomas Gleixner, Steven Rostedt, linux-kernel,
	Roland McGrath

On Wed, 2006-06-14 at 06:49 +0400, Oleg Nesterov wrote:
> john stultz wrote:
> >
> > Hey Ingo,
> > 	We've occasionally come across OOPSes in posix-cpu-timer thread (as
> > well as tripping over the BUG_ON(tsk->exit_state there) where it appears
> > the task we're processing exits out on us while we're using it. 
> >
> > Thus this fix tries to avoid running the posix-cpu-timers on a task that
> > is exiting.
> >
> > --- 2.6-rt/kernel/posix-cpu-timers.c	2006-06-11 15:38:58.000000000 -0500
> > +++ devrt/kernel/posix-cpu-timers.c	2006-06-12 10:52:20.000000000 -0500
> > @@ -1290,12 +1290,15 @@
> >
> >  #undef	UNEXPIRED
> >
> > -	BUG_ON(tsk->exit_state);
> > -
> >  	/*
> >  	 * Double-check with locks held.
> >  	 */
> >  	read_lock(&tasklist_lock);
> > +	/* Make sure the task doesn't exit under us. */
> > +	if(unlikely(tsk->exit_state)) {
> > +		read_unlock(&tasklist_lock);
> > +		return;
> > +	}
> >  	spin_lock(&tsk->sighand->siglock);
> 
> I strongly believe this BUG_ON() is indeed wrong, and I did a similar patch
> a long ago:
> 
> 	[PATCH] posix-timers: remove false BUG_ON() from run_posix_cpu_timers()
> 	Commit 3de463c7d9d58f8cf3395268230cb20a4c15bffa

Thanks for the pointer! Hmm.  That patch is very similar to what I'm
trying to avoid. 

Just to be clear for everyone else, that commit id is for Linus' git
tree. The patch I'm proposing is against the -rt tree, where the
run_posix_timers() function has been converted to a kthread instead of
being run from the timer interrupt context.

This allows the possibility of the the run_posix_timers_thread racing
against the task its running on behalf and the task exiting, and I hope
that is what my patch resolves (although I'm not confident its 100%).

The tsk->signal check from the patch above looks like it would avoid
this as well. Is there a specific benefit to checking that over
exit_state?

If anyone has further feedback or info about why the above was reverted
it would be appreciated.

thanks
-john







^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC][PATCH] Avoid race w/ posix-cpu-timer and exiting tasks
@ 2006-06-14  2:49 Oleg Nesterov
  2006-06-13 23:05 ` john stultz
  0 siblings, 1 reply; 4+ messages in thread
From: Oleg Nesterov @ 2006-06-14  2:49 UTC (permalink / raw)
  To: john stultz
  Cc: Ingo Molnar, Thomas Gleixner, Steven Rostedt, linux-kernel,
	Roland McGrath

john stultz wrote:
>
> Hey Ingo,
> 	We've occasionally come across OOPSes in posix-cpu-timer thread (as
> well as tripping over the BUG_ON(tsk->exit_state there) where it appears
> the task we're processing exits out on us while we're using it. 
>
> Thus this fix tries to avoid running the posix-cpu-timers on a task that
> is exiting.
>
> --- 2.6-rt/kernel/posix-cpu-timers.c	2006-06-11 15:38:58.000000000 -0500
> +++ devrt/kernel/posix-cpu-timers.c	2006-06-12 10:52:20.000000000 -0500
> @@ -1290,12 +1290,15 @@
>
>  #undef	UNEXPIRED
>
> -	BUG_ON(tsk->exit_state);
> -
>  	/*
>  	 * Double-check with locks held.
>  	 */
>  	read_lock(&tasklist_lock);
> +	/* Make sure the task doesn't exit under us. */
> +	if(unlikely(tsk->exit_state)) {
> +		read_unlock(&tasklist_lock);
> +		return;
> +	}
>  	spin_lock(&tsk->sighand->siglock);

I strongly believe this BUG_ON() is indeed wrong, and I did a similar patch
a long ago:

	[PATCH] posix-timers: remove false BUG_ON() from run_posix_cpu_timers()
	Commit 3de463c7d9d58f8cf3395268230cb20a4c15bffa

However it was reverted due to some unclear problems (I think those problems
were not related to this patch).

Instead this patch was added:

	[PATCH] Yet more posix-cpu-timer fixes
	Commit 3de463c7d9d58f8cf3395268230cb20a4c15bffa

and I still think this patch is not correct.

Quoting myself:
>
> Roland McGrath wrote:
> >
> > @@ -566,6 +566,9 @@ static void arm_timer(struct k_itimer *t
> >         struct cpu_timer_list *next;
> >         unsigned long i;
> >
> > +       if (CPUCLOCK_PERTHREAD(timer->it_clock) && (p->flags & PF_EXITING))
> > +               return;
> > +
>
> Why CPUCLOCK_PERTHREAD() ?.
>
> Also, this is racy, no? Why should arm_timer() see PF_EXITING which is
> set on another cpu without any barriers/locking? After all, arm_timer()
> can test PF_EXITING before do_exit() sets this flag, but set ->it_xxx_expires
> after do_exit() resets it.

Oleg.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC][PATCH] Avoid race w/ posix-cpu-timer and exiting tasks
  2006-06-13 23:05 ` john stultz
@ 2006-06-15  1:24   ` Oleg Nesterov
  0 siblings, 0 replies; 4+ messages in thread
From: Oleg Nesterov @ 2006-06-15  1:24 UTC (permalink / raw)
  To: john stultz
  Cc: Ingo Molnar, Thomas Gleixner, Steven Rostedt, linux-kernel,
	Roland McGrath

On 06/13, john stultz wrote:
>
> The tsk->signal check from the patch above looks like it would avoid
> this as well. Is there a specific benefit to checking that over
> exit_state?

->exit_state is protected by tasklist_lock, and it would be nice to
avoid it in run_posix_cpu_timers(). (I guess we could remove it right
now, but I forgot the code). Yes, currently it doesn't matter because
tsk == current.

Personally I dislike the testing of ->exit_state != 0 because unlike
PF_EXITING or ->sighand/->signal it is changed from 0 to 1 in the middle
of do_exit() path. Imho it should be used only by do_exit/do_wait path,
but maybe this is just me.

Btw, I think there is another problem,

	check_process_timers:

			t = tsk;
			do {

				...
				
				do {
					t = next_thread(t);
				} while (unlikely(t->flags & PF_EXITING));
			} while (t != tsk);


This can hang if the local timer interrupt happens right after do_exit()
sets PF_EXITING ?

Oleg.


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2006-06-14 21:24 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-06-14  2:49 [RFC][PATCH] Avoid race w/ posix-cpu-timer and exiting tasks Oleg Nesterov
2006-06-13 23:05 ` john stultz
2006-06-15  1:24   ` Oleg Nesterov
  -- strict thread matches above, loose matches on Subject: below --
2006-06-13  0:25 john stultz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox