Q: down_killable() is racy? or schedule() is not right?


* Q: down_killable() is racy? or schedule() is not right?
@ 2008-06-03 12:33 Oleg Nesterov
From: Oleg Nesterov @ 2008-06-03 12:33 UTC
  To: Ingo Molnar, Matthew Wilcox, Peter Zijlstra; +Cc: linux-kernel

I just noticed we have generic semaphores, a couple of questions.

	down():

		spin_lock_irqsave(&sem->lock, flags);
		...
		__down(sem);

Why _irqsave ? we must not do down() with irqs disabled, and of course
__down() restores/clears irqs unconditionally.

Another question,

	__down_common(TASK_KILLABLE):

			if (state == TASK_KILLABLE && fatal_signal_pending(task))
				goto interrupted;

			/* --- WINDOW --- */

			__set_task_state(task, TASK_KILLABLE);
			schedule_timeout(timeout);

This looks racy. If SIGKILL comes in the WINDOW above, the event is lost.
The task will wait for up() or timeout with the fatal signal pending, and
it is not possible to wake it up via kill() again.
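
The window can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `model_task`, `model_send_sigkill()`, `model_schedule_old()` and `model_down_killable_racy()` are hypothetical names invented for the illustration, not kernel APIs, and the model assumes that a signal sent to a task that is still running only marks the signal pending and leaves the task state alone.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum model_state { MODEL_RUNNING, MODEL_INTERRUPTIBLE, MODEL_KILLABLE };

struct model_task {
	enum model_state state;
	bool sigkill_pending;
};

/* Models kill(SIGKILL): mark the signal pending and wake a sleeper. */
static void model_send_sigkill(struct model_task *t)
{
	t->sigkill_pending = true;
	if (t->state != MODEL_RUNNING)
		t->state = MODEL_RUNNING;	/* wakeup */
	/* A task that is already running needs no wakeup. */
}

/*
 * Models the unpatched schedule(): only an INTERRUPTIBLE sleeper with a
 * pending signal is kept runnable. Returns true if the task goes to sleep.
 */
static bool model_schedule_old(struct model_task *t)
{
	if (t->state == MODEL_RUNNING)
		return false;
	if (t->state == MODEL_INTERRUPTIBLE && t->sigkill_pending) {
		t->state = MODEL_RUNNING;
		return false;
	}
	return true;
}

/* Racy order: check, WINDOW, set state, schedule. */
static bool model_down_killable_racy(struct model_task *t, bool kill_in_window)
{
	if (t->sigkill_pending)
		return false;			/* goto interrupted */

	if (kill_in_window)
		model_send_sigkill(t);		/* --- WINDOW --- */

	t->state = MODEL_KILLABLE;		/* overwrites MODEL_RUNNING */
	return model_schedule_old(t);
}
```

With `kill_in_window` set, the model task ends up asleep in MODEL_KILLABLE with the signal already pending, which is the lost event described above.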

This is easy to fix, but I wonder if we should change schedule() instead.
Note that __down_common() does 2 checks,

		if (state == TASK_INTERRUPTIBLE && signal_pending(task))
			goto interrupted;
		if (state == TASK_KILLABLE && fatal_signal_pending(task))
			goto interrupted;

they look very symmetrical, but the first one is OK, and the second is racy.
Also, I think we have similar issues with lock_page_killable().

How about something like

	int signal_pending_state(struct task_struct *tsk)
	{
		if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
			return 0;
		if (signal_pending(tsk))
			return 0;

		return (state & TASK_INTERRUPTIBLE) ||
			__fatal_signal_pending(tsk);
	}

now,

	--- kernel/sched.c
	+++ kernel/sched.c
	@@ -4510,8 +4510,7 @@ need_resched_nonpreemptible:
		clear_tsk_need_resched(prev);

		if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
	-		if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
	-				signal_pending(prev))) {
	+		if (unlikely(signal_pending_state(prev))) {
				prev->state = TASK_RUNNING;
			} else {
				deactivate_task(rq, prev, 1);

Thoughts?

Oleg.


* Re: Q: down_killable() is racy? or schedule() is not right?
@ 2008-06-03 12:58 ` Matthew Wilcox
From: Matthew Wilcox @ 2008-06-03 12:58 UTC
  To: Oleg Nesterov; +Cc: Ingo Molnar, Peter Zijlstra, linux-kernel

On Tue, Jun 03, 2008 at 04:33:09PM +0400, Oleg Nesterov wrote:
> I just noticed we have generic semaphores, a couple of questions.
> 
> 	down():
> 
> 		spin_lock_irqsave(&sem->lock, flags);
> 		...
> 		__down(sem);
> 
> Why _irqsave ? we must not do down() with irqs disabled, and of course
> __down() restores/clears irqs unconditionally.

How about reading the fine comments?

I would paste it, but Debian has fucked up my X copy and paste.  Line 13
of kernel/semaphore.c.

> 	__down_common(TASK_KILLABLE):
> 
> 			if (state == TASK_KILLABLE && fatal_signal_pending(task))
> 				goto interrupted;
> 
> 			/* --- WINDOW --- */
> 
> 			__set_task_state(task, TASK_KILLABLE);
> 			schedule_timeout(timeout);
> 
> This looks racy. If SIGKILL comes in the WINDOW above, the event is lost.
> The task will wait for up() or timeout with the fatal signal pending, and
> it is not possible to wake it up via kill() again.

Hmmm.  I think you're right.  But mutex.c has the same problem, then.
The wait_event_* macros get this right -- they set the task state before
they check for a signal.

> This is easy to fix, but I wonder if we should change schedule() instead.
> Note that __down_common() does 2 checks,
> 
> 		if (state == TASK_INTERRUPTIBLE && signal_pending(task))
> 			goto interrupted;
> 		if (state == TASK_KILLABLE && fatal_signal_pending(task))
> 			goto interrupted;
> 
> they look very symmetrical, but the first one is OK, and the second is racy.

Oh, because of the special casing in sched.c.  Why not just move the
__set_task_state before the checks for signals pending?  We'd have to
reset to TASK_RUNNING at the 'interrupted:' label, but that's OK.
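
The reordering suggested here can be sketched in a userspace model. Everything below is an assumption made for illustration: `model_task`, `model_kill_at`, `model_send_sigkill()`, `model_schedule()` and `model_down_killable_fixed()` are hypothetical names, not kernel APIs, and a wakeup is modelled as simply setting the state back to running.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum model_state { MODEL_RUNNING, MODEL_KILLABLE };
enum model_kill_at { KILL_NEVER, KILL_BEFORE_CHECK, KILL_AFTER_CHECK };

struct model_task {
	enum model_state state;
	bool sigkill_pending;
};

/* Models kill(SIGKILL): mark the signal pending and wake the task. */
static void model_send_sigkill(struct model_task *t)
{
	t->sigkill_pending = true;
	t->state = MODEL_RUNNING;	/* wakeup; no change if already running */
}

/* Returns true if the task would go to sleep. */
static bool model_schedule(const struct model_task *t)
{
	return t->state != MODEL_RUNNING;
}

/* Fixed order: set state, check, schedule. Returns true if the task sleeps. */
static bool model_down_killable_fixed(struct model_task *t,
				      enum model_kill_at when)
{
	t->state = MODEL_KILLABLE;		/* set the state first */

	if (when == KILL_BEFORE_CHECK)
		model_send_sigkill(t);

	if (t->sigkill_pending) {
		t->state = MODEL_RUNNING;	/* the 'interrupted:' path */
		return false;
	}

	if (when == KILL_AFTER_CHECK)
		model_send_sigkill(t);		/* wakeup resets the state */

	return model_schedule(t);
}
```

In the model, a signal that arrives after the check finds the task already marked MODEL_KILLABLE, so the wakeup puts it back to running and it does not sleep; a real wait loop would then see the pending signal on its next pass.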

> Also, I think we have similar issues with lock_page_killable().

I don't think so because __wait_on_bit_lock sets the state before
checking the 'action' (sync_page_killable).

> How about something like
> 
> 	int signal_pending_state(struct task_struct *tsk)
> 	{
> 		if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
> 			return 0;
> 		if (signal_pending(tsk))
> 			return 0;
> 
> 		return (state & TASK_INTERRUPTIBLE) ||
> 			__fatal_signal_pending(tsk);
> 	}
> 
> now,
> 
> 	--- kernel/sched.c
> 	+++ kernel/sched.c
> 	@@ -4510,8 +4510,7 @@ need_resched_nonpreemptible:
> 		clear_tsk_need_resched(prev);
> 	 
> 		if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
> 	-		if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
> 	-				signal_pending(prev))) {
> 	+		if (unlikely(signal_pending_state(prev))) {
> 				prev->state = TASK_RUNNING;
> 			} else {
> 				deactivate_task(rq, prev, 1);
> 
> Thoughts?

That might be worth doing anyway, but I'd leave that up to Ingo.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: Q: down_killable() is racy? or schedule() is not right?
@ 2008-06-03 16:13   ` Oleg Nesterov
From: Oleg Nesterov @ 2008-06-03 16:13 UTC
  To: Matthew Wilcox; +Cc: Ingo Molnar, Peter Zijlstra, linux-kernel

On 06/03, Matthew Wilcox wrote:
>
> On Tue, Jun 03, 2008 at 04:33:09PM +0400, Oleg Nesterov wrote:
> > 
> > Why _irqsave ? we must not do down() with irqs disabled, and of course
> > __down() restores/clears irqs unconditionally.
> 
> How about reading the fine comments?

Thanks,

> I would paste it, but Debian has fucked up my X copy and paste.  Line 13
> of kernel/semaphore.c.
> 
> > 	__down_common(TASK_KILLABLE):
> > 
> > 			if (state == TASK_KILLABLE && fatal_signal_pending(task))
> > 				goto interrupted;
> > 
> > 			/* --- WINDOW --- */
> > 
> > 			__set_task_state(task, TASK_KILLABLE);
> > 			schedule_timeout(timeout);
> > 
> > This looks racy. If SIGKILL comes in the WINDOW above, the event is lost.
> > The task will wait for up() or timeout with the fatal signal pending, and
> > it is not possible to wake it up via kill() again.
> 
> Hmmm.  I think you're right.  But mutex.c has the same problem, then.

and do_wait_for_common()

> > This is easy to fix, but I wonder if we should change schedule() instead.
> > Note that __down_common() does 2 checks,
> > 
> > 		if (state == TASK_INTERRUPTIBLE && signal_pending(task))
> > 			goto interrupted;
> > 		if (state == TASK_KILLABLE && fatal_signal_pending(task))
> > 			goto interrupted;
> > 
> > they look very symmetrical, but the first one is OK, and the second is racy.
> 
> Oh, because of the special casing in sched.c.  Why not just move the
> __set_task_state before the checks for signals pending?

Yes sure, this is all fixable (we need set_task_state() of course).

But please compare

	current->state = TASK_INTERRUPTIBLE;
	schedule();
and
	current->state = TASK_KILLABLE;
	schedule();

it seems to me that it is not nice they behave "differently".
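
The asymmetry can be made concrete with a userspace model of the decision schedule() takes for a task whose state is non-zero. This is a sketch, not the scheduler's code: the `MODEL_*` bit values, `model_task`, `model_sleeps_old()` and `model_sleeps_patched()` are hypothetical, and the two boolean fields stand in for signal_pending() and __fatal_signal_pending().

```c
#include <stdbool.h>

/* Bit values chosen for this model only. */
#define MODEL_TASK_INTERRUPTIBLE	0x01
#define MODEL_TASK_UNINTERRUPTIBLE	0x02
#define MODEL_TASK_WAKEKILL		0x80
#define MODEL_TASK_KILLABLE \
	(MODEL_TASK_WAKEKILL | MODEL_TASK_UNINTERRUPTIBLE)

struct model_task {
	long state;		/* assumed non-zero, i.e. not running */
	bool signal_pending;	/* stands in for signal_pending() */
	bool fatal_pending;	/* stands in for __fatal_signal_pending() */
};

/* Existing check: only an INTERRUPTIBLE sleeper is kept runnable. */
static bool model_sleeps_old(const struct model_task *t)
{
	if ((t->state & MODEL_TASK_INTERRUPTIBLE) && t->signal_pending)
		return false;
	return true;
}

/* Proposed check: a fatal signal also keeps a WAKEKILL sleeper runnable. */
static bool model_sleeps_patched(const struct model_task *t)
{
	if ((t->state & MODEL_TASK_INTERRUPTIBLE) && t->signal_pending)
		return false;
	if ((t->state & MODEL_TASK_WAKEKILL) &&
	    t->signal_pending && t->fatal_pending)
		return false;
	return true;
}
```

Under the old check the model's killable task goes to sleep with a fatal signal pending while the interruptible one does not; under the proposed check both stay runnable.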

> > Also, I think we have similar issues with lock_page_killable().
>
> I don't think so because __wait_on_bit_lock sets the state before
> checking the 'action' (sync_page_killable).

You are right.

> > 	int signal_pending_state(struct task_struct *tsk)
> > 	{
> > 		if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
> > 			return 0;
> > 		if (signal_pending(tsk))
> > 			return 0;
> > 
> > 		return (state & TASK_INTERRUPTIBLE) ||
> > 			__fatal_signal_pending(tsk);
> > 	}
> > 
> > now,
> > 
> > 	--- kernel/sched.c
> > 	+++ kernel/sched.c
> > 	@@ -4510,8 +4510,7 @@ need_resched_nonpreemptible:
> > 		clear_tsk_need_resched(prev);
> > 	 
> > 		if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
> > 	-		if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
> > 	-				signal_pending(prev))) {
> > 	+		if (unlikely(signal_pending_state(prev))) {
> > 				prev->state = TASK_RUNNING;
> > 			} else {
> > 				deactivate_task(rq, prev, 1);
> > 
> > Thoughts?
> 
> That might be worth doing anyway, but I'd leave that up to Ingo.

Yes, we need Ingo's opinion ;)

Oleg.



* Re: Q: down_killable() is racy? or schedule() is not right?
@ 2008-06-04 11:09 ` Dmitry Adamushko
From: Dmitry Adamushko @ 2008-06-04 11:09 UTC
  To: Oleg Nesterov; +Cc: Ingo Molnar, Matthew Wilcox, Peter Zijlstra, linux-kernel

2008/6/3 Oleg Nesterov <oleg@tv-sign.ru>:
> I just noticed we have generic semaphores, a couple of questions.
>
>        down():
>
>                spin_lock_irqsave(&sem->lock, flags);
>                ...
>                __down(sem);
>
> Why _irqsave ? we must not do down() with irqs disabled, and of course
> __down() restores/clears irqs unconditionally.
>
>
> Another question,
>
>        __down_common(TASK_KILLABLE):
>
>                        if (state == TASK_KILLABLE && fatal_signal_pending(task))
>                                goto interrupted;
>
>                        /* --- WINDOW --- */
>
>                        __set_task_state(task, TASK_KILLABLE);
>                        schedule_timeout(timeout);
>
> This looks racy. If SIGKILL comes in the WINDOW above, the event is lost.
> The task will wait for up() or timeout with the fatal signal pending, and
> it is not possible to wake it up via kill() again.
>
> This is easy to fix, but I wonder if we should change schedule() instead.

[ for what it's worth ] I think, you are definitely right here.

The schedule() would be the right place to fix it. At the very least,
because otherwise callers are obliged to always check for
fatal_signal_pending(task) before scheduling with state ==
TASK_KILLABLE. e.g. schedule_timeout_killable().

Not very nice, IMHO.


>        int signal_pending_state(struct task_struct *tsk)
>        {
>                if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
>                        return 0;
>                if (signal_pending(tsk))
>                        return 0;

I guess, it should be ! signal_pending(tsk).


>
>                return (state & TASK_INTERRUPTIBLE) ||
>                        __fatal_signal_pending(tsk);
>        }
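
With the negation applied, the helper can be exercised in userspace. The sketch below is a model under assumptions, not the kernel function: the `MODEL_*` bit values and `model_task` are hypothetical, the two boolean fields stand in for signal_pending(tsk) and __fatal_signal_pending(tsk), and the undeclared `state` in the proposal is assumed to mean tsk->state.

```c
#include <stdbool.h>

/* Bit values chosen for this model only. */
#define MODEL_TASK_INTERRUPTIBLE	0x01
#define MODEL_TASK_UNINTERRUPTIBLE	0x02
#define MODEL_TASK_WAKEKILL		0x80
#define MODEL_TASK_KILLABLE \
	(MODEL_TASK_WAKEKILL | MODEL_TASK_UNINTERRUPTIBLE)

struct model_task {
	long state;
	bool signal_pending;	/* stands in for signal_pending(tsk) */
	bool fatal_pending;	/* stands in for __fatal_signal_pending(tsk) */
};

static int model_signal_pending_state(const struct model_task *tsk)
{
	long state = tsk->state;	/* assumption: 'state' is tsk->state */

	/* States that no signal can interrupt never report one. */
	if (!(state & (MODEL_TASK_INTERRUPTIBLE | MODEL_TASK_WAKEKILL)))
		return 0;
	/* Negated, per the correction above. */
	if (!tsk->signal_pending)
		return 0;

	/* Any signal interrupts INTERRUPTIBLE; only a fatal one, WAKEKILL. */
	return (state & MODEL_TASK_INTERRUPTIBLE) || tsk->fatal_pending;
}
```

In this model a killable task reports a pending signal only when the signal is fatal, and an uninterruptible task never does.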
>
>                if (state == TASK_INTERRUPTIBLE && signal_pending(task))
>                        goto interrupted;
>                if (state == TASK_KILLABLE && fatal_signal_pending(task))


>
> Oleg.
>

-- 
Best regards,
Dmitry Adamushko


* Re: Q: down_killable() is racy? or schedule() is not right?
@ 2008-06-09 11:43   ` Ingo Molnar
From: Ingo Molnar @ 2008-06-09 11:43 UTC
  To: Dmitry Adamushko
  Cc: Oleg Nesterov, Matthew Wilcox, Peter Zijlstra, linux-kernel


* Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:

> > This looks racy. If SIGKILL comes in the WINDOW above, the event is 
> > lost. The task will wait for up() or timeout with the fatal signal 
> > pending, and it is not possible to wake it up via kill() again.
> >
> > This is easy to fix, but I wonder if we should change schedule() 
> > instead.
> 
> [ for what it's worth ] I think, you are definitely right here.
> 
> The schedule() would be the right place to fix it. At the very least, 
> because otherwise callers are obliged to always check for 
> fatal_signal_pending(task) before scheduling with state == 
> TASK_KILLABLE. e.g. schedule_timeout_killable().
> 
> Not very nice, IMHO.

i guess we should fix this in schedule() - is there a patch i could try?

	Ingo
