* [RFC] 2.5 TASK_INTERRUPTIBLE preemption race
@ 2003-04-14 21:54 Joe Korty
2003-04-14 22:00 ` Robert Love
0 siblings, 1 reply; 4+ messages in thread
From: Joe Korty @ 2003-04-14 21:54 UTC (permalink / raw)
To: Andrew Morton, Robert Love, Alan Cox; +Cc: linux-kernel
Hi Andrew, Robert, Alan, Everyone,
The following 2.5 code fragment seems unsafe in a preempt
environment. A preemption could occur between the time when
TASK_UNINTERRUPTIBLE is set and the `spin_lock_irqsave'. This would
cause the task to switch out and never come back, as it would have
been switched away while in TASK_UNINTERRUPTIBLE without yet having
put itself onto the semaphore wait queue, where it could be found
later by a wakeup service.
void __down(struct semaphore * sem)
{
struct task_struct *tsk = current;
DECLARE_WAITQUEUE(wait, tsk);
unsigned long flags;
tsk->state = TASK_UNINTERRUPTIBLE;
spin_lock_irqsave(&sem->wait.lock, flags);
add_wait_queue_exclusive_locked(&sem->wait, &wait);
....
}
Is this analysis correct? If it is, perhaps there is an alternative
to fixing these cases individually: make the TASK_INTERRUPTIBLE/
TASK_UNINTERRUPTIBLE states block preemption. In which case the
'set_current_state(TASK_RUNNING)' macro would need to include the
same preemption check as 'preemption_enable'.
I suspect there is already some mechanism in place to prevent this
problem, as I have never seen this lockup happen on any of my
2.4-preempt systems.
Joe
PS: here is an example where the preemption race appears harmless.
If a preemption happens between the 'set_current_state' and
'schedule', it only causes the 'schedule' to NOP: the preemption, on
return, would have changed the state from TASK_UNINTERRUPTIBLE back
to TASK_RUNNING.
void __wait_on_inode(struct inode *inode)
{
DECLARE_WAITQUEUE(wait, current);
wait_queue_head_t *wq = i_waitq_head(inode);
add_wait_queue(wq, &wait);
repeat:
set_current_state(TASK_UNINTERRUPTIBLE);
if (inode->i_state & I_LOCK) {
schedule();
goto repeat;
}
remove_wait_queue(wq, &wait);
__set_current_state(TASK_RUNNING);
}
PPS: the above may need a 'mb()' between the 'add_wait_queue' and
'set_current_state' regardless.
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [RFC] 2.5 TASK_INTERRUPTIBLE preemption race
2003-04-14 21:54 [RFC] 2.5 TASK_INTERRUPTIBLE preemption race Joe Korty
@ 2003-04-14 22:00 ` Robert Love
2003-04-14 22:27 ` Joe Korty
0 siblings, 1 reply; 4+ messages in thread
From: Robert Love @ 2003-04-14 22:00 UTC (permalink / raw)
To: Joe Korty; +Cc: Andrew Morton, Alan Cox, linux-kernel
On Mon, 2003-04-14 at 17:54, Joe Korty wrote:
> Is this analysis correct? If it is, perhaps there is an alternative
> to fixing these cases individually: make the TASK_INTERRUPTIBLE/
> TASK_UNINTERRUPTIBLE states block preemption. In which case the
> 'set_current_state(TASK_RUNNING)' macro would need to include the
> same preemption check as 'preemption_enable'.
Thankfully you are wrong or we would have some serious problems :)
See kernel/sched.c :: preempt_schedule() where we set p->preempt_count
to PREEMPT_ACTIVE.
Then see kernel/sched.c :: schedule() where we short-circuit the
remove-from-runqueue code if PREEMPT_ACTIVE is set.
Thus, it is safe to preempt regardless of the task's state. It will
eventually reschedule.
Robert Love
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RFC] 2.5 TASK_INTERRUPTIBLE preemption race
2003-04-14 22:00 ` Robert Love
@ 2003-04-14 22:27 ` Joe Korty
2003-04-14 22:30 ` Robert Love
0 siblings, 1 reply; 4+ messages in thread
From: Joe Korty @ 2003-04-14 22:27 UTC (permalink / raw)
To: Robert Love; +Cc: Andrew Morton, Alan Cox, linux-kernel
On Mon, Apr 14, 2003 at 06:00:42PM -0400, Robert Love wrote:
> On Mon, 2003-04-14 at 17:54, Joe Korty wrote:
>
> > Is this analysis correct? If it is, perhaps there is an alternative
> > to fixing these cases individually: make the TASK_INTERRUPTIBLE/
> > TASK_UNINTERRUPTIBLE states block preemption. In which case the
> > 'set_current_state(TASK_RUNNING)' macro would need to include the
> > same preemption check as 'preemption_enable'.
>
> Thankfully you are wrong or we would have some serious problems :)
>
> See kernel/sched.c :: preempt_schedule() where we set p->preempt_count
> to PREEMPT_ACTIVE.
>
> Then see kernel/sched.c :: schedule() where we short-circuit the
> remove-from-runqueue code if PREEMPT_ACTIVE is set.
>
> Thus, it is safe to preempt regardless of the task's state. It will
> eventually reschedule.
>
> Robert Love
I see. It is because the 'goto pick_next_task' skips the
'deactivate_task' call. Therefore the previous task remains on the
run queue in spite of its TASK_UNINTERRUPTIBLE state. Clever!
Joe
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RFC] 2.5 TASK_INTERRUPTIBLE preemption race
2003-04-14 22:27 ` Joe Korty
@ 2003-04-14 22:30 ` Robert Love
0 siblings, 0 replies; 4+ messages in thread
From: Robert Love @ 2003-04-14 22:30 UTC (permalink / raw)
To: Joe Korty; +Cc: Andrew Morton, Alan Cox, linux-kernel
On Mon, 2003-04-14 at 18:27, Joe Korty wrote:
> I see. It is because the 'goto pick_next_task' skips the
> 'deactivate_task' call. Therefore the previous task remains on the
> run queue in spite of its TASK_UNINTERRUPTIBLE state. Clever!
Yep :)
We actually did away with it for a bit in mid 2.5... it turned out to
not be worth it. Its a little odd to use the flag like that, but its
quite simply and the results are good.
Means we can safely schedule anytime, no matter the state. There are
tons of other places where we would not want to sleep if it were not for
PREEMPT_ACTIVE.
Robert Love
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2003-04-14 22:19 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-04-14 21:54 [RFC] 2.5 TASK_INTERRUPTIBLE preemption race Joe Korty
2003-04-14 22:00 ` Robert Love
2003-04-14 22:27 ` Joe Korty
2003-04-14 22:30 ` Robert Love
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox