* Oops from local semaphore race condition
@ 2002-05-21 0:11 Ron Niles
2002-05-21 12:26 ` David Woodhouse
0 siblings, 1 reply; 4+ messages in thread
From: Ron Niles @ 2002-05-21 0:11 UTC (permalink / raw)
To: linux-kernel
Many places in the kernel (mostly drivers, like in scsi_error.c,
scsi_sleep()) have the following construct:
void function(void)
{
DECLARE_MUTEX_LOCKED(sem);
/* let another thread do the work and up() when done */
notify_handler(&sem);
down(&sem);
}
I have used this construct very often and under stress testing my driver got
a few Oops in __up_wakeup, as if the semaphore went corrupt. Then I realized
it can possibly go corrupt, due to a race condition which lets down()
continue before up() is complete:
down decrements sem->counter to -1
up increments sem->counter to 0
down enters __down, but does not sleep since sem->counter has already been
incremented
up enters __up, tries to wake_up(&sem->wait) but sem is now garbage since
function has exited.
I think this race is possible only on SMP. This problem seems to show up
more if I have heavy irq activity which I guess will interrupt down()
between the --sem->counter and __down(), and also interrupt up() between
++sem->counter and __up().
I am interested in some opinions:
(1) does the following up_atomic() function instead of up() indeed fix the
race condition?
(2) it is worthwile to put it into semaphore.c and semaphore.h?
(3) is it worthwhile to fix the condition in scsi modules and other drivers
that do this?
void up_atomic(struct semaphore *sem)
{
spin_lock_irq(&semaphore_lock);
up(sem);
spin_unlock_irq(&semaphore_lock);
}
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Oops from local semaphore race condition
2002-05-21 0:11 Oops from local semaphore race condition Ron Niles
@ 2002-05-21 12:26 ` David Woodhouse
0 siblings, 0 replies; 4+ messages in thread
From: David Woodhouse @ 2002-05-21 12:26 UTC (permalink / raw)
To: Ron Niles; +Cc: linux-kernel
Ron.Niles@falconstor.com said:
> Then I realized it can possibly go corrupt, due to a race condition
> which lets down() continue before up() is complete:
This is what completions were added for.
--
dwmw2
^ permalink raw reply [flat|nested] 4+ messages in thread
* RE: Oops from local semaphore race condition
@ 2002-05-21 19:41 Ron Niles
2002-05-21 21:16 ` Bob Miller
0 siblings, 1 reply; 4+ messages in thread
From: Ron Niles @ 2002-05-21 19:41 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-kernel
Ron.Niles@falconstor.com said:
>> Then I realized it can possibly go corrupt, due to a race condition
>> which lets down() continue before up() is complete:
From: David Woodhouse [mailto:dwmw2@infradead.org]
>This is what completions were added for.
Thanks, struct completion is the best way; it's gonna be tough to maintain
backward compatibility though.
One comment; it looks like the implementation in sched.c should more
properly be using wq_write_lock_irqsave on the lock.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Oops from local semaphore race condition
2002-05-21 19:41 Ron Niles
@ 2002-05-21 21:16 ` Bob Miller
0 siblings, 0 replies; 4+ messages in thread
From: Bob Miller @ 2002-05-21 21:16 UTC (permalink / raw)
To: Ron Niles; +Cc: David Woodhouse, linux-kernel
On Tue, May 21, 2002 at 03:41:19PM -0400, Ron Niles wrote:
>
>
> Ron.Niles@falconstor.com said:
> >> Then I realized it can possibly go corrupt, due to a race condition
> >> which lets down() continue before up() is complete:
>
> From: David Woodhouse [mailto:dwmw2@infradead.org]
>
> >This is what completions were added for.
>
> Thanks, struct completion is the best way; it's gonna be tough to maintain
> backward compatibility though.
>
> One comment; it looks like the implementation in sched.c should more
> properly be using wq_write_lock_irqsave on the lock.
>
I sent patches to Linus to fix this back in February. Dave Jones picked
them up and they are still in his tree. I don't know when/if he is going
to forward them onto Linus for inclusion into his tree.
--
Bob Miller Email: rem@osdl.org
Open Source Development Lab Phone: 503.626.2455 Ext. 17
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2002-05-21 21:16 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-05-21 0:11 Oops from local semaphore race condition Ron Niles
2002-05-21 12:26 ` David Woodhouse
-- strict thread matches above, loose matches on Subject: below --
2002-05-21 19:41 Ron Niles
2002-05-21 21:16 ` Bob Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox