public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Oops from local semaphore race condition
@ 2002-05-21  0:11 Ron Niles
  2002-05-21 12:26 ` David Woodhouse
  0 siblings, 1 reply; 4+ messages in thread
From: Ron Niles @ 2002-05-21  0:11 UTC (permalink / raw)
  To: linux-kernel


Many places in the kernel (mostly drivers, like in scsi_error.c,
scsi_sleep()) have the following construct:

void function(void)
{
	DECLARE_MUTEX_LOCKED(sem);

	/* let another thread do the work and up() when done */
	notify_handler(&sem);
	down(&sem);
}

I have used this construct very often and under stress testing my driver got
a few Oops in __up_wakeup, as if the semaphore went corrupt. Then I realized
it can possibly go corrupt, due to a race condition which lets down()
continue before up() is complete:

down decrements sem->counter to -1
   up increments sem->counter to 0
down enters __down, but does not sleep since sem->counter has already been
incremented
   up enters __up, tries to wake_up(&sem->wait) but sem is now garbage since
function has exited.

I think this race is possible only on SMP. This problem seems to show up
more if I have heavy irq activity which I guess will interrupt down()
between the --sem->counter and __down(), and also interrupt up() between
++sem->counter and __up().

I am interested in some opinions:
(1) does the following up_atomic() function instead of up() indeed fix the
race condition?
(2) it is worthwile to put it into semaphore.c and semaphore.h?
(3) is it worthwhile to fix the condition in scsi modules and other drivers
that do this?

void up_atomic(struct semaphore *sem)
{
	spin_lock_irq(&semaphore_lock);
	up(sem);		
	spin_unlock_irq(&semaphore_lock);
}


^ permalink raw reply	[flat|nested] 4+ messages in thread
* RE: Oops from local semaphore race condition
@ 2002-05-21 19:41 Ron Niles
  2002-05-21 21:16 ` Bob Miller
  0 siblings, 1 reply; 4+ messages in thread
From: Ron Niles @ 2002-05-21 19:41 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-kernel



Ron.Niles@falconstor.com said:
>> Then I realized it can possibly go corrupt, due to a race condition
>> which lets down() continue before up() is complete:

From: David Woodhouse [mailto:dwmw2@infradead.org]

>This is what completions were added for.

Thanks, struct completion is the best way; it's gonna be tough to maintain
backward compatibility though.

One comment; it looks like the implementation in sched.c should more
properly be using wq_write_lock_irqsave on the lock.


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2002-05-21 21:16 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-05-21  0:11 Oops from local semaphore race condition Ron Niles
2002-05-21 12:26 ` David Woodhouse
  -- strict thread matches above, loose matches on Subject: below --
2002-05-21 19:41 Ron Niles
2002-05-21 21:16 ` Bob Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox