Brian King wrote: > The following patch fixes a race condition in sg > of sg_cmd_done racing with sg_release. I've seen this > bug hit several times on test machines and the following > patch fixes it. The race is that if srp->done is set > and the waiting thread gets a spurious wakeup immediately > afterwards, then the waiting thread can end up executing > and completing, then getting closed, freeing sfp before > the wake_up_interruptible is called, which then will result > in an oops. The oops is fixed by locking around the > setting srp->done to 1 and the wake_up, and also locking > around the checking of srp->done, which guarantees that > the wake_up_interruptible will always occur before the > sleeping thread gets a chance to run. Brian, Thanks for this analysis; I'm still trying to get my head around it :-) These "spurious wakeups" are a worry. Does your testing produce them or do they occur randomly? The critical part of your patch seems to be the write lock in sg_cmd_done() making sure srp->done is set _and_ wake_up_interruptible() gets executed before the __wait_event_interruptible macro in sg_read() or sg_remove(_sfp) can proceed. I thought about making srp->done atomic (of type atomic_t) but that does not seem to be sufficient. So I'm happy for James to apply this patch. An attempt to consolidate... Almost all of the changes to sg that were discussed in the "2.6.6-rc3 ia64 smp_call_function() called with interrupts disabled" thread (about a month ago) are now in lk 2.6.7-rc2. The janitors have also been at work with "__user" type modifiers. Some changes that Patrick Mansfield proposed are not in rc2: - up max number of devices to 32K - revert vmalloc()s to kmalloc()s since the former are problematic in big configurations The attached patch merges Brian patches with those proposed by Patrick. The patch is against lk 2.6.7-rc2 (and sg hasn't changed between rc2 and rc2-bk5). Doug Gilbert