Ian Kent wrote:
> On Wed, 24 Sep 2003, Arjan van de Ven wrote:
>
>
>>On Wed, 2003-09-24 at 15:01, Ian Kent wrote:
>>
>>>This is a corrected patch for the autofs4 deadlock problem I posted about
>>>@@ -206,6 +207,11 @@
>>>
>>> 	interruptible_sleep_on(&wq->queue);
>>>
>>>+	if (waitqueue_active(&wq->queue) && current != wq->owner) {
>>>+		set_current_state(TASK_INTERRUPTIBLE);
>>>+		schedule_timeout(wq->wait_ctr * (HZ/10));
>>>+	}
>>>+
>>
>>this really really looks like you're trying to pamper over a bug by
>>changing the timing somewhere instead of fixing it...
>
>
> Agreed.
>
>
>>also are you sure the deadlock isn't because of the racey use of
>>interruptible_sleep_on ?
>>

I think the deadlock itself needs to be properly identified. Could you
explain where the deadlock is actually occurring? I skimmed the
automount 4 code as well as autofs4 and I don't see the deadlock.

The 'owner' in the case of an expiry will be a child process of the
daemon, within a call to ioctl(EXPIRE_MULTI), correct? Having it be
released from the waitqueue first should not affect the flow of
execution, and so should not release anything from a deadlock. I don't
see how having it wake up before any other racing processes solves
anything.

I think Arjan is right in that the race is due to the nautilus process
entering the sleep_on after the call to wake_up(&wq->queue).

I don't know if a change to using a workqueue is best. How about
refactoring that chunk of code to use wait_event_interruptible on the
queue, which should be clear of any waitqueue/sleep_on races?

>
>
> OK so maybe I should have suggestions instead of comments.
>
> Please elaborate.
>

How about you try out this quick patch I threw together.

Mike Waychison
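
P.S. To illustrate the race discussed above, here is a rough user-space
sketch. It is not the patch mentioned in this message and it is not
kernel code. It uses POSIX threads as a stand-in, and every name in it
(demo_wait, demo_wake, demo_wait_for) is hypothetical. The point is only
the pattern that wait_event_interruptible relies on: the sleeper tests
an explicit condition before sleeping, so a wake-up that arrives before
the sleeper gets there is not lost, which is the failure mode of a bare
sleep_on.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a wait queue plus its condition. */
struct demo_wait {
	pthread_mutex_t lock;
	pthread_cond_t queue;
	bool done;		/* the condition the sleeper tests */
};

void demo_wait_init(struct demo_wait *w)
{
	int rc;

	rc = pthread_mutex_init(&w->lock, NULL);
	assert(rc == 0);
	rc = pthread_cond_init(&w->queue, NULL);
	assert(rc == 0);
	(void)rc;		/* silence unused warning under NDEBUG */
	w->done = false;
}

/* Waker: publish the condition first, then wake every sleeper. */
void demo_wake(struct demo_wait *w)
{
	pthread_mutex_lock(&w->lock);
	w->done = true;
	pthread_cond_broadcast(&w->queue);
	pthread_mutex_unlock(&w->lock);
}

/*
 * Sleeper: check the condition under the lock and only sleep while it
 * is false. If demo_wake() already ran, this returns at once instead
 * of sleeping forever on a wake-up that has already happened.
 */
int demo_wait_for(struct demo_wait *w)
{
	pthread_mutex_lock(&w->lock);
	while (!w->done)
		pthread_cond_wait(&w->queue, &w->lock);
	pthread_mutex_unlock(&w->lock);
	return 0;
}
```

Calling demo_wake() and only afterwards demo_wait_for() on the same
object returns immediately. With an unconditional sleep, that ordering
is exactly the case where the late arrival would hang.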