public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Dmitry Torokhov <dtor@insightbb.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>,
	linux1394-devel@lists.sourceforge.net,
	Greg Kroah-Hartman <gregkh@suse.de>,
	linux-kernel@vger.kernel.org
Subject: Re: deadlock in "modprobe -r ohci1394" shortly after "modprobe ohci1394"
Date: Sun, 19 Nov 2006 22:41:28 -0500	[thread overview]
Message-ID: <200611192241.30740.dtor@insightbb.com> (raw)
In-Reply-To: <Pine.LNX.4.44L0.0611191431120.15059-100000@netrider.rowland.org>

On Sunday 19 November 2006 14:54, Alan Stern wrote:
> On Sun, 19 Nov 2006, Stefan Richter wrote:
> 
> > I wrote:
> > > http://bugzilla.kernel.org/show_bug.cgi?id=6706 :
> > ...
> > > Right now I don't see a sane fix but I will have a few nights sleep over
> > > it...
> > 
> > A couple of reboots and a slightly charred pizza later I found out the
> > following:
> > 
> > 1. If Alan's 2.6.16 patch is reverted, the deadlock is gone as expected.
> >    See bugzilla for the reverting patch.
> > 
> > 2. The following patch works too, without the need to revert Alan's
> >    driver core changes.
> > 
> > 3. Now that I have an at least unsane (sic) fix for the deadlock, a new
> >    bug in eth1394's remove code was revealed. This is a separate issue
> >    and logged as http://bugzilla.kernel.org/show_bug.cgi?id=7550 .
> > 
> > Please comment on the patch below.
> > 
> > 
> > From: Stefan Richter <stefanr@s5r6.in-berlin.de>
> > Subject: ieee1394: nodemgr: fix deadlock in shutdown
> > 
> > If "modprobe ohci1394" was quickly followed by "modprobe -r ohci1394",
> > say with 1 second pause in between, the modprobe -r got stuck in
> > uninterruptible sleep in kthread_stop.  At the same time the knodemgrd
> > slept uninterruptibly in bus_rescan_devices_helper.
> > 
> > This was a regression since Linux 2.6.16,
> > 	commit bf74ad5bc41727d5f2f1c6bedb2c1fac394de731
> > 	"Hold the device's parent's lock during probe and remove"
> > 
> > The fix lets ieee1394's nodemgr temporarily counteract the driver core's
> > downed parent->sem.  Thus bus_rescan_devices_helper can proceed and
> > knodemgrd terminates properly.
> > 
> > Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
> > ---
> >  drivers/ieee1394/nodemgr.c |   11 +++++++++++
> >  1 files changed, 11 insertions(+)
> > 
> > Index: linux-2.6.19-rc4/drivers/ieee1394/nodemgr.c
> > ===================================================================
> > --- linux-2.6.19-rc4.orig/drivers/ieee1394/nodemgr.c	2006-11-18 23:31:35.000000000 +0100
> > +++ linux-2.6.19-rc4/drivers/ieee1394/nodemgr.c	2006-11-19 15:14:50.000000000 +0100
> > @@ -1873,8 +1873,19 @@ static void nodemgr_remove_host(struct h
> >  {
> >  	struct host_info *hi = hpsb_get_hostinfo(&nodemgr_highlevel, host);
> >  
> > +	/* Here comes a potential deadlock. A "modprobe -r ohci1394" calls
> > +	 * nodemgr_remove_host from driver_detach which takes the parent->sem.
> > +	 * Meanwhile, knodemgrd may be running into bus_rescan_devices_helper
> > +	 * which would block on the same semaphore. Therefore lift the
> > +	 * semaphore until knodemgrd exited. */
> >  	if (hi) {
> > +		/* up(&host->device.sem);	--- apparently not required */
> > +		if (host->device.parent)
> > +			up(&host->device.parent->sem);
> >  		kthread_stop(hi->thread);
> > +		if (host->device.parent)
> > +			down(&host->device.parent->sem);
> > +		/* down(&host->device.sem);	--- apparently not required */
> >  		nodemgr_remove_host_dev(&host->device);
> >  	}
> >  }
> 
> Obviously this patch isn't pretty.  It's also incorrect, because it 
> reacquires the parent's semaphore while holding the child's -- that's 
> another recipe for deadlock.
> 
> Knowing nothing at all about ieee1394, I get the feeling that the culprit
> here is a strange subsystem design.  In fact, I don't understand exactly
> what's going wrong.  Evidently the rmmod thread owns the locks for both
> the host being removed and its parent, and it wants to stop knodemgrd,
> which is waiting to acquire the host's parent's lock because it is
> attempting to rescan the parent.  Is that right?
> 

I was actually looking at the driver core recently and was surprised that
USB crapped all over it with its locking requirements. I don't think its a
good idea to enforce one subsystem's lcoking rules onto everyone.

Would there be alot of objections if we add bus->[un]lock_device() and
hide all uglies there? If methods are not set when bus is registered then
standard dev->sem lock/unlock will be used.

-- 
Dmitry

  parent reply	other threads:[~2006-11-20  3:41 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-11-19  0:12 deadlock in "modprobe -r ohci1394" shortly after "modprobe ohci1394" Stefan Richter
2006-11-19 14:23 ` Stefan Richter
2006-11-19 19:54   ` Alan Stern
2006-11-19 21:55     ` Stefan Richter
2006-11-20 19:31       ` Alan Stern
2006-11-20 20:45         ` Stefan Richter
2006-11-20 21:28           ` Alan Stern
2006-11-21  0:21             ` Stefan Richter
2006-11-21  0:49               ` Alan Stern
2006-11-21  8:05                 ` Stefan Richter
2006-11-21 21:50                   ` [PATCH] ieee1394: nodemgr: fix deadlock in shutdown Stefan Richter
2006-11-22  2:02                     ` Alan Stern
2006-11-22 20:09                       ` Stefan Richter
2006-11-20  3:41     ` Dmitry Torokhov [this message]
2006-11-20 19:18       ` deadlock in "modprobe -r ohci1394" shortly after "modprobe ohci1394" Alan Stern

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200611192241.30740.dtor@insightbb.com \
    --to=dtor@insightbb.com \
    --cc=gregkh@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux1394-devel@lists.sourceforge.net \
    --cc=stefanr@s5r6.in-berlin.de \
    --cc=stern@rowland.harvard.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox