From: Dmitry Torokhov <dtor@insightbb.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>,
linux1394-devel@lists.sourceforge.net,
Greg Kroah-Hartman <gregkh@suse.de>,
linux-kernel@vger.kernel.org
Subject: Re: deadlock in "modprobe -r ohci1394" shortly after "modprobe ohci1394"
Date: Sun, 19 Nov 2006 22:41:28 -0500 [thread overview]
Message-ID: <200611192241.30740.dtor@insightbb.com> (raw)
In-Reply-To: <Pine.LNX.4.44L0.0611191431120.15059-100000@netrider.rowland.org>
On Sunday 19 November 2006 14:54, Alan Stern wrote:
> On Sun, 19 Nov 2006, Stefan Richter wrote:
>
> > I wrote:
> > > http://bugzilla.kernel.org/show_bug.cgi?id=6706 :
> > ...
> > > Right now I don't see a sane fix but I will have a few nights sleep over
> > > it...
> >
> > A couple of reboots and a slightly charred pizza later I found out the
> > following:
> >
> > 1. If Alan's 2.6.16 patch is reverted, the deadlock is gone as expected.
> > See bugzilla for the reverting patch.
> >
> > 2. The following patch works too, without the need to revert Alan's
> > driver core changes.
> >
> > 3. Now that I have an at least unsane (sic) fix for the deadlock, a new
> > bug in eth1394's remove code was revealed. This is a separate issue
> > and logged as http://bugzilla.kernel.org/show_bug.cgi?id=7550 .
> >
> > Please comment on the patch below.
> >
> >
> > From: Stefan Richter <stefanr@s5r6.in-berlin.de>
> > Subject: ieee1394: nodemgr: fix deadlock in shutdown
> >
> > If "modprobe ohci1394" was quickly followed by "modprobe -r ohci1394",
> > say with 1 second pause in between, the modprobe -r got stuck in
> > uninterruptible sleep in kthread_stop. At the same time the knodemgrd
> > slept uninterruptibly in bus_rescan_devices_helper.
> >
> > This was a regression since Linux 2.6.16,
> > commit bf74ad5bc41727d5f2f1c6bedb2c1fac394de731
> > "Hold the device's parent's lock during probe and remove"
> >
> > The fix lets ieee1394's nodemgr temporarily counteract the driver core's
> > downed parent->sem. Thus bus_rescan_devices_helper can proceed and
> > knodemgrd terminates properly.
> >
> > Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
> > ---
> > drivers/ieee1394/nodemgr.c | 11 +++++++++++
> > 1 files changed, 11 insertions(+)
> >
> > Index: linux-2.6.19-rc4/drivers/ieee1394/nodemgr.c
> > ===================================================================
> > --- linux-2.6.19-rc4.orig/drivers/ieee1394/nodemgr.c 2006-11-18 23:31:35.000000000 +0100
> > +++ linux-2.6.19-rc4/drivers/ieee1394/nodemgr.c 2006-11-19 15:14:50.000000000 +0100
> > @@ -1873,8 +1873,19 @@ static void nodemgr_remove_host(struct h
> > {
> > struct host_info *hi = hpsb_get_hostinfo(&nodemgr_highlevel, host);
> >
> > + /* Here comes a potential deadlock. A "modprobe -r ohci1394" calls
> > + * nodemgr_remove_host from driver_detach which takes the parent->sem.
> > + * Meanwhile, knodemgrd may be running into bus_rescan_devices_helper
> > + * which would block on the same semaphore. Therefore lift the
> > + * semaphore until knodemgrd exited. */
> > if (hi) {
> > + /* up(&host->device.sem); --- apparently not required */
> > + if (host->device.parent)
> > + up(&host->device.parent->sem);
> > kthread_stop(hi->thread);
> > + if (host->device.parent)
> > + down(&host->device.parent->sem);
> > + /* down(&host->device.sem); --- apparently not required */
> > nodemgr_remove_host_dev(&host->device);
> > }
> > }
>
> Obviously this patch isn't pretty. It's also incorrect, because it
> reacquires the parent's semaphore while holding the child's -- that's
> another recipe for deadlock.
>
> Knowing nothing at all about ieee1394, I get the feeling that the culprit
> here is a strange subsystem design. In fact, I don't understand exactly
> what's going wrong. Evidently the rmmod thread owns the locks for both
> the host being removed and its parent, and it wants to stop knodemgrd,
> which is waiting to acquire the host's parent's lock because it is
> attempting to rescan the parent. Is that right?
>
I was actually looking at the driver core recently and was surprised that
USB crapped all over it with its locking requirements. I don't think its a
good idea to enforce one subsystem's lcoking rules onto everyone.
Would there be alot of objections if we add bus->[un]lock_device() and
hide all uglies there? If methods are not set when bus is registered then
standard dev->sem lock/unlock will be used.
--
Dmitry
next prev parent reply other threads:[~2006-11-20 3:41 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-11-19 0:12 deadlock in "modprobe -r ohci1394" shortly after "modprobe ohci1394" Stefan Richter
2006-11-19 14:23 ` Stefan Richter
2006-11-19 19:54 ` Alan Stern
2006-11-19 21:55 ` Stefan Richter
2006-11-20 19:31 ` Alan Stern
2006-11-20 20:45 ` Stefan Richter
2006-11-20 21:28 ` Alan Stern
2006-11-21 0:21 ` Stefan Richter
2006-11-21 0:49 ` Alan Stern
2006-11-21 8:05 ` Stefan Richter
2006-11-21 21:50 ` [PATCH] ieee1394: nodemgr: fix deadlock in shutdown Stefan Richter
2006-11-22 2:02 ` Alan Stern
2006-11-22 20:09 ` Stefan Richter
2006-11-20 3:41 ` Dmitry Torokhov [this message]
2006-11-20 19:18 ` deadlock in "modprobe -r ohci1394" shortly after "modprobe ohci1394" Alan Stern
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200611192241.30740.dtor@insightbb.com \
--to=dtor@insightbb.com \
--cc=gregkh@suse.de \
--cc=linux-kernel@vger.kernel.org \
--cc=linux1394-devel@lists.sourceforge.net \
--cc=stefanr@s5r6.in-berlin.de \
--cc=stern@rowland.harvard.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox