* module unload deadlock
@ 2004-02-17 17:26 Andrea Arcangeli
2004-02-18 3:29 ` Rusty Russell
0 siblings, 1 reply; 8+ messages in thread
From: Andrea Arcangeli @ 2004-02-17 17:26 UTC (permalink / raw)
To: Rusty Russell; +Cc: linux-kernel
I've got bugreports for a deadlock due semaphore recursion in the module
unload code of 2.6.
You considered the recursion issue for the ->init but not for the
->exit.
sys_delete_module calls ->exit with the mutex held, cleanup_module calls
request_module(), modprobe reads /proc/modules and it deadlocks on the
mutex. This results in rmmod deadlocking and all the module subsystem
hung.
the actual module doing request_modules in the cleanup handler is
parport_pc, calling parport_enumerate (it calls it for another reason,
and parport enumerate is told to load up a lowlevel driver if none was
present, that's worthless for the unload routine but it's useful for all
other calls of parport_enumerate). It's uncertain if other drivers
triggers this deadlock.
This untested two liner will fix it, it's not obviously safe, but it
looks like safe thanks to the MODULE_STATE_GOING read inside the
critical sections protected by the mutex (like in
strong_try_module_get), and if this is not safe the module loading
probably wouldn't be safe either in presence of ->init failures. If it's
not completely safe still it can be made safe using MODULE_STATE_GOING.
It would be possible to fix this problem also by passing a parameter to
parport_enumerate, to avoid calling request_module if invoked within the
cleanup_module routine, but the below looks more generic and it looks
like the module code is already robust enough to handle such lock-drop,
so I guess it's preferable.
Comments?
--- SLES/kernel/module.c.~1~ 2004-02-12 17:24:42.000000000 +0100
+++ SLES/kernel/module.c 2004-02-17 18:08:05.519670280 +0100
@@ -732,7 +732,9 @@ sys_delete_module(const char __user *nam
wait_for_zero_refcount(mod);
/* Final destruction now noone is using it. */
+ up(&module_mutex);
mod->exit();
+ down(&module_mutex);
free_module(mod);
out:
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: module unload deadlock
2004-02-17 17:26 module unload deadlock Andrea Arcangeli
@ 2004-02-18 3:29 ` Rusty Russell
2004-02-18 4:35 ` viro
0 siblings, 1 reply; 8+ messages in thread
From: Rusty Russell @ 2004-02-18 3:29 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel, akpm
In message <20040217172646.GT4478@dualathlon.random> you write:
> I've got bugreports for a deadlock due semaphore recursion in the module
> unload code of 2.6.
>
> You considered the recursion issue for the ->init but not for the
> ->exit.
I considered it, but no cleanup called request_module which I
(reluctantly, as it made the locking non-trivial) dropped the lock
around init.
> the actual module doing request_modules in the cleanup handler is
> parport_pc, calling parport_enumerate (it calls it for another reason,
> and parport enumerate is told to load up a lowlevel driver if none was
> present, that's worthless for the unload routine but it's useful for all
> other calls of parport_enumerate). It's uncertain if other drivers
> triggers this deadlock.
Well, that's pretty stupid, but I think this should be fixed in the
module code for a practical reason: the number of oopses or hangs in
rmmod functions. Bad enough that the module is useless: stopping
other module operations is very suboptimal.
> Comments?
Patch seems OK to me. I'd prefer it to go via the -mm tree though to
shake things out in case I've missed something.
Thanks,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
Name: Drop Lock Around Module Exit
Status: Experimental
From: Andrea Arcangeli <andrea@suse.de>
Turns out parport can do request_module from its exit function. We
should drop lock anyway, because too many module exit functions oops
or hang anyway, rendering all module ops useless.
--- SLES/kernel/module.c.~1~ 2004-02-12 17:24:42.000000000 +0100
+++ SLES/kernel/module.c 2004-02-17 18:08:05.519670280 +0100
@@ -732,7 +732,9 @@ sys_delete_module(const char __user *nam
wait_for_zero_refcount(mod);
/* Final destruction now noone is using it. */
+ up(&module_mutex);
mod->exit();
+ down(&module_mutex);
free_module(mod);
out:
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: module unload deadlock
2004-02-18 3:29 ` Rusty Russell
@ 2004-02-18 4:35 ` viro
2004-02-18 15:40 ` Andrea Arcangeli
0 siblings, 1 reply; 8+ messages in thread
From: viro @ 2004-02-18 4:35 UTC (permalink / raw)
To: Rusty Russell; +Cc: Andrea Arcangeli, linux-kernel, akpm
On Wed, Feb 18, 2004 at 02:29:21PM +1100, Rusty Russell wrote:
> > the actual module doing request_modules in the cleanup handler is
> > parport_pc, calling parport_enumerate (it calls it for another reason,
> > and parport enumerate is told to load up a lowlevel driver if none was
> > present, that's worthless for the unload routine but it's useful for all
> > other calls of parport_enumerate). It's uncertain if other drivers
Bullshit. Other calls of parport_enumerate() must die - along with that one.
Patches will go to akpm tomorrow.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: module unload deadlock
2004-02-18 4:35 ` viro
@ 2004-02-18 15:40 ` Andrea Arcangeli
2004-02-18 16:46 ` viro
0 siblings, 1 reply; 8+ messages in thread
From: Andrea Arcangeli @ 2004-02-18 15:40 UTC (permalink / raw)
To: viro; +Cc: Rusty Russell, linux-kernel, akpm
On Wed, Feb 18, 2004 at 04:35:55AM +0000, viro@parcelfarce.linux.theplanet.co.uk wrote:
> On Wed, Feb 18, 2004 at 02:29:21PM +1100, Rusty Russell wrote:
> > > the actual module doing request_modules in the cleanup handler is
> > > parport_pc, calling parport_enumerate (it calls it for another reason,
> > > and parport enumerate is told to load up a lowlevel driver if none was
> > > present, that's worthless for the unload routine but it's useful for all
> > > other calls of parport_enumerate). It's uncertain if other drivers
>
> Bullshit. Other calls of parport_enumerate() must die - along with that one.
> Patches will go to akpm tomorrow.
It's clear this could be fixed by making sure parport won't call
request_module from cleanup_module, the primary reason I fixed it in the
module code is that I don't know if other drivers are doing this, do
you? What parport did was legitimate, and it was working fine in the
past, sure the parport code could be made slightly more complex and
aware about the fact it doesn't worth to try loading the lowlevel module
in cleanup_exit, but it wasn't obviously wrong, the cleanup/init module
are slow paths, it didn't matter if parport tried to load a lowlevel
module there.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: module unload deadlock
2004-02-18 15:40 ` Andrea Arcangeli
@ 2004-02-18 16:46 ` viro
2004-02-18 17:24 ` Andrea Arcangeli
0 siblings, 1 reply; 8+ messages in thread
From: viro @ 2004-02-18 16:46 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Rusty Russell, linux-kernel, akpm
On Wed, Feb 18, 2004 at 04:40:41PM +0100, Andrea Arcangeli wrote:
> On Wed, Feb 18, 2004 at 04:35:55AM +0000, viro@parcelfarce.linux.theplanet.co.uk wrote:
> It's clear this could be fixed by making sure parport won't call
> request_module from cleanup_module, the primary reason I fixed it in the
> module code is that I don't know if other drivers are doing this, do
> you? What parport did was legitimate, and it was working fine in the
> past, sure the parport code could be made slightly more complex and
> aware about the fact it doesn't worth to try loading the lowlevel module
> in cleanup_exit, but it wasn't obviously wrong, the cleanup/init module
> are slow paths, it didn't matter if parport tried to load a lowlevel
> module there.
Sigh...
No, it wasn't legitimate. As the matter of fact, _nothing_ outside of
parport/share.c has any business looking at the list of ports. IOW,
parport_enumerate() should be removed regardless of the request_module()
crap.
In particular, parport_pc should keep track of the ports it had created
instead of messing with parport_enumerate().
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: module unload deadlock
2004-02-18 16:46 ` viro
@ 2004-02-18 17:24 ` Andrea Arcangeli
2004-02-18 17:37 ` viro
0 siblings, 1 reply; 8+ messages in thread
From: Andrea Arcangeli @ 2004-02-18 17:24 UTC (permalink / raw)
To: viro; +Cc: Rusty Russell, linux-kernel, akpm
On Wed, Feb 18, 2004 at 04:46:59PM +0000, viro@parcelfarce.linux.theplanet.co.uk wrote:
> On Wed, Feb 18, 2004 at 04:40:41PM +0100, Andrea Arcangeli wrote:
> > On Wed, Feb 18, 2004 at 04:35:55AM +0000, viro@parcelfarce.linux.theplanet.co.uk wrote:
> > It's clear this could be fixed by making sure parport won't call
> > request_module from cleanup_module, the primary reason I fixed it in the
> > module code is that I don't know if other drivers are doing this, do
> > you? What parport did was legitimate, and it was working fine in the
> > past, sure the parport code could be made slightly more complex and
> > aware about the fact it doesn't worth to try loading the lowlevel module
> > in cleanup_exit, but it wasn't obviously wrong, the cleanup/init module
> > are slow paths, it didn't matter if parport tried to load a lowlevel
> > module there.
>
> Sigh...
>
> No, it wasn't legitimate. As the matter of fact, _nothing_ outside of
> parport/share.c has any business looking at the list of ports. IOW,
> parport_enumerate() should be removed regardless of the request_module()
> crap.
>
> In particular, parport_pc should keep track of the ports it had created
> instead of messing with parport_enumerate().
The one you propose (of parport_pc keeping track of the ports by itself)
is a different model, currently it's the highlevel that keeps track of
the ports and each lowlevel registers the lowlevel ports in the
highlevel list of ports. It doesn't mean the current model is wrong. You
may not like it and you may find it less efficient, or less clean, or
whatever, but the current model is definitely legitimate (the parport
code has the troubles you found in the sharing code locking, but this
registration model you're complaining about now is legitimate instead).
But let's ignore parport, the only question is if you know if other
modules are doing the same thing or not. Calling request_module from
cleanup_module was allowed with the 2.4 module API, now it deadlocks.
The only single reason I changed the module code is to avoid other
modules to deadlock in rmmod.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: module unload deadlock
2004-02-18 17:24 ` Andrea Arcangeli
@ 2004-02-18 17:37 ` viro
2004-02-18 17:57 ` Andrea Arcangeli
0 siblings, 1 reply; 8+ messages in thread
From: viro @ 2004-02-18 17:37 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Rusty Russell, linux-kernel, akpm
On Wed, Feb 18, 2004 at 06:24:46PM +0100, Andrea Arcangeli wrote:
> The one you propose (of parport_pc keeping track of the ports by itself)
> is a different model, currently it's the highlevel that keeps track of
> the ports and each lowlevel registers the lowlevel ports in the
> highlevel list of ports. It doesn't mean the current model is wrong. You
> may not like it and you may find it less efficient, or less clean, or
> whatever, but the current model is definitely legitimate (the parport
> code has the troubles you found in the sharing code locking, but this
> registration model you're complaining about now is legitimate instead).
RTFS. And realize that parport_enumerate() exports the damn list with no
protection whatsoever. It is broken and it always had been broken.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: module unload deadlock
2004-02-18 17:37 ` viro
@ 2004-02-18 17:57 ` Andrea Arcangeli
0 siblings, 0 replies; 8+ messages in thread
From: Andrea Arcangeli @ 2004-02-18 17:57 UTC (permalink / raw)
To: viro; +Cc: Rusty Russell, linux-kernel, akpm
On Wed, Feb 18, 2004 at 05:37:41PM +0000, viro@parcelfarce.linux.theplanet.co.uk wrote:
> On Wed, Feb 18, 2004 at 06:24:46PM +0100, Andrea Arcangeli wrote:
> > The one you propose (of parport_pc keeping track of the ports by itself)
> > is a different model, currently it's the highlevel that keeps track of
> > the ports and each lowlevel registers the lowlevel ports in the
> > highlevel list of ports. It doesn't mean the current model is wrong. You
> > may not like it and you may find it less efficient, or less clean, or
> > whatever, but the current model is definitely legitimate (the parport
> > code has the troubles you found in the sharing code locking, but this
> > registration model you're complaining about now is legitimate instead).
>
> RTFS. And realize that parport_enumerate() exports the damn list with no
> protection whatsoever. It is broken and it always had been broken.
it has not always had been broken, it was was serialized by the BKL when
I worked on that code the last time ;) (so it wasn't broken in 2.2, just nobody
fixed the locking when the kernel kept evolving)
the smp safety is broken, but the model is not obviously broken as you
claim. the locking of the list could be fixed simply with a
parport_lock spinlock or semaphore or whatever like that, you may prefer
to fix it differently keeping the list local in the lowlevel and
protected it to a private lock to parport_pc instead of making it
visible and exporting hte lock too, I'm just saying that part of the
code was legitimate at least in 2.2.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2004-02-18 17:59 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-02-17 17:26 module unload deadlock Andrea Arcangeli
2004-02-18 3:29 ` Rusty Russell
2004-02-18 4:35 ` viro
2004-02-18 15:40 ` Andrea Arcangeli
2004-02-18 16:46 ` viro
2004-02-18 17:24 ` Andrea Arcangeli
2004-02-18 17:37 ` viro
2004-02-18 17:57 ` Andrea Arcangeli
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.