* [PATCH 0/2] Fix (improve) deadlock condition on module removal netfilter socket option removal @ 2007-09-04 20:24 Neil Horman 2007-09-05 15:22 ` Patrick McHardy 0 siblings, 1 reply; 17+ messages in thread From: Neil Horman @ 2007-09-04 20:24 UTC (permalink / raw) To: rusty, adam, jcm, kaber, netfilter-devel, linux-kernel; +Cc: nhorman Hey all- So I've had a deadlock reported to me. I've found that the sequence of events goes like this: 1) process A (modprobe) runs to remove ip_tables.ko 2) process B (iptables-restore) runs and calls setsockopt on a netfilter socket, increasing the ip_tables socket_ops use count 3) process A acquires a file lock on the file ip_tables.ko, calls remove_module in the kernel, which in turn executes the ip_tables module cleanup routine, which calls nf_unregister_sockopt 4) nf_unregister_sockopt, seeing that the use count is non-zero, puts the calling process into uninterruptible sleep, expecting the process using the socket option code to wake it up when it exits the kernel 4) the user of the socket option code (process B) in do_ipt_get_ctl, calls ipt_find_table_lock, which in this case calls request_module to load ip_tables_nat.ko 5) request_module forks a copy of modprobe (process C) to load the module and blocks until modprobe exits. 6) Process C. forked by request_module process the dependencies of ip_tables_nat.ko, of which ip_tables.ko is one. 7) Process C attempts to lock the request module and all its dependencies, it blocks when it attempts to lock ip_tables.ko (which was previously locked in step 3) Theres not really any great permanent solution to this that I can see, but I've developed a two part solution that corrects the problem Part 1) Modifies the nf_sockopt registration code so that, instead of using a use counter internal to the nf_sockopt_ops structure, we instead use a pointer to the registering modules owner to do module reference counting when nf_sockopt calls a modules set/get routine. This prevents the deadlock by preventing set 4 from happening. Part 2) Enhances the modprobe utilty so that by default it preforms non-blocking remove operations (the same way rmmod does), and add an option to explicity request blocking operation. So if you select blocking operation in modprobe you can still cause the above deadlock, but only if you explicity try (and since root can do any old stupid thing it would like.... :) ). The following 2 patches have been tested out by me. Thanks & Regards Neil -- /*************************************************** *Neil Horman *Software Engineer *Red Hat, Inc. *nhorman@tuxdriver.com *gpg keyid: 1024D / 0x92A74FA1 *http://pgp.mit.edu ***************************************************/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/2] Fix (improve) deadlock condition on module removal netfilter socket option removal 2007-09-04 20:24 [PATCH 0/2] Fix (improve) deadlock condition on module removal netfilter socket option removal Neil Horman @ 2007-09-05 15:22 ` Patrick McHardy 2007-09-05 16:13 ` Rusty Russell 0 siblings, 1 reply; 17+ messages in thread From: Patrick McHardy @ 2007-09-05 15:22 UTC (permalink / raw) To: Neil Horman; +Cc: rusty, adam, jcm, netfilter-devel, linux-kernel Neil Horman wrote: > Hey all- > So I've had a deadlock reported to me. I've found that the sequence of > events goes like this: > > 1) process A (modprobe) runs to remove ip_tables.ko > > 2) process B (iptables-restore) runs and calls setsockopt on a netfilter socket, > increasing the ip_tables socket_ops use count > > 3) process A acquires a file lock on the file ip_tables.ko, calls remove_module > in the kernel, which in turn executes the ip_tables module cleanup routine, > which calls nf_unregister_sockopt > > 4) nf_unregister_sockopt, seeing that the use count is non-zero, puts the > calling process into uninterruptible sleep, expecting the process using the > socket option code to wake it up when it exits the kernel > > 4) the user of the socket option code (process B) in do_ipt_get_ctl, calls > ipt_find_table_lock, which in this case calls request_module to load > ip_tables_nat.ko > > 5) request_module forks a copy of modprobe (process C) to load the module and > blocks until modprobe exits. > > 6) Process C. forked by request_module process the dependencies of > ip_tables_nat.ko, of which ip_tables.ko is one. > > 7) Process C attempts to lock the request module and all its dependencies, it > blocks when it attempts to lock ip_tables.ko (which was previously locked in > step 3) > > Theres not really any great permanent solution to this that I can see, but I've > developed a two part solution that corrects the problem > > Part 1) Modifies the nf_sockopt registration code so that, instead of using a > use counter internal to the nf_sockopt_ops structure, we instead use a pointer > to the registering modules owner to do module reference counting when nf_sockopt > calls a modules set/get routine. This prevents the deadlock by preventing set 4 > from happening. > > Part 2) Enhances the modprobe utilty so that by default it preforms non-blocking > remove operations (the same way rmmod does), and add an option to explicity > request blocking operation. So if you select blocking operation in modprobe you > can still cause the above deadlock, but only if you explicity try (and since > root can do any old stupid thing it would like.... :) ). > > The following 2 patches have been tested out by me. Nice catch, we've had a report of this ages ago, but I never figured out what happend. But I'm wondering, wouldn't module refcounting alone fix this problem? If we make nf_sockopt() call try_module_get(ops->owner), remove_module() on ip_tables.ko would simply fail because the refcount is above zero (so it would fail at point 3 above). Am I missing something important? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/2] Fix (improve) deadlock condition on module removal netfilter socket option removal 2007-09-05 15:22 ` Patrick McHardy @ 2007-09-05 16:13 ` Rusty Russell 2007-09-05 17:08 ` Neil Horman 0 siblings, 1 reply; 17+ messages in thread From: Rusty Russell @ 2007-09-05 16:13 UTC (permalink / raw) To: Patrick McHardy; +Cc: Neil Horman, adam, jcm, netfilter-devel, linux-kernel On Wed, 2007-09-05 at 17:22 +0200, Patrick McHardy wrote: > But I'm wondering, wouldn't module refcounting alone fix this problem? > If we make nf_sockopt() call try_module_get(ops->owner), remove_module() > on ip_tables.ko would simply fail because the refcount is above zero > (so it would fail at point 3 above). Am I missing something important? Yes, that seems the correct solution to me, too. ISTR that this code predates the current module code. Rusty. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/2] Fix (improve) deadlock condition on module removal netfilter socket option removal 2007-09-05 16:13 ` Rusty Russell @ 2007-09-05 17:08 ` Neil Horman 2007-09-05 17:41 ` Rusty Russell 2007-09-06 10:33 ` Patrick McHardy 0 siblings, 2 replies; 17+ messages in thread From: Neil Horman @ 2007-09-05 17:08 UTC (permalink / raw) To: Rusty Russell; +Cc: Patrick McHardy, adam, jcm, netfilter-devel, linux-kernel On Thu, Sep 06, 2007 at 02:13:26AM +1000, Rusty Russell wrote: > On Wed, 2007-09-05 at 17:22 +0200, Patrick McHardy wrote: > > But I'm wondering, wouldn't module refcounting alone fix this problem? > > If we make nf_sockopt() call try_module_get(ops->owner), remove_module() > > on ip_tables.ko would simply fail because the refcount is above zero > > (so it would fail at point 3 above). Am I missing something important? > > Yes, that seems the correct solution to me, too. ISTR that this code > predates the current module code. > > Rusty. Thanks guys- When I first started looking at this problem I would have agreed with you, that module reference counting alone would fix the problem. However, delete_module can work in either a non-blocking or a blocking mode. rmmod passes O_NONBLOCK to delete module, and so is fine, but modprobe does not. So if you currently use modprobe -r to remove modules (as the iptables service script nominally does), modprobe winds up waiting in the kernel for the module reference count to become zero. Since we can hold a reference to the module being removed in the same path that forks a modprobe request to load that same module (which then blocks on the first modprobes fcntl lock), we still get deadlock. The way I fixed this was by use of the second patch, which brings modprobes behavior into line with the rmmod utility (which is to default to non-blocking operation), leading to the remove_module failure and breaking of the deadlock that you describe above. Thanks & Regards Neil -- /*************************************************** *Neil Horman *Software Engineer *Red Hat, Inc. *nhorman@tuxdriver.com *gpg keyid: 1024D / 0x92A74FA1 *http://pgp.mit.edu ***************************************************/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/2] Fix (improve) deadlock condition on module removal netfilter socket option removal 2007-09-05 17:08 ` Neil Horman @ 2007-09-05 17:41 ` Rusty Russell 2007-09-05 18:19 ` Jon Masters 2007-09-05 19:27 ` Neil Horman 2007-09-06 10:33 ` Patrick McHardy 1 sibling, 2 replies; 17+ messages in thread From: Rusty Russell @ 2007-09-05 17:41 UTC (permalink / raw) To: Neil Horman; +Cc: Patrick McHardy, adam, jcm, netfilter-devel, linux-kernel On Wed, 2007-09-05 at 13:08 -0400, Neil Horman wrote: > On Thu, Sep 06, 2007 at 02:13:26AM +1000, Rusty Russell wrote: > > On Wed, 2007-09-05 at 17:22 +0200, Patrick McHardy wrote: > > > But I'm wondering, wouldn't module refcounting alone fix this problem? > > > If we make nf_sockopt() call try_module_get(ops->owner), remove_module() > > > on ip_tables.ko would simply fail because the refcount is above zero > > > (so it would fail at point 3 above). Am I missing something important? > > > > Yes, that seems the correct solution to me, too. ISTR that this code > > predates the current module code. > > > > Rusty. > > Thanks guys- > When I first started looking at this problem I would have agreed with > you, that module reference counting alone would fix the problem. However, > delete_module can work in either a non-blocking or a blocking mode. rmmod > passes O_NONBLOCK to delete module, and so is fine, but modprobe does not. Hi Neil, You have this backwards: O_NONBLOCK is the default. That seems to be what everyone wants, although I implemented 'rmmod -w' because it seemed like a good option. Rusty. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/2] Fix (improve) deadlock condition on module removal netfilter socket option removal 2007-09-05 17:41 ` Rusty Russell @ 2007-09-05 18:19 ` Jon Masters 2007-09-05 19:27 ` Neil Horman 1 sibling, 0 replies; 17+ messages in thread From: Jon Masters @ 2007-09-05 18:19 UTC (permalink / raw) To: Rusty Russell Cc: Neil Horman, Patrick McHardy, adam, jcm, netfilter-devel, linux-kernel On Thu, 2007-09-06 at 03:41 +1000, Rusty Russell wrote: > On Wed, 2007-09-05 at 13:08 -0400, Neil Horman wrote: > > On Thu, Sep 06, 2007 at 02:13:26AM +1000, Rusty Russell wrote: > > > On Wed, 2007-09-05 at 17:22 +0200, Patrick McHardy wrote: > > > > But I'm wondering, wouldn't module refcounting alone fix this problem? > > > > If we make nf_sockopt() call try_module_get(ops->owner), remove_module() > > > > on ip_tables.ko would simply fail because the refcount is above zero > > > > (so it would fail at point 3 above). Am I missing something important? > > > > > > Yes, that seems the correct solution to me, too. ISTR that this code > > > predates the current module code. > > > > > > Rusty. > > > > Thanks guys- > > When I first started looking at this problem I would have agreed with > > you, that module reference counting alone would fix the problem. However, > > delete_module can work in either a non-blocking or a blocking mode. rmmod > > passes O_NONBLOCK to delete module, and so is fine, but modprobe does not. > You have this backwards: O_NONBLOCK is the default. That seems to be > what everyone wants, although I implemented 'rmmod -w' because it seemed > like a good option. :-) Thanks for keeping me copied. I'll think about the in-kernel module situation when I get some time over the weekend - but shout if there's an external impact sooner! :-) Jon. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/2] Fix (improve) deadlock condition on module removal netfilter socket option removal 2007-09-05 17:41 ` Rusty Russell 2007-09-05 18:19 ` Jon Masters @ 2007-09-05 19:27 ` Neil Horman 2007-09-05 20:17 ` Jon Masters ` (2 more replies) 1 sibling, 3 replies; 17+ messages in thread From: Neil Horman @ 2007-09-05 19:27 UTC (permalink / raw) To: Rusty Russell; +Cc: Patrick McHardy, adam, jcm, netfilter-devel, linux-kernel On Thu, Sep 06, 2007 at 03:41:37AM +1000, Rusty Russell wrote: > On Wed, 2007-09-05 at 13:08 -0400, Neil Horman wrote: > > On Thu, Sep 06, 2007 at 02:13:26AM +1000, Rusty Russell wrote: > > > On Wed, 2007-09-05 at 17:22 +0200, Patrick McHardy wrote: > > > > But I'm wondering, wouldn't module refcounting alone fix this problem? > > > > If we make nf_sockopt() call try_module_get(ops->owner), remove_module() > > > > on ip_tables.ko would simply fail because the refcount is above zero > > > > (so it would fail at point 3 above). Am I missing something important? > > > > > > Yes, that seems the correct solution to me, too. ISTR that this code > > > predates the current module code. > > > > > > Rusty. > > > > Thanks guys- > > When I first started looking at this problem I would have agreed with > > you, that module reference counting alone would fix the problem. However, > > delete_module can work in either a non-blocking or a blocking mode. rmmod > > passes O_NONBLOCK to delete module, and so is fine, but modprobe does not. > > Hi Neil, > > You have this backwards: O_NONBLOCK is the default. That seems to be > what everyone wants, although I implemented 'rmmod -w' because it seemed > like a good option. > Thats really the point I'm trying to make. O_NONBLOCK should be the default, but it isn't in the case of modprobe. If you look at the rmmod sources, O_NONBLOCK is set in the flags field in the main routine, and cleared if the -w option is set. Whereas in the modprobe sources the only call ever made to delete_module passes the O_EXLC flag only, and never sets O_NONBLOCK, resulting in persistent blocking operation. My 2nd patch corrects this by adding a flags variable to modprobe and setting it to O_NONBLOCK|O_EXLC, and clearing it if the (new) -w option is specified to modprobe, just like in rmmod. Now, I suppose its possible that I've not been looking at the right source tree when doing my work. I've based my modprobe patch on this git tree: http://git.kernel.org/?p=linux/kernel/git/kyle/module-init-tools.git;a=summary If theres a newer one, if you could please point it out to me, I'd appreciate it. If the functional equivalent of my second patch is already in there, then we just need my first patch to be good to go. Thanks & Regards Neil > Rusty. > -- /*************************************************** *Neil Horman *Software Engineer *Red Hat, Inc. *nhorman@tuxdriver.com *gpg keyid: 1024D / 0x92A74FA1 *http://pgp.mit.edu ***************************************************/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/2] Fix (improve) deadlock condition on module removal netfilter socket option removal 2007-09-05 19:27 ` Neil Horman @ 2007-09-05 20:17 ` Jon Masters 2007-09-05 20:51 ` Rusty Russell 2007-09-05 21:39 ` Jon Masters 2 siblings, 0 replies; 17+ messages in thread From: Jon Masters @ 2007-09-05 20:17 UTC (permalink / raw) To: Neil Horman Cc: Rusty Russell, Patrick McHardy, adam, jcm, netfilter-devel, linux-kernel On Wed, 2007-09-05 at 15:27 -0400, Neil Horman wrote: > Now, I suppose its possible that I've not been looking at the right source tree > when doing my work. I've based my modprobe patch on this git tree: > http://git.kernel.org/?p=linux/kernel/git/kyle/module-init-tools.git;a=summary > If theres a newer one, if you could please point it out to me, I'd appreciate > it. If the functional equivalent of my second patch is already in there, then > we just need my first patch to be good to go. Ah, you want http://www.kerneltools.org/ - there's info on current trees and versions there. I am in the process of moving to kernel.org. I will check the patch situation right now anyway, leave it with me. Jon. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/2] Fix (improve) deadlock condition on module removal netfilter socket option removal 2007-09-05 19:27 ` Neil Horman 2007-09-05 20:17 ` Jon Masters @ 2007-09-05 20:51 ` Rusty Russell 2007-09-05 20:59 ` Jon Masters 2007-09-05 21:39 ` Jon Masters 2 siblings, 1 reply; 17+ messages in thread From: Rusty Russell @ 2007-09-05 20:51 UTC (permalink / raw) To: Neil Horman Cc: Patrick McHardy, adam, jcm, netfilter-devel, linux-kernel, Jim Radford On Wed, 2007-09-05 at 15:27 -0400, Neil Horman wrote: > On Thu, Sep 06, 2007 at 03:41:37AM +1000, Rusty Russell wrote: > > You have this backwards: O_NONBLOCK is the default. That seems to be > > what everyone wants, although I implemented 'rmmod -w' because it seemed > > like a good option. > > > Thats really the point I'm trying to make. O_NONBLOCK should be the default, > but it isn't in the case of modprobe. Ouch, you're right! And that's been around for a *long* time. From the original 0.9.2 version (Dec 9th 2002). ChangeLog says: Jim Radford <radford@blackbean.org>'s modprobe -r implementation. The userspace check still stops most cases of removing used modules. On the bright side, this one bug has probably done more to deprecate module removal than any thing else, by making it unreliable... Good catch! Rusty. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/2] Fix (improve) deadlock condition on module removal netfilter socket option removal 2007-09-05 20:51 ` Rusty Russell @ 2007-09-05 20:59 ` Jon Masters 0 siblings, 0 replies; 17+ messages in thread From: Jon Masters @ 2007-09-05 20:59 UTC (permalink / raw) To: Rusty Russell Cc: Neil Horman, Patrick McHardy, adam, jcm, netfilter-devel, linux-kernel, Jim Radford On Thu, 2007-09-06 at 06:51 +1000, Rusty Russell wrote: > On Wed, 2007-09-05 at 15:27 -0400, Neil Horman wrote: > > On Thu, Sep 06, 2007 at 03:41:37AM +1000, Rusty Russell wrote: > > > You have this backwards: O_NONBLOCK is the default. That seems to be > > > what everyone wants, although I implemented 'rmmod -w' because it seemed > > > like a good option. > > > > > Thats really the point I'm trying to make. O_NONBLOCK should be the default, > > but it isn't in the case of modprobe. > > Ouch, you're right! I've applied, tagged, and updated http://www.kernel.org/pub/linux/kernel/people/jcm/module-init-tools/devel/module-init-tools.git I also moved the remaining files over from kerneltools.org. Will update tarballs later...FWIW, there are a couple of outstanding patches to libify module-init-tools that I'm still looking at before pushing. Jon. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/2] Fix (improve) deadlock condition on module removal netfilter socket option removal 2007-09-05 19:27 ` Neil Horman 2007-09-05 20:17 ` Jon Masters 2007-09-05 20:51 ` Rusty Russell @ 2007-09-05 21:39 ` Jon Masters 2007-09-06 0:17 ` Neil Horman 2007-09-06 12:55 ` Neil Horman 2 siblings, 2 replies; 17+ messages in thread From: Jon Masters @ 2007-09-05 21:39 UTC (permalink / raw) To: Neil Horman Cc: Rusty Russell, Patrick McHardy, adam, jcm, netfilter-devel, linux-kernel On Wed, 2007-09-05 at 15:27 -0400, Neil Horman wrote: > Now, I suppose its possible that I've not been looking at the right source tree > when doing my work. I've based my modprobe patch on this git tree: > http://git.kernel.org/?p=linux/kernel/git/kyle/module-init-tools.git;a=summary > If theres a newer one, if you could please point it out to me, I'd appreciate > it. If the functional equivalent of my second patch is already in there, then > we just need my first patch to be good to go. Neil, please test: http://www.kernel.org/pub/linux/kernel/people/jcm/module-init-tools/module-init-tools-3.3-pre12.tar.bz2 Jon. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/2] Fix (improve) deadlock condition on module removal netfilter socket option removal 2007-09-05 21:39 ` Jon Masters @ 2007-09-06 0:17 ` Neil Horman 2007-09-06 12:55 ` Neil Horman 1 sibling, 0 replies; 17+ messages in thread From: Neil Horman @ 2007-09-06 0:17 UTC (permalink / raw) To: Jon Masters Cc: Rusty Russell, Patrick McHardy, adam, jcm, netfilter-devel, linux-kernel On Wed, Sep 05, 2007 at 05:39:11PM -0400, Jon Masters wrote: > On Wed, 2007-09-05 at 15:27 -0400, Neil Horman wrote: > > > Now, I suppose its possible that I've not been looking at the right source tree > > when doing my work. I've based my modprobe patch on this git tree: > > http://git.kernel.org/?p=linux/kernel/git/kyle/module-init-tools.git;a=summary > > If theres a newer one, if you could please point it out to me, I'd appreciate > > it. If the functional equivalent of my second patch is already in there, then > > we just need my first patch to be good to go. > > Neil, please test: > > http://www.kernel.org/pub/linux/kernel/people/jcm/module-init-tools/module-init-tools-3.3-pre12.tar.bz2 > > Jon. > I'll test it first thing in the AM. Thanks! Neil -- /*************************************************** *Neil Horman *Software Engineer *Red Hat, Inc. *nhorman@tuxdriver.com *gpg keyid: 1024D / 0x92A74FA1 *http://pgp.mit.edu ***************************************************/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/2] Fix (improve) deadlock condition on module removal netfilter socket option removal 2007-09-05 21:39 ` Jon Masters 2007-09-06 0:17 ` Neil Horman @ 2007-09-06 12:55 ` Neil Horman 2007-09-06 13:35 ` Jon Masters 1 sibling, 1 reply; 17+ messages in thread From: Neil Horman @ 2007-09-06 12:55 UTC (permalink / raw) To: Jon Masters Cc: Rusty Russell, Patrick McHardy, adam, jcm, netfilter-devel, linux-kernel On Wed, Sep 05, 2007 at 05:39:11PM -0400, Jon Masters wrote: > On Wed, 2007-09-05 at 15:27 -0400, Neil Horman wrote: > > > Now, I suppose its possible that I've not been looking at the right source tree > > when doing my work. I've based my modprobe patch on this git tree: > > http://git.kernel.org/?p=linux/kernel/git/kyle/module-init-tools.git;a=summary > > If theres a newer one, if you could please point it out to me, I'd appreciate > > it. If the functional equivalent of my second patch is already in there, then > > we just need my first patch to be good to go. > > Neil, please test: > > http://www.kernel.org/pub/linux/kernel/people/jcm/module-init-tools/module-init-tools-3.3-pre12.tar.bz2 > > Jon. > Good to go. Thanks Jon! Neil -- /*************************************************** *Neil Horman *Software Engineer *Red Hat, Inc. *nhorman@tuxdriver.com *gpg keyid: 1024D / 0x92A74FA1 *http://pgp.mit.edu ***************************************************/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/2] Fix (improve) deadlock condition on module removal netfilter socket option removal 2007-09-06 12:55 ` Neil Horman @ 2007-09-06 13:35 ` Jon Masters 2007-09-06 15:40 ` Neil Horman 0 siblings, 1 reply; 17+ messages in thread From: Jon Masters @ 2007-09-06 13:35 UTC (permalink / raw) To: Neil Horman Cc: Rusty Russell, Patrick McHardy, adam, jcm, netfilter-devel, linux-kernel On Thu, 2007-09-06 at 08:55 -0400, Neil Horman wrote: > On Wed, Sep 05, 2007 at 05:39:11PM -0400, Jon Masters wrote: > > On Wed, 2007-09-05 at 15:27 -0400, Neil Horman wrote: > > > > > Now, I suppose its possible that I've not been looking at the right source tree > > > when doing my work. I've based my modprobe patch on this git tree: > > > http://git.kernel.org/?p=linux/kernel/git/kyle/module-init-tools.git;a=summary > > > If theres a newer one, if you could please point it out to me, I'd appreciate > > > it. If the functional equivalent of my second patch is already in there, then > > > we just need my first patch to be good to go. > > > > Neil, please test: > > > > http://www.kernel.org/pub/linux/kernel/people/jcm/module-init-tools/module-init-tools-3.3-pre12.tar.bz2 > > > > Jon. > > > Good to go. Thanks Jon! Yeah, except I need to play "listen to Rusty" and "Rusty knows best", then refix it again, after doing so (see his email). Jon. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/2] Fix (improve) deadlock condition on module removal netfilter socket option removal 2007-09-06 13:35 ` Jon Masters @ 2007-09-06 15:40 ` Neil Horman 0 siblings, 0 replies; 17+ messages in thread From: Neil Horman @ 2007-09-06 15:40 UTC (permalink / raw) To: Jon Masters Cc: Rusty Russell, Patrick McHardy, adam, jcm, netfilter-devel, linux-kernel On Thu, Sep 06, 2007 at 09:35:32AM -0400, Jon Masters wrote: > On Thu, 2007-09-06 at 08:55 -0400, Neil Horman wrote: > > On Wed, Sep 05, 2007 at 05:39:11PM -0400, Jon Masters wrote: > > > On Wed, 2007-09-05 at 15:27 -0400, Neil Horman wrote: > > > > > > > Now, I suppose its possible that I've not been looking at the right source tree > > > > when doing my work. I've based my modprobe patch on this git tree: > > > > http://git.kernel.org/?p=linux/kernel/git/kyle/module-init-tools.git;a=summary > > > > If theres a newer one, if you could please point it out to me, I'd appreciate > > > > it. If the functional equivalent of my second patch is already in there, then > > > > we just need my first patch to be good to go. > > > > > > Neil, please test: > > > > > > http://www.kernel.org/pub/linux/kernel/people/jcm/module-init-tools/module-init-tools-3.3-pre12.tar.bz2 > > > > > > Jon. > > > > > Good to go. Thanks Jon! > > Yeah, except I need to play "listen to Rusty" and "Rusty knows best", > then refix it again, after doing so (see his email). > Yes, I saw the note, I was really just running it to ensure that my bug was fixed. Rustys comment certainly seems like the right thing to do, but it caused no ill effects for me specifically. Neil > Jon. > -- /*************************************************** *Neil Horman *Software Engineer *Red Hat, Inc. *nhorman@tuxdriver.com *gpg keyid: 1024D / 0x92A74FA1 *http://pgp.mit.edu ***************************************************/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/2] Fix (improve) deadlock condition on module removal netfilter socket option removal 2007-09-05 17:08 ` Neil Horman 2007-09-05 17:41 ` Rusty Russell @ 2007-09-06 10:33 ` Patrick McHardy 2007-09-06 11:08 ` Neil Horman 1 sibling, 1 reply; 17+ messages in thread From: Patrick McHardy @ 2007-09-06 10:33 UTC (permalink / raw) To: Neil Horman; +Cc: Rusty Russell, adam, jcm, netfilter-devel, linux-kernel Neil Horman wrote: > On Thu, Sep 06, 2007 at 02:13:26AM +1000, Rusty Russell wrote: > >>On Wed, 2007-09-05 at 17:22 +0200, Patrick McHardy wrote: >> >>>But I'm wondering, wouldn't module refcounting alone fix this problem? >>>If we make nf_sockopt() call try_module_get(ops->owner), remove_module() >>>on ip_tables.ko would simply fail because the refcount is above zero >>>(so it would fail at point 3 above). Am I missing something important? >> >>Yes, that seems the correct solution to me, too. ISTR that this code >>predates the current module code. >> >>Rusty. > > > Thanks guys- > When I first started looking at this problem I would have agreed with > you, that module reference counting alone would fix the problem. However, > delete_module can work in either a non-blocking or a blocking mode. rmmod > passes O_NONBLOCK to delete module, and so is fine, but modprobe does not. So > if you currently use modprobe -r to remove modules (as the iptables service > script nominally does), modprobe winds up waiting in the kernel for the module > reference count to become zero. Since we can hold a reference to the module > being removed in the same path that forks a modprobe request to load that same > module (which then blocks on the first modprobes fcntl lock), we still get > deadlock. The way I fixed this was by use of the second patch, which brings > modprobes behavior into line with the rmmod utility (which is to default to > non-blocking operation), leading to the remove_module failure and breaking of > the deadlock that you describe above. Thanks for the explanation, I've applied your patch. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/2] Fix (improve) deadlock condition on module removal netfilter socket option removal 2007-09-06 10:33 ` Patrick McHardy @ 2007-09-06 11:08 ` Neil Horman 0 siblings, 0 replies; 17+ messages in thread From: Neil Horman @ 2007-09-06 11:08 UTC (permalink / raw) To: Patrick McHardy; +Cc: Rusty Russell, adam, jcm, netfilter-devel, linux-kernel On Thu, Sep 06, 2007 at 12:33:52PM +0200, Patrick McHardy wrote: > Neil Horman wrote: > > On Thu, Sep 06, 2007 at 02:13:26AM +1000, Rusty Russell wrote: > > > >>On Wed, 2007-09-05 at 17:22 +0200, Patrick McHardy wrote: > >> > >>>But I'm wondering, wouldn't module refcounting alone fix this problem? > >>>If we make nf_sockopt() call try_module_get(ops->owner), remove_module() > >>>on ip_tables.ko would simply fail because the refcount is above zero > >>>(so it would fail at point 3 above). Am I missing something important? > >> > >>Yes, that seems the correct solution to me, too. ISTR that this code > >>predates the current module code. > >> > >>Rusty. > > > > > > Thanks guys- > > When I first started looking at this problem I would have agreed with > > you, that module reference counting alone would fix the problem. However, > > delete_module can work in either a non-blocking or a blocking mode. rmmod > > passes O_NONBLOCK to delete module, and so is fine, but modprobe does not. So > > if you currently use modprobe -r to remove modules (as the iptables service > > script nominally does), modprobe winds up waiting in the kernel for the module > > reference count to become zero. Since we can hold a reference to the module > > being removed in the same path that forks a modprobe request to load that same > > module (which then blocks on the first modprobes fcntl lock), we still get > > deadlock. The way I fixed this was by use of the second patch, which brings > > modprobes behavior into line with the rmmod utility (which is to default to > > non-blocking operation), leading to the remove_module failure and breaking of > > the deadlock that you describe above. > > > Thanks for the explanation, I've applied your patch. Thanks Patrick! Neil > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- /*************************************************** *Neil Horman *Software Engineer *Red Hat, Inc. *nhorman@tuxdriver.com *gpg keyid: 1024D / 0x92A74FA1 *http://pgp.mit.edu ***************************************************/ ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2007-09-06 15:41 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-09-04 20:24 [PATCH 0/2] Fix (improve) deadlock condition on module removal netfilter socket option removal Neil Horman 2007-09-05 15:22 ` Patrick McHardy 2007-09-05 16:13 ` Rusty Russell 2007-09-05 17:08 ` Neil Horman 2007-09-05 17:41 ` Rusty Russell 2007-09-05 18:19 ` Jon Masters 2007-09-05 19:27 ` Neil Horman 2007-09-05 20:17 ` Jon Masters 2007-09-05 20:51 ` Rusty Russell 2007-09-05 20:59 ` Jon Masters 2007-09-05 21:39 ` Jon Masters 2007-09-06 0:17 ` Neil Horman 2007-09-06 12:55 ` Neil Horman 2007-09-06 13:35 ` Jon Masters 2007-09-06 15:40 ` Neil Horman 2007-09-06 10:33 ` Patrick McHardy 2007-09-06 11:08 ` Neil Horman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox