* Re: [PATCH] firewire-sbp2: fix panic after rmmod with slow targets [not found] <1314017561-1976-1-git-send-email-bootc@bootc.net> @ 2011-08-22 14:35 ` Stefan Richter 2011-08-22 20:38 ` Chris Boot 2011-08-22 20:43 ` [PATCH] " Chris Boot 0 siblings, 2 replies; 6+ messages in thread From: Stefan Richter @ 2011-08-22 14:35 UTC (permalink / raw) To: Chris Boot, Tejun Heo; +Cc: linux1394-devel, linux-kernel On Aug 22 Chris Boot wrote: > If firewire-sbp2 starts a login to a target that doesn't complete ORBs > in a timely manner (and has to retry the login), and the module is > removed before the operation times out, you end up with a null-pointer > dereference and a kernel panic. > > This happens because the code in sbp2_remove() just does a > sbp2_target_put(), assuming it will be the last remaining reference. If > there are jobs in the workqueue, this is not the case, and the module is > successfully unloaded while references still exist. The problem is not that sbp2_target_put()'s caller assumes that it is putting the last reference. sbp2_target_put()'s very purpose is to clean up when, and only when, the last reference is gone. The problem is that sbp2_target_get() does not take a reference to the module. scsi_device_get() does. But as long as SCSI Inquiry was not completed and e.g. somebody mounts a filesystem on an SBP-2 attached device, there is no module reference kept by the SCSI core, hence firewire-sbp2 can be unloaded. > This patch cancels pending work for each unit in sbp2_remove(), which > hopefully means there are no extra references around that prevent us > from unloading. This fixes my crash. The fix looks good to me. On thing to watch out for is that sbp2_remove() may be executed by the very same workqueue which also executes lu->work. (This is fw_workqueue = alloc_workqueue("firewire", WQ_NON_REENTRANT | WQ_MEM_RECLAIM, 0).) But AFAIU, the workqueue infrastructure is able to handle this without deadlock since kernel 2.6.36. Tejun, is this understanding correct? > Signed-off-by: Chris Boot <bootc@bootc.net> > --- > drivers/firewire/sbp2.c | 5 +++++ > 1 files changed, 5 insertions(+), 0 deletions(-) > > diff --git a/drivers/firewire/sbp2.c b/drivers/firewire/sbp2.c > index 41841a3..3867aaa 100644 > --- a/drivers/firewire/sbp2.c > +++ b/drivers/firewire/sbp2.c > @@ -1198,6 +1198,11 @@ static int sbp2_remove(struct device *dev) > { > struct fw_unit *unit = fw_unit(dev); > struct sbp2_target *tgt = dev_get_drvdata(&unit->device); > + struct sbp2_logical_unit *lu, *next; > + > + list_for_each_entry_safe(lu, next, &tgt->lu_list, link) { > + cancel_delayed_work_sync(&lu->work); > + } > > sbp2_target_put(tgt); > return 0; list_for_each_entry() is sufficient here. You are not changing the list while you iterate over it. Would you please resend the patch with list_for_each_entry()? More importantly, I think the whole sbp2_target instance reference counting can be removed with his work canceling in place. But I have not analyzed this fully yet, and I don't expect you to do this for me. Though if you like to do so, that would of course be welcome. -- Stefan Richter -=====-==-== =--- =-==- http://arcgraph.de/sr/ ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] firewire-sbp2: fix panic after rmmod with slow targets 2011-08-22 14:35 ` [PATCH] firewire-sbp2: fix panic after rmmod with slow targets Stefan Richter @ 2011-08-22 20:38 ` Chris Boot 2011-08-22 22:38 ` Chris Boot 2011-08-22 20:43 ` [PATCH] " Chris Boot 1 sibling, 1 reply; 6+ messages in thread From: Chris Boot @ 2011-08-22 20:38 UTC (permalink / raw) To: Stefan Richter, linux1394-devel, linux-kernel; +Cc: Chris Boot If firewire-sbp2 starts a login to a target that doesn't complete ORBs in a timely manner (and has to retry the login), and the module is removed before the operation times out, you end up with a null-pointer dereference and a kernel panic. This happens because the code in sbp2_remove() just does a sbp2_target_put(), assuming there are no remaining references to the target. If there are jobs in the workqueue, this is not the case, and the module is successfully unloaded while references still exist. This patch cancels pending work for each unit in sbp2_remove(), which hopefully means there are no extra references around that prevent us from unloading. This fixes my crash. Signed-off-by: Chris Boot <bootc@bootc.net> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: <linux1394-devel@lists.sourceforge.net> --- drivers/firewire/sbp2.c | 5 +++++ 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/drivers/firewire/sbp2.c b/drivers/firewire/sbp2.c index 41841a3..26397ec 100644 --- a/drivers/firewire/sbp2.c +++ b/drivers/firewire/sbp2.c @@ -1198,6 +1198,11 @@ static int sbp2_remove(struct device *dev) { struct fw_unit *unit = fw_unit(dev); struct sbp2_target *tgt = dev_get_drvdata(&unit->device); + struct sbp2_logical_unit *lu, *next; + + list_for_each_entry(lu, next, &tgt->lu_list, link) { + cancel_delayed_work_sync(&lu->work); + } sbp2_target_put(tgt); return 0; -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] firewire-sbp2: fix panic after rmmod with slow targets 2011-08-22 20:38 ` Chris Boot @ 2011-08-22 22:38 ` Chris Boot 2011-08-22 22:56 ` [PATCH] [v3] " Chris Boot 0 siblings, 1 reply; 6+ messages in thread From: Chris Boot @ 2011-08-22 22:38 UTC (permalink / raw) To: Chris Boot; +Cc: Stefan Richter, linux1394-devel, linux-kernel On 22 Aug 2011, at 21:38, Chris Boot wrote: > + list_for_each_entry(lu, next, &tgt->lu_list, link) { That'll teach me for assuming the define had the same number of arguments, and not compile-testing at least... :-( I'll send rev 3 over shortly. Chris ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] [v3] firewire-sbp2: fix panic after rmmod with slow targets 2011-08-22 22:38 ` Chris Boot @ 2011-08-22 22:56 ` Chris Boot 2011-08-22 23:21 ` Stefan Richter 0 siblings, 1 reply; 6+ messages in thread From: Chris Boot @ 2011-08-22 22:56 UTC (permalink / raw) To: Stefan Richter, linux1394-devel, linux-kernel; +Cc: Chris Boot If firewire-sbp2 starts a login to a target that doesn't complete ORBs in a timely manner (and has to retry the login), and the module is removed before the operation times out, you end up with a null-pointer dereference and a kernel panic. This happens because the code in sbp2_remove() just does a sbp2_target_put(), assuming there are no remaining references to the target. If there are jobs in the workqueue, this is not the case, and the module is successfully unloaded while references still exist. This patch cancels pending work for each unit in sbp2_remove(), which hopefully means there are no extra references around that prevent us from unloading. This fixes my crash. Signed-off-by: Chris Boot <bootc@bootc.net> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: <linux1394-devel@lists.sourceforge.net> --- drivers/firewire/sbp2.c | 5 +++++ 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/drivers/firewire/sbp2.c b/drivers/firewire/sbp2.c index 41841a3..260dd23 100644 --- a/drivers/firewire/sbp2.c +++ b/drivers/firewire/sbp2.c @@ -1198,6 +1198,11 @@ static int sbp2_remove(struct device *dev) { struct fw_unit *unit = fw_unit(dev); struct sbp2_target *tgt = dev_get_drvdata(&unit->device); + struct sbp2_logical_unit *lu; + + list_for_each_entry(lu, &tgt->lu_list, link) { + cancel_delayed_work_sync(&lu->work); + } sbp2_target_put(tgt); return 0; -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] [v3] firewire-sbp2: fix panic after rmmod with slow targets 2011-08-22 22:56 ` [PATCH] [v3] " Chris Boot @ 2011-08-22 23:21 ` Stefan Richter 0 siblings, 0 replies; 6+ messages in thread From: Stefan Richter @ 2011-08-22 23:21 UTC (permalink / raw) To: Chris Boot; +Cc: linux1394-devel, linux-kernel On Aug 22 Chris Boot wrote: > --- a/drivers/firewire/sbp2.c > +++ b/drivers/firewire/sbp2.c > @@ -1198,6 +1198,11 @@ static int sbp2_remove(struct device *dev) > { > struct fw_unit *unit = fw_unit(dev); > struct sbp2_target *tgt = dev_get_drvdata(&unit->device); > + struct sbp2_logical_unit *lu; > + > + list_for_each_entry(lu, &tgt->lu_list, link) { > + cancel_delayed_work_sync(&lu->work); > + } > > sbp2_target_put(tgt); > return 0; Thanks. I reproduced the bug, applied your patch, and confirmed that the patch fixes the bug. Committed to linux1394-2.6.git but without the curly braces. -- Stefan Richter -=====-==-== =--- =-=== http://arcgraph.de/sr/ ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] firewire-sbp2: fix panic after rmmod with slow targets 2011-08-22 14:35 ` [PATCH] firewire-sbp2: fix panic after rmmod with slow targets Stefan Richter 2011-08-22 20:38 ` Chris Boot @ 2011-08-22 20:43 ` Chris Boot 1 sibling, 0 replies; 6+ messages in thread From: Chris Boot @ 2011-08-22 20:43 UTC (permalink / raw) To: Stefan Richter; +Cc: Tejun Heo, linux1394-devel, linux-kernel On 22/08/2011 15:35, Stefan Richter wrote: > On Aug 22 Chris Boot wrote: >> If firewire-sbp2 starts a login to a target that doesn't complete ORBs >> in a timely manner (and has to retry the login), and the module is >> removed before the operation times out, you end up with a null-pointer >> dereference and a kernel panic. >> >> This happens because the code in sbp2_remove() just does a >> sbp2_target_put(), assuming it will be the last remaining reference. If >> there are jobs in the workqueue, this is not the case, and the module is >> successfully unloaded while references still exist. > The problem is not that sbp2_target_put()'s caller assumes that it is > putting the last reference. sbp2_target_put()'s very purpose is to clean > up when, and only when, the last reference is gone. Yes that's what I meant by my message, but I guess I wasn't clear enough. I edited it a bit with the second revision, I hope it's a bit clearer now. [snip] >> Signed-off-by: Chris Boot<bootc@bootc.net> >> --- >> drivers/firewire/sbp2.c | 5 +++++ >> 1 files changed, 5 insertions(+), 0 deletions(-) >> >> diff --git a/drivers/firewire/sbp2.c b/drivers/firewire/sbp2.c >> index 41841a3..3867aaa 100644 >> --- a/drivers/firewire/sbp2.c >> +++ b/drivers/firewire/sbp2.c >> @@ -1198,6 +1198,11 @@ static int sbp2_remove(struct device *dev) >> { >> struct fw_unit *unit = fw_unit(dev); >> struct sbp2_target *tgt = dev_get_drvdata(&unit->device); >> + struct sbp2_logical_unit *lu, *next; >> + >> + list_for_each_entry_safe(lu, next,&tgt->lu_list, link) { >> + cancel_delayed_work_sync(&lu->work); >> + } >> >> sbp2_target_put(tgt); >> return 0; > list_for_each_entry() is sufficient here. You are not changing the list > while you iterate over it. > > Would you please resend the patch with list_for_each_entry()? Done. > More importantly, I think the whole sbp2_target instance reference > counting can be removed with his work canceling in place. But I have not > analyzed this fully yet, and I don't expect you to do this for me. Though > if you like to do so, that would of course be welcome. If you don't mind I'll let you do that as I'm not yet familiar enough with the kernel - only enough to be dangerous! :-) Chris ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2011-08-22 23:21 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1314017561-1976-1-git-send-email-bootc@bootc.net>
2011-08-22 14:35 ` [PATCH] firewire-sbp2: fix panic after rmmod with slow targets Stefan Richter
2011-08-22 20:38 ` Chris Boot
2011-08-22 22:38 ` Chris Boot
2011-08-22 22:56 ` [PATCH] [v3] " Chris Boot
2011-08-22 23:21 ` Stefan Richter
2011-08-22 20:43 ` [PATCH] " Chris Boot
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox