public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] firewire-sbp2: fix panic after rmmod with slow targets
@ 2011-08-22 13:07 Chris Boot
  0 siblings, 0 replies; 7+ messages in thread
From: Chris Boot @ 2011-08-22 13:07 UTC (permalink / raw)
  To: Stefan Richter, linux1394-devel, linux-kernel; +Cc: Chris Boot

If firewire-sbp2 starts a login to a target that doesn't complete ORBs
in a timely manner (and has to retry the login), and the module is
removed before the operation times out, you end up with a null-pointer
dereference and a kernel panic.

This happens because the code in sbp2_remove() just does a
sbp2_target_put(), assuming it will be the last remaining reference. If
there are jobs in the workqueue, this is not the case, and the module is
successfully unloaded while references still exist.

This patch cancels pending work for each unit in sbp2_remove(), which
hopefully means there are no extra references around that prevent us
from unloading. This fixes my crash.

Signed-off-by: Chris Boot <bootc@bootc.net>
---
 drivers/firewire/sbp2.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/firewire/sbp2.c b/drivers/firewire/sbp2.c
index 41841a3..3867aaa 100644
--- a/drivers/firewire/sbp2.c
+++ b/drivers/firewire/sbp2.c
@@ -1198,6 +1198,11 @@ static int sbp2_remove(struct device *dev)
 {
 	struct fw_unit *unit = fw_unit(dev);
 	struct sbp2_target *tgt = dev_get_drvdata(&unit->device);
+	struct sbp2_logical_unit *lu, *next;
+
+	list_for_each_entry_safe(lu, next, &tgt->lu_list, link) {
+		cancel_delayed_work_sync(&lu->work);
+	}
 
 	sbp2_target_put(tgt);
 	return 0;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] firewire-sbp2: fix panic after rmmod with slow targets
       [not found] <1314017561-1976-1-git-send-email-bootc@bootc.net>
@ 2011-08-22 14:35 ` Stefan Richter
  2011-08-22 20:38   ` Chris Boot
  2011-08-22 20:43   ` [PATCH] " Chris Boot
  0 siblings, 2 replies; 7+ messages in thread
From: Stefan Richter @ 2011-08-22 14:35 UTC (permalink / raw)
  To: Chris Boot, Tejun Heo; +Cc: linux1394-devel, linux-kernel

On Aug 22 Chris Boot wrote:
> If firewire-sbp2 starts a login to a target that doesn't complete ORBs
> in a timely manner (and has to retry the login), and the module is
> removed before the operation times out, you end up with a null-pointer
> dereference and a kernel panic.
> 
> This happens because the code in sbp2_remove() just does a
> sbp2_target_put(), assuming it will be the last remaining reference. If
> there are jobs in the workqueue, this is not the case, and the module is
> successfully unloaded while references still exist.

The problem is not that sbp2_target_put()'s caller assumes that it is
putting the last reference.  sbp2_target_put()'s very purpose is to clean
up when, and only when, the last reference is gone.

The problem is that sbp2_target_get() does not take a reference to the
module.  scsi_device_get() does.  But as long as SCSI Inquiry was not
completed and e.g. somebody mounts a filesystem on an SBP-2 attached
device, there is no module reference kept by the SCSI core, hence
firewire-sbp2 can be unloaded.

> This patch cancels pending work for each unit in sbp2_remove(), which
> hopefully means there are no extra references around that prevent us
> from unloading. This fixes my crash.

The fix looks good to me.  On thing to watch out for is that sbp2_remove()
may be executed by the very same workqueue which also executes lu->work.
(This is fw_workqueue = alloc_workqueue("firewire", WQ_NON_REENTRANT |
WQ_MEM_RECLAIM, 0).)  But AFAIU, the workqueue infrastructure is able to
handle this without deadlock since kernel 2.6.36.  Tejun, is this
understanding correct?

> Signed-off-by: Chris Boot <bootc@bootc.net>
> ---
>  drivers/firewire/sbp2.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/firewire/sbp2.c b/drivers/firewire/sbp2.c
> index 41841a3..3867aaa 100644
> --- a/drivers/firewire/sbp2.c
> +++ b/drivers/firewire/sbp2.c
> @@ -1198,6 +1198,11 @@ static int sbp2_remove(struct device *dev)
>  {
>  	struct fw_unit *unit = fw_unit(dev);
>  	struct sbp2_target *tgt = dev_get_drvdata(&unit->device);
> +	struct sbp2_logical_unit *lu, *next;
> +
> +	list_for_each_entry_safe(lu, next, &tgt->lu_list, link) {
> +		cancel_delayed_work_sync(&lu->work);
> +	}
>  
>  	sbp2_target_put(tgt);
>  	return 0;

list_for_each_entry() is sufficient here.  You are not changing the list
while you iterate over it.

Would you please resend the patch with list_for_each_entry()?

More importantly, I think the whole sbp2_target instance reference
counting can be removed with his work canceling in place.  But I have not
analyzed this fully yet, and I don't expect you to do this for me.  Though
if you like to do so, that would of course be welcome.
-- 
Stefan Richter
-=====-==-== =--- =-==-
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] firewire-sbp2: fix panic after rmmod with slow targets
  2011-08-22 14:35 ` [PATCH] firewire-sbp2: fix panic after rmmod with slow targets Stefan Richter
@ 2011-08-22 20:38   ` Chris Boot
  2011-08-22 22:38     ` Chris Boot
  2011-08-22 20:43   ` [PATCH] " Chris Boot
  1 sibling, 1 reply; 7+ messages in thread
From: Chris Boot @ 2011-08-22 20:38 UTC (permalink / raw)
  To: Stefan Richter, linux1394-devel, linux-kernel; +Cc: Chris Boot

If firewire-sbp2 starts a login to a target that doesn't complete ORBs
in a timely manner (and has to retry the login), and the module is
removed before the operation times out, you end up with a null-pointer
dereference and a kernel panic.

This happens because the code in sbp2_remove() just does a
sbp2_target_put(), assuming there are no remaining references to the
target. If there are jobs in the workqueue, this is not the case, and
the module is successfully unloaded while references still exist.

This patch cancels pending work for each unit in sbp2_remove(), which
hopefully means there are no extra references around that prevent us
from unloading. This fixes my crash.

Signed-off-by: Chris Boot <bootc@bootc.net>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: <linux1394-devel@lists.sourceforge.net>
---
 drivers/firewire/sbp2.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/firewire/sbp2.c b/drivers/firewire/sbp2.c
index 41841a3..26397ec 100644
--- a/drivers/firewire/sbp2.c
+++ b/drivers/firewire/sbp2.c
@@ -1198,6 +1198,11 @@ static int sbp2_remove(struct device *dev)
 {
 	struct fw_unit *unit = fw_unit(dev);
 	struct sbp2_target *tgt = dev_get_drvdata(&unit->device);
+	struct sbp2_logical_unit *lu, *next;
+
+	list_for_each_entry(lu, next, &tgt->lu_list, link) {
+		cancel_delayed_work_sync(&lu->work);
+	}
 
 	sbp2_target_put(tgt);
 	return 0;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] firewire-sbp2: fix panic after rmmod with slow targets
  2011-08-22 14:35 ` [PATCH] firewire-sbp2: fix panic after rmmod with slow targets Stefan Richter
  2011-08-22 20:38   ` Chris Boot
@ 2011-08-22 20:43   ` Chris Boot
  1 sibling, 0 replies; 7+ messages in thread
From: Chris Boot @ 2011-08-22 20:43 UTC (permalink / raw)
  To: Stefan Richter; +Cc: Tejun Heo, linux1394-devel, linux-kernel

On 22/08/2011 15:35, Stefan Richter wrote:
> On Aug 22 Chris Boot wrote:
>> If firewire-sbp2 starts a login to a target that doesn't complete ORBs
>> in a timely manner (and has to retry the login), and the module is
>> removed before the operation times out, you end up with a null-pointer
>> dereference and a kernel panic.
>>
>> This happens because the code in sbp2_remove() just does a
>> sbp2_target_put(), assuming it will be the last remaining reference. If
>> there are jobs in the workqueue, this is not the case, and the module is
>> successfully unloaded while references still exist.
> The problem is not that sbp2_target_put()'s caller assumes that it is
> putting the last reference.  sbp2_target_put()'s very purpose is to clean
> up when, and only when, the last reference is gone.

Yes that's what I meant by my message, but I guess I wasn't clear 
enough. I edited it a bit with the second revision, I hope it's a bit 
clearer now.

[snip]

>> Signed-off-by: Chris Boot<bootc@bootc.net>
>> ---
>>   drivers/firewire/sbp2.c |    5 +++++
>>   1 files changed, 5 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/firewire/sbp2.c b/drivers/firewire/sbp2.c
>> index 41841a3..3867aaa 100644
>> --- a/drivers/firewire/sbp2.c
>> +++ b/drivers/firewire/sbp2.c
>> @@ -1198,6 +1198,11 @@ static int sbp2_remove(struct device *dev)
>>   {
>>   	struct fw_unit *unit = fw_unit(dev);
>>   	struct sbp2_target *tgt = dev_get_drvdata(&unit->device);
>> +	struct sbp2_logical_unit *lu, *next;
>> +
>> +	list_for_each_entry_safe(lu, next,&tgt->lu_list, link) {
>> +		cancel_delayed_work_sync(&lu->work);
>> +	}
>>
>>   	sbp2_target_put(tgt);
>>   	return 0;
> list_for_each_entry() is sufficient here.  You are not changing the list
> while you iterate over it.
>
> Would you please resend the patch with list_for_each_entry()?

Done.

> More importantly, I think the whole sbp2_target instance reference
> counting can be removed with his work canceling in place.  But I have not
> analyzed this fully yet, and I don't expect you to do this for me.  Though
> if you like to do so, that would of course be welcome.

If you don't mind I'll let you do that as I'm not yet familiar enough 
with the kernel - only enough to be dangerous! :-)

Chris


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] firewire-sbp2: fix panic after rmmod with slow targets
  2011-08-22 20:38   ` Chris Boot
@ 2011-08-22 22:38     ` Chris Boot
  2011-08-22 22:56       ` [PATCH] [v3] " Chris Boot
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Boot @ 2011-08-22 22:38 UTC (permalink / raw)
  To: Chris Boot; +Cc: Stefan Richter, linux1394-devel, linux-kernel

On 22 Aug 2011, at 21:38, Chris Boot wrote:
> +	list_for_each_entry(lu, next, &tgt->lu_list, link) {

That'll teach me for assuming the define had the same number of arguments, and not compile-testing at least... :-(

I'll send rev 3 over shortly.

Chris

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] [v3] firewire-sbp2: fix panic after rmmod with slow targets
  2011-08-22 22:38     ` Chris Boot
@ 2011-08-22 22:56       ` Chris Boot
  2011-08-22 23:21         ` Stefan Richter
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Boot @ 2011-08-22 22:56 UTC (permalink / raw)
  To: Stefan Richter, linux1394-devel, linux-kernel; +Cc: Chris Boot

If firewire-sbp2 starts a login to a target that doesn't complete ORBs
in a timely manner (and has to retry the login), and the module is
removed before the operation times out, you end up with a null-pointer
dereference and a kernel panic.

This happens because the code in sbp2_remove() just does a
sbp2_target_put(), assuming there are no remaining references to the
target. If there are jobs in the workqueue, this is not the case, and
the module is successfully unloaded while references still exist.

This patch cancels pending work for each unit in sbp2_remove(), which
hopefully means there are no extra references around that prevent us
from unloading. This fixes my crash.

Signed-off-by: Chris Boot <bootc@bootc.net>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: <linux1394-devel@lists.sourceforge.net>
---
 drivers/firewire/sbp2.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/firewire/sbp2.c b/drivers/firewire/sbp2.c
index 41841a3..260dd23 100644
--- a/drivers/firewire/sbp2.c
+++ b/drivers/firewire/sbp2.c
@@ -1198,6 +1198,11 @@ static int sbp2_remove(struct device *dev)
 {
 	struct fw_unit *unit = fw_unit(dev);
 	struct sbp2_target *tgt = dev_get_drvdata(&unit->device);
+	struct sbp2_logical_unit *lu;
+
+	list_for_each_entry(lu, &tgt->lu_list, link) {
+		cancel_delayed_work_sync(&lu->work);
+	}
 
 	sbp2_target_put(tgt);
 	return 0;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] [v3] firewire-sbp2: fix panic after rmmod with slow targets
  2011-08-22 22:56       ` [PATCH] [v3] " Chris Boot
@ 2011-08-22 23:21         ` Stefan Richter
  0 siblings, 0 replies; 7+ messages in thread
From: Stefan Richter @ 2011-08-22 23:21 UTC (permalink / raw)
  To: Chris Boot; +Cc: linux1394-devel, linux-kernel

On Aug 22 Chris Boot wrote:
> --- a/drivers/firewire/sbp2.c
> +++ b/drivers/firewire/sbp2.c
> @@ -1198,6 +1198,11 @@ static int sbp2_remove(struct device *dev)
>  {
>  	struct fw_unit *unit = fw_unit(dev);
>  	struct sbp2_target *tgt = dev_get_drvdata(&unit->device);
> +	struct sbp2_logical_unit *lu;
> +
> +	list_for_each_entry(lu, &tgt->lu_list, link) {
> +		cancel_delayed_work_sync(&lu->work);
> +	}
>  
>  	sbp2_target_put(tgt);
>  	return 0;

Thanks.  I reproduced the bug, applied your patch, and confirmed that the
patch fixes the bug.  Committed to linux1394-2.6.git but without the
curly braces.
-- 
Stefan Richter
-=====-==-== =--- =-===
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-08-22 23:21 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1314017561-1976-1-git-send-email-bootc@bootc.net>
2011-08-22 14:35 ` [PATCH] firewire-sbp2: fix panic after rmmod with slow targets Stefan Richter
2011-08-22 20:38   ` Chris Boot
2011-08-22 22:38     ` Chris Boot
2011-08-22 22:56       ` [PATCH] [v3] " Chris Boot
2011-08-22 23:21         ` Stefan Richter
2011-08-22 20:43   ` [PATCH] " Chris Boot
2011-08-22 13:07 Chris Boot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox