public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* oops when removing sbp2 module
@ 2003-10-05 13:12 Paul Mackerras
  2003-10-05 14:49 ` Patrick Mansfield
  0 siblings, 1 reply; 4+ messages in thread
From: Paul Mackerras @ 2003-10-05 13:12 UTC (permalink / raw)
  To: linux-kernel, linux-scsi, linux1394-devel

I'm getting an oops inside the sysfs stuff when I try to remove the
sbp2 (firewire disk) module.  I am running Linus' current BK tree as
of yesterday, i.e. 2.6.0-test6 (plus).  I'm not sure whether the
problem is in sysfs, kobject, scsi or ieee1394 stuff, which is why I'm
posting this to 3 lists.

I have a 40GB disk in a firewire enclosure.  To use it, I insert the
ohci1394 and sbp2 modules, and the disk appears as a SCSI disk.
If I then remove the sbp2 module I get an oops from a null pointer
dereference in sysfs_hash_and_remove.  At that point dir->d_inode is
NULL.  It turns out that sysfs_remove_dir has already been called for
the directory and that is why dir->d_inode is NULL.

In fact what is happening is that class_device_unregister gets called
twice for the same classdev.  This is because scsi_remove_device gets
called twice for the same device.  The first time, the call chain
looks like this:

scsi_remove_device
sbp2_remove_device
sbp2_remove
device_release_driver
driver_detach
bus_remove_driver
driver_unregister
hpsb_unregister_protocol
sbp2_module_exit

and the second time it looks like this:

scsi_remove_device
scsi_forget_host
scsi_remove_host
sbp2_remove_host
hpsb_unregister_highlevel
sbp2_module_exit

So, is this a reference counting problem on the classdev, a problem
where the scsi layer doesn't remove the scsi device from its internal
lists properly in scsi_remove_device, or a problem in the sbp2 code?
Anyone got a fix?

Thanks,
Paul.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: oops when removing sbp2 module
  2003-10-05 13:12 oops when removing sbp2 module Paul Mackerras
@ 2003-10-05 14:49 ` Patrick Mansfield
  2003-10-06  3:08   ` Paul Mackerras
  0 siblings, 1 reply; 4+ messages in thread
From: Patrick Mansfield @ 2003-10-05 14:49 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linux-kernel, linux-scsi, linux1394-devel

On Sun, Oct 05, 2003 at 11:12:18PM +1000, Paul Mackerras wrote:

> In fact what is happening is that class_device_unregister gets called
> twice for the same classdev.  This is because scsi_remove_device gets
> called twice for the same device.  The first time, the call chain
> looks like this:
>
> scsi_remove_device
> sbp2_remove_device
> sbp2_remove
> device_release_driver
> driver_detach
> bus_remove_driver
> driver_unregister
> hpsb_unregister_protocol
> sbp2_module_exit
> 
> and the second time it looks like this:
> 
> scsi_remove_device
> scsi_forget_host
> scsi_remove_host
> sbp2_remove_host
> hpsb_unregister_highlevel
> sbp2_module_exit
> 
> So, is this a reference counting problem on the classdev, a problem
> where the scsi layer doesn't remove the scsi device from its internal
> lists properly in scsi_remove_device, or a problem in the sbp2 code?
> Anyone got a fix?
> 
> Thanks,
> Paul.

Paul -

There is a typo on a change to the access_count check. 

Try this patch:

--- bleed-2.5/drivers/scsi/scsi_sysfs.c-orig	Mon Sep 29 12:21:09 2003
+++ bleed-2.5/drivers/scsi/scsi_sysfs.c	Mon Sep 29 16:19:22 2003
@@ -412,7 +412,7 @@
 		set_bit(SDEV_DEL, &sdev->sdev_state);
 		if (sdev->host->hostt->slave_destroy)
 			sdev->host->hostt->slave_destroy(sdev);
-		if (atomic_read(&sdev->access_count))
+		if (!atomic_read(&sdev->access_count))
 			device_del(&sdev->sdev_gendev);
 		up_write(&class->subsys.rwsem);
 	}

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: oops when removing sbp2 module
  2003-10-05 14:49 ` Patrick Mansfield
@ 2003-10-06  3:08   ` Paul Mackerras
  2003-10-06  3:36     ` Patrick Mansfield
  0 siblings, 1 reply; 4+ messages in thread
From: Paul Mackerras @ 2003-10-06  3:08 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: linux-kernel, linux-scsi, linux1394-devel

Patrick Mansfield writes:

> There is a typo on a change to the access_count check. 
> 
> Try this patch:
> 
> --- bleed-2.5/drivers/scsi/scsi_sysfs.c-orig	Mon Sep 29 12:21:09 2003
> +++ bleed-2.5/drivers/scsi/scsi_sysfs.c	Mon Sep 29 16:19:22 2003
> @@ -412,7 +412,7 @@
>  		set_bit(SDEV_DEL, &sdev->sdev_state);
>  		if (sdev->host->hostt->slave_destroy)
>  			sdev->host->hostt->slave_destroy(sdev);
> -		if (atomic_read(&sdev->access_count))
> +		if (!atomic_read(&sdev->access_count))
>  			device_del(&sdev->sdev_gendev);
>  		up_write(&class->subsys.rwsem);
>  	}

That fixes it, it no longer oopses on removing sbp2.  As before I get
a message saying "Device 'fw-host0' does not have a release()
function, it is broken and must be fixed."  I assume that is a problem
with the sbp2 module.

The code in the patch looks a little worrying to me, though.  Is there
some lock we have taken to ensure that no other process could be
modifying sdev->access_count at the same time?  Also, what is to stop
some other process from noticing that sdev->access_count is 0 and
calling device_del(&sdev->sdev_gendev) ?

Regards,
Paul.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: oops when removing sbp2 module
  2003-10-06  3:08   ` Paul Mackerras
@ 2003-10-06  3:36     ` Patrick Mansfield
  0 siblings, 0 replies; 4+ messages in thread
From: Patrick Mansfield @ 2003-10-06  3:36 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linux-kernel, linux-scsi, linux1394-devel

On Mon, Oct 06, 2003 at 01:08:27PM +1000, Paul Mackerras wrote:

> That fixes it, it no longer oopses on removing sbp2.  As before I get
> a message saying "Device 'fw-host0' does not have a release()
> function, it is broken and must be fixed."  I assume that is a problem
> with the sbp2 module.
> 
> The code in the patch looks a little worrying to me, though.  Is there
> some lock we have taken to ensure that no other process could be
> modifying sdev->access_count at the same time?  Also, what is to stop
> some other process from noticing that sdev->access_count is 0 and
> calling device_del(&sdev->sdev_gendev) ?

Yes, it's a known problem, there is also a comment in the code, Christoph
was working on it.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2003-10-06  3:37 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-10-05 13:12 oops when removing sbp2 module Paul Mackerras
2003-10-05 14:49 ` Patrick Mansfield
2003-10-06  3:08   ` Paul Mackerras
2003-10-06  3:36     ` Patrick Mansfield

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox