From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mike Anderson
Subject: Re: scsi_forget_host() and scsi_remove_device()
Date: Thu, 3 Jul 2003 15:19:40 -0700
Sender: linux-scsi-owner@vger.kernel.org
Message-ID: <20030703221940.GA5983@beaverton.ibm.com>
References: <20030627110259.A3751@infradead.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Received: from e32.co.us.ibm.com ([32.97.110.130]:26258 "EHLO e32.co.us.ibm.com") by vger.kernel.org with ESMTP id S265439AbTGCWDO (ORCPT ); Thu, 3 Jul 2003 18:03:14 -0400
Content-Disposition: inline
In-Reply-To:
List-Id: linux-scsi@vger.kernel.org
To: Alan Stern
Cc: Christoph Hellwig, SCSI development list

Alan Stern [stern@rowland.harvard.edu] wrote:
> There's a real problem about the way scsi_forget_host() calls
> scsi_remove_device() for each device on the host's bus. The problem is
> that scsi_remove_device() unregisters the device in sysfs, which unbinds
> the device's driver. This happens immediately, without waiting for the
> reference count to be 0. So if the device is open (mounted, for example)
> when the host is unplugged, the filesystem will have a dangling reference
> to the unbound driver. Of course this will most likely cause a segfault
> when the user attempts to unmount the device.
>
> I don't know what the right way is to attack this problem, or what you're
> planning to do about it. One approach would be somehow to prevent any new
> references to the device from being created while waiting for all the
> existing references to go away. But that doesn't seem feasible.
>
> How are you going to address this problem?
>
> Alan Stern
>

Yes, this is an issue. It became more of an issue once we started using
the LDM driver probe / remove functions.

I am still cleaning up a patch set for hosts and devices, so it is not
quite ready for review yet (I got pulled off onto some other issues for
a few days).

For scsi devices, here is a quick snapshot of what I am trying to test.

scsi_device_register:
        calls device_initialize on sdev_driverfs_dev
        calls class_device_initialize on sdev_classdev
        calls device_add on sdev_driverfs_dev
        calls class_device_add on sdev_classdev
        calls get_device on sdev_driverfs_dev

scsi_device_unregister:
        sets a state on the scsi_device indicating "delete"
        calls class_device_unregister on sdev_classdev

scsi_device_cls_release:
        calls put_device on sdev_driverfs_dev

scsi_device_dev_release:
        calls device_del on sdev_driverfs_dev (I have a hack in here to
        ensure device_add was previously called).
        calls scsi_free_sdev

scsi_device_get:
        if device state says get ok calls get_device on sdev_driverfs_dev
        (I still have access_count and try_module_get in the function).

scsi_device_put:
        calls put_device on sdev_driverfs_dev
        (I still have access_count and module_put in the function).

The gets are only stopped for callers of scsi_device_get. If get_device
is called from outside the subsystem, I cannot stop it, though it would
not cause bad things to happen; cleanup would just take longer. A core
flag would most likely be needed to stop this from happening.

-andmike
--
Michael Anderson
andmike@us.ibm.com
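
The gating in the outline above (new gets are refused once the device is
marked "delete", and final teardown waits for the last put) can be
illustrated with a small userspace sketch. This is not the patch set
and not kernel code: every name below (sketch_dev, sketch_get, and so
on) is hypothetical, a plain int stands in for the driver-model
reference count, and the two release callbacks are collapsed into a
single "released" flag. It also ignores locking, which real code would
need.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; none of these names exist
 * in the kernel. */
enum sketch_state { SKETCH_RUNNING, SKETCH_DELETE };

struct sketch_dev {
	int refcount;            /* stands in for the device refcount */
	enum sketch_state state; /* "delete" blocks further gets */
	bool released;           /* set where the release callback would run */
};

/* Register: the registering code holds the initial reference. */
void sketch_init(struct sketch_dev *d)
{
	d->refcount = 1;
	d->state = SKETCH_RUNNING;
	d->released = false;
}

/* Gated get: succeeds only while the device is not marked "delete". */
bool sketch_get(struct sketch_dev *d)
{
	if (d->state != SKETCH_RUNNING)
		return false;
	d->refcount++;
	return true;
}

/* Put: the last put is where teardown (freeing the device) would happen. */
void sketch_put(struct sketch_dev *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0)
		d->released = true;
}

/* Unregister: mark "delete" first, then drop the registration reference.
 * Existing holders keep the object alive until they put it. */
void sketch_unregister(struct sketch_dev *d)
{
	d->state = SKETCH_DELETE;
	sketch_put(d);
}
```

With an open reference outstanding, sketch_unregister() leaves the
object unreleased and makes later sketch_get() calls fail; only the
holder's final sketch_put() triggers the release. A caller that bumps
the count without going through sketch_get() bypasses the state check,
which corresponds to the get_device limitation noted above.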