* [PATCH] pm80xx: remove the SCSI host before detaching from SAS transport
@ 2015-10-30 20:01 Benjamin Rood
2015-11-02 8:29 ` Jack Wang
2015-11-10 0:44 ` Martin K. Petersen
0 siblings, 2 replies; 4+ messages in thread
From: Benjamin Rood @ 2015-10-30 20:01 UTC (permalink / raw)
To: linux-scsi, xjtuwjp, James.Bottomley; +Cc: Benjamin Rood
Previously, when this module was unloaded via 'rmmod' with at least one
drive attached, the SCSI error handler thread would become stuck in an
infinite recovery loop and lockup the system, necessitating a reboot.
Once the SAS layer is detached, the driver will fail any subsequent
commands since the target devices are removed. However, removing the
SCSI host generates a SYNCHRONIZE CACHE (10) command, which was failed
and left the error handler no method of recovery.
This patch simply removes the SCSI host first so that no more commands
can come down, prior to cleaning up the SAS layer. Note that the stack
is built up with the SCSI host first, and then the SAS layer. Perhaps
it should be reversed for symmetry, so that commands cannot be sent to
the pm80xx driver prior to attaching the SAS layer?
What was really strange about this bug was that it was introduced at
commit cff549e4860f ("[SCSI]: proper state checking and module refcount
handling in scsi_device_get"). This commit appears to tinker with how
the reference counting is performed for SCSI device objects. My theory
is that prior to this commit, the refcount for a device object was
blindly incremented at some point during the teardown process which
coincidentially made the device stick around during the procedure, which
also coincidentially made any commands sent to the driver not fail
(since the device was technically still "there"). After this commit was
applied, my theory is the refcount for the device object is not being
incremented at a specific point anymore, which makes the device go away,
and thus made the pm80xx driver fail any subsequent commands.
You may also want to see the following for more details:
[1] http://www.spinics.net/lists/linux-scsi/msg37208.html
[2] http://marc.info/?l=linux-scsi&m=144416476406993&w=2
Signed-off-by: Benjamin Rood <brood@attotech.com>
---
drivers/scsi/pm8001/pm8001_init.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/pm8001/pm8001_init.c b/drivers/scsi/pm8001/pm8001_init.c
index bdc624f..b1b2dd7 100644
--- a/drivers/scsi/pm8001/pm8001_init.c
+++ b/drivers/scsi/pm8001/pm8001_init.c
@@ -1096,10 +1096,10 @@ static void pm8001_pci_remove(struct pci_dev *pdev)
struct pm8001_hba_info *pm8001_ha;
int i, j;
pm8001_ha = sha->lldd_ha;
+ scsi_remove_host(pm8001_ha->shost);
sas_unregister_ha(sha);
sas_remove_host(pm8001_ha->shost);
list_del(&pm8001_ha->list);
- scsi_remove_host(pm8001_ha->shost);
PM8001_CHIP_DISP->interrupt_disable(pm8001_ha, 0xFF);
PM8001_CHIP_DISP->chip_soft_rst(pm8001_ha);
--
2.4.3
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [PATCH] pm80xx: remove the SCSI host before detaching from SAS transport
2015-10-30 20:01 [PATCH] pm80xx: remove the SCSI host before detaching from SAS transport Benjamin Rood
@ 2015-11-02 8:29 ` Jack Wang
2015-11-03 13:14 ` Christoph Hellwig
2015-11-10 0:44 ` Martin K. Petersen
1 sibling, 1 reply; 4+ messages in thread
From: Jack Wang @ 2015-11-02 8:29 UTC (permalink / raw)
To: Benjamin Rood
Cc: linux-scsi@vger.kernel.org, Jej B, Benjamin Rood, Hannes Reinecke,
Christoph Hellwig
2015-10-30 21:01 GMT+01:00 Benjamin Rood <benjaminjrood@gmail.com>:
> Previously, when this module was unloaded via 'rmmod' with at least one
> drive attached, the SCSI error handler thread would become stuck in an
> infinite recovery loop and lockup the system, necessitating a reboot.
>
> Once the SAS layer is detached, the driver will fail any subsequent
> commands since the target devices are removed. However, removing the
> SCSI host generates a SYNCHRONIZE CACHE (10) command, which was failed
> and left the error handler no method of recovery.
>
> This patch simply removes the SCSI host first so that no more commands
> can come down, prior to cleaning up the SAS layer. Note that the stack
> is built up with the SCSI host first, and then the SAS layer. Perhaps
> it should be reversed for symmetry, so that commands cannot be sent to
> the pm80xx driver prior to attaching the SAS layer?
>
> What was really strange about this bug was that it was introduced at
> commit cff549e4860f ("[SCSI]: proper state checking and module refcount
> handling in scsi_device_get"). This commit appears to tinker with how
> the reference counting is performed for SCSI device objects. My theory
> is that prior to this commit, the refcount for a device object was
> blindly incremented at some point during the teardown process which
> coincidentially made the device stick around during the procedure, which
> also coincidentially made any commands sent to the driver not fail
> (since the device was technically still "there"). After this commit was
> applied, my theory is the refcount for the device object is not being
> incremented at a specific point anymore, which makes the device go away,
> and thus made the pm80xx driver fail any subsequent commands.
>
> You may also want to see the following for more details:
>
> [1] http://www.spinics.net/lists/linux-scsi/msg37208.html
> [2] http://marc.info/?l=linux-scsi&m=144416476406993&w=2
>
> Signed-off-by: Benjamin Rood <brood@attotech.com>
> ---
> drivers/scsi/pm8001/pm8001_init.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/pm8001/pm8001_init.c b/drivers/scsi/pm8001/pm8001_init.c
> index bdc624f..b1b2dd7 100644
> --- a/drivers/scsi/pm8001/pm8001_init.c
> +++ b/drivers/scsi/pm8001/pm8001_init.c
> @@ -1096,10 +1096,10 @@ static void pm8001_pci_remove(struct pci_dev *pdev)
> struct pm8001_hba_info *pm8001_ha;
> int i, j;
> pm8001_ha = sha->lldd_ha;
> + scsi_remove_host(pm8001_ha->shost);
> sas_unregister_ha(sha);
> sas_remove_host(pm8001_ha->shost);
> list_del(&pm8001_ha->list);
> - scsi_remove_host(pm8001_ha->shost);
> PM8001_CHIP_DISP->interrupt_disable(pm8001_ha, 0xFF);
> PM8001_CHIP_DISP->chip_soft_rst(pm8001_ha);
>
> --
> 2.4.3
>
Thanks for digging it down. Looks good to me.
Seems other libsas based driver all affected, maybe we should also fix them?
Christoph (cc-ed), could you share your opinion about this fix, as the
commit cff549e4860f is from you?
Regards,
Jack
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [PATCH] pm80xx: remove the SCSI host before detaching from SAS transport
2015-11-02 8:29 ` Jack Wang
@ 2015-11-03 13:14 ` Christoph Hellwig
0 siblings, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2015-11-03 13:14 UTC (permalink / raw)
To: Jack Wang
Cc: Benjamin Rood, linux-scsi@vger.kernel.org, Jej B, Benjamin Rood,
Hannes Reinecke, Christoph Hellwig
On Mon, Nov 02, 2015 at 09:29:13AM +0100, Jack Wang wrote:
> Thanks for digging it down. Looks good to me.
>
> Seems other libsas based driver all affected, maybe we should also fix them?
> Christoph (cc-ed), could you share your opinion about this fix, as the
> commit cff549e4860f is from you?
scsi_remove_host should be done first in ->remove and the commit
was written under that assumption. So it looks like the other drivers
will need fixes as well.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] pm80xx: remove the SCSI host before detaching from SAS transport
2015-10-30 20:01 [PATCH] pm80xx: remove the SCSI host before detaching from SAS transport Benjamin Rood
2015-11-02 8:29 ` Jack Wang
@ 2015-11-10 0:44 ` Martin K. Petersen
1 sibling, 0 replies; 4+ messages in thread
From: Martin K. Petersen @ 2015-11-10 0:44 UTC (permalink / raw)
To: Benjamin Rood; +Cc: linux-scsi, xjtuwjp, James.Bottomley, Benjamin Rood
>>>>> "Benjamin" == Benjamin Rood <benjaminjrood@gmail.com> writes:
Benjamin> Previously, when this module was unloaded via 'rmmod' with at
Benjamin> least one drive attached, the SCSI error handler thread would
Benjamin> become stuck in an infinite recovery loop and lockup the
Benjamin> system, necessitating a reboot.
Applied.
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2015-11-10 0:44 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-30 20:01 [PATCH] pm80xx: remove the SCSI host before detaching from SAS transport Benjamin Rood
2015-11-02 8:29 ` Jack Wang
2015-11-03 13:14 ` Christoph Hellwig
2015-11-10 0:44 ` Martin K. Petersen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox