* [PATCH 1/1] scsi subsystem : fix function __scsi_device_lookup
@ 2015-09-23 22:37 Zhengping Zhou
0 siblings, 0 replies; 5+ messages in thread
From: Zhengping Zhou @ 2015-09-23 22:37 UTC (permalink / raw)
To: James Bottomley, Matthew R. Ochs
Cc: linux-scsi, Kashyap Desai, Sumit Saxena, Uday Lingala,
linux-kernel, Zhengping Zhou
when a scsi_device is unpluged from scsi controller, if the
scsi_device is still be used by application layer,it won't be
released until users release it. In this case, scsi_device_remove just set
the scsi_device's state to be SDEV_DEL. But if you plug the disk
just before the old scsi_device is released, then there will be two
scsi_device structures in scsi_host->__devices. when the next unpluging
event happens,some low-level drivers will check whether the scsi_device
has been added to host (for example, the megaraid sas series controller)
by calling scsi_device_lookup(call __scsi_device_lookup).
__scsi_device_lookup will return the first scsi_device. Because its
state is SDEV_DEL, the scsi_device_lookup will return NULL finally,
making the low-level driver assume that the scsi_device has been
removed,and won't call scsi_device_remove,which will lead the
failure of hot swap.
Signed-off-by: Zhengping Zhou <johnzzpcrystal@gmail.com>
---
Hi all:
I find a bug about the failure of hot swap when I am using
megaraid sas series controller. Finally I have found that
when controller receives the event of hot swap, it will firstly
check whether the device is added to the system/host by calling
scsi_device_lookup.The logics in function megasas_aen_polling
is as follows:
case MR_EVT_PD_REMOVED:
if (megasas_get_pd_list(instance) == 0) {
for (i = 0; i < MEGASAS_MAX_PD_CHANNELS; i++) {
for (j = 0;
j < MEGASAS_MAX_DEV_PER_CHANNEL;
j++) {
pd_index =
(i * MEGASAS_MAX_DEV_PER_CHANNEL) + j;
sdev1 = scsi_device_lookup(host, i, j, 0);
if (instance->pd_list[pd_index].driveState
== MR_PD_STATE_SYSTEM) {
if (sdev1)
scsi_device_put(sdev1);
} else {
if (sdev1) {
scsi_remove_device(sdev1);
scsi_device_put(sdev1);
}
}
}
}
}
If the previous scsi_device is not released, this will lead the
appearance of two scsi_devices which correspond with the same disk.
And when the disk is unpluged afterwards, the controller will assume
that this disk has never been added into the system/host. Thus it won't
call scsi_device_remove. When I finish this modification, this problem
is fixed.So far, I have successfully test PCI_DEVICE_ID_LSI_SAS0073SKINNY
and PCI_DEVICE_ID_LSI_FURY.
Thanks
Zhengping
---
drivers/scsi/scsi.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 207d6a7..5251d6d 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -1118,6 +1118,8 @@ struct scsi_device *__scsi_device_lookup(struct Scsi_Host *shost,
struct scsi_device *sdev;
list_for_each_entry(sdev, &shost->__devices, siblings) {
+ if (sdev->sdev_state == SDEV_DEL)
+ continue;
if (sdev->channel == channel && sdev->id == id &&
sdev->lun ==lun)
return sdev;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 1/1] scsi subsystem : fix function __scsi_device_lookup
@ 2015-10-15 7:38 Zhengping Zhou
2015-10-15 7:53 ` Hannes Reinecke
0 siblings, 1 reply; 5+ messages in thread
From: Zhengping Zhou @ 2015-10-15 7:38 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-scsi, Zhengping Zhou
when a scsi_device is unpluged from scsi controller, if the
scsi_device is still be used by application layer,it won't be
released until users release it. In this case, scsi_device_remove just set
the scsi_device's state to be SDEV_DEL. But if you plug the disk
just before the old scsi_device is released, then there will be two
scsi_device structures in scsi_host->__devices. when the next unpluging
event happens,some low-level drivers will check whether the scsi_device
has been added to host (for example, the megaraid sas series controller)
by calling scsi_device_lookup(call __scsi_device_lookup).
__scsi_device_lookup will return the first scsi_device. Because its
state is SDEV_DEL, the scsi_device_lookup will return NULL finally,
making the low-level driver assume that the scsi_device has been
removed,and won't call scsi_device_remove,which will lead the
failure of hot swap.
Signed-off-by: Zhengping Zhou <johnzzpcrystal@gmail.com>
---
Hi all:
I'm sorry to bother again,that's my second time to send
this patch.
I find a bug about the failure of hot swap when I am using
megaraid sas series controller. Finally I have found that
when controller receives the event of hot swap, it will firstly
check whether the device is added to the system/host by calling
scsi_device_lookup.The logics in function megasas_aen_polling
is as follows:
case MR_EVT_PD_REMOVED:
if (megasas_get_pd_list(instance) == 0) {
for (i = 0; i < MEGASAS_MAX_PD_CHANNELS; i++) {
for (j = 0;
j < MEGASAS_MAX_DEV_PER_CHANNEL;
j++) {
pd_index =
(i * MEGASAS_MAX_DEV_PER_CHANNEL) + j;
sdev1 = scsi_device_lookup(host, i, j, 0);
if (instance->pd_list[pd_index].driveState
== MR_PD_STATE_SYSTEM) {
if (sdev1)
scsi_device_put(sdev1);
} else {
if (sdev1) {
scsi_remove_device(sdev1);
scsi_device_put(sdev1);
}
}
}
}
}
If the previous scsi_device is not released, this will lead the
appearance of two scsi_devices which correspond with the same disk.
And when the disk is unpluged afterwards, the controller will assume
that this disk has never been added into the system/host. Thus it won't
call scsi_device_remove. When I finish this modification, this problem
is fixed.So far, I have successfully test PCI_DEVICE_ID_LSI_SAS0073SKINNY
and PCI_DEVICE_ID_LSI_FURY.
Thanks
Zhengping
---
drivers/scsi/scsi.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 207d6a7..5251d6d 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -1118,6 +1118,8 @@ struct scsi_device *__scsi_device_lookup(struct Scsi_Host *shost,
struct scsi_device *sdev;
list_for_each_entry(sdev, &shost->__devices, siblings) {
+ if (sdev->sdev_state == SDEV_DEL)
+ continue;
if (sdev->channel == channel && sdev->id == id &&
sdev->lun ==lun)
return sdev;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 1/1] scsi subsystem : fix function __scsi_device_lookup
2015-10-15 7:38 Zhengping Zhou
@ 2015-10-15 7:53 ` Hannes Reinecke
2015-10-15 8:22 ` Zhengping Zhou
0 siblings, 1 reply; 5+ messages in thread
From: Hannes Reinecke @ 2015-10-15 7:53 UTC (permalink / raw)
To: Zhengping Zhou, James Bottomley; +Cc: linux-scsi
On 10/15/2015 09:38 AM, Zhengping Zhou wrote:
> when a scsi_device is unpluged from scsi controller, if the
> scsi_device is still be used by application layer,it won't be
> released until users release it. In this case, scsi_device_remove just set
> the scsi_device's state to be SDEV_DEL. But if you plug the disk
> just before the old scsi_device is released, then there will be two
> scsi_device structures in scsi_host->__devices. when the next unpluging
> event happens,some low-level drivers will check whether the scsi_device
> has been added to host (for example, the megaraid sas series controller)
> by calling scsi_device_lookup(call __scsi_device_lookup).
> __scsi_device_lookup will return the first scsi_device. Because its
> state is SDEV_DEL, the scsi_device_lookup will return NULL finally,
> making the low-level driver assume that the scsi_device has been
> removed,and won't call scsi_device_remove,which will lead the
> failure of hot swap.
> Signed-off-by: Zhengping Zhou <johnzzpcrystal@gmail.com>
> ---
> Hi all:
> I'm sorry to bother again,that's my second time to send
> this patch.
> I find a bug about the failure of hot swap when I am using
> megaraid sas series controller. Finally I have found that
> when controller receives the event of hot swap, it will firstly
> check whether the device is added to the system/host by calling
> scsi_device_lookup.The logics in function megasas_aen_polling
> is as follows:
> case MR_EVT_PD_REMOVED:
> if (megasas_get_pd_list(instance) == 0) {
> for (i = 0; i < MEGASAS_MAX_PD_CHANNELS; i++) {
> for (j = 0;
> j < MEGASAS_MAX_DEV_PER_CHANNEL;
> j++) {
>
> pd_index =
> (i * MEGASAS_MAX_DEV_PER_CHANNEL) + j;
>
> sdev1 = scsi_device_lookup(host, i, j, 0);
>
> if (instance->pd_list[pd_index].driveState
> == MR_PD_STATE_SYSTEM) {
> if (sdev1)
> scsi_device_put(sdev1);
> } else {
> if (sdev1) {
> scsi_remove_device(sdev1);
> scsi_device_put(sdev1);
> }
> }
> }
> }
> }
> If the previous scsi_device is not released, this will lead the
> appearance of two scsi_devices which correspond with the same disk.
> And when the disk is unpluged afterwards, the controller will assume
> that this disk has never been added into the system/host. Thus it won't
> call scsi_device_remove. When I finish this modification, this problem
> is fixed.So far, I have successfully test PCI_DEVICE_ID_LSI_SAS0073SKINNY
> and PCI_DEVICE_ID_LSI_FURY.
> Thanks
> Zhengping
> ---
> drivers/scsi/scsi.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index 207d6a7..5251d6d 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -1118,6 +1118,8 @@ struct scsi_device *__scsi_device_lookup(struct Scsi_Host *shost,
> struct scsi_device *sdev;
>
> list_for_each_entry(sdev, &shost->__devices, siblings) {
> + if (sdev->sdev_state == SDEV_DEL)
> + continue;
> if (sdev->channel == channel && sdev->id == id &&
> sdev->lun ==lun)
> return sdev;
>
Ho-hum.
So lookup will return NULL, which then will cause the subsequent
functions to assume the scsi_device is not present, right?
And if you're _really_ unlucky it'll continue to add this device
(with the same LUN, target, bus, and host number!) to the list,
resulting in us having _two_ devices with the same number on the list.
Happy lookup.
I guess this calls for the lock rework from Johannes ...
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/1] scsi subsystem : fix function __scsi_device_lookup
2015-10-15 7:53 ` Hannes Reinecke
@ 2015-10-15 8:22 ` Zhengping Zhou
2015-10-16 12:08 ` Zhengping Zhou
0 siblings, 1 reply; 5+ messages in thread
From: Zhengping Zhou @ 2015-10-15 8:22 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: linux-scsi, Jej B
Yes
But it's interesting ,when you continue to add this device (with the
same LUN, target, bus, and host number!) to the list,scsi_add_device
will call scsi_probe_and_add_lun,and it will call
scsi_device_lookup_by_target(starget, lun) to recheck its existence
finally.but scsi_device_lookup_by_target doesn't has the problem what
I mentioned in this patch .So it's OK when you add this device again.
Forgive my poor English!
2015-10-15 15:53 GMT+08:00 Hannes Reinecke <hare@suse.de>:
> On 10/15/2015 09:38 AM, Zhengping Zhou wrote:
>> when a scsi_device is unpluged from scsi controller, if the
>> scsi_device is still be used by application layer,it won't be
>> released until users release it. In this case, scsi_device_remove just set
>> the scsi_device's state to be SDEV_DEL. But if you plug the disk
>> just before the old scsi_device is released, then there will be two
>> scsi_device structures in scsi_host->__devices. when the next unpluging
>> event happens,some low-level drivers will check whether the scsi_device
>> has been added to host (for example, the megaraid sas series controller)
>> by calling scsi_device_lookup(call __scsi_device_lookup).
>> __scsi_device_lookup will return the first scsi_device. Because its
>> state is SDEV_DEL, the scsi_device_lookup will return NULL finally,
>> making the low-level driver assume that the scsi_device has been
>> removed,and won't call scsi_device_remove,which will lead the
>> failure of hot swap.
>> Signed-off-by: Zhengping Zhou <johnzzpcrystal@gmail.com>
>> ---
>> Hi all:
>> I'm sorry to bother again,that's my second time to send
>> this patch.
>> I find a bug about the failure of hot swap when I am using
>> megaraid sas series controller. Finally I have found that
>> when controller receives the event of hot swap, it will firstly
>> check whether the device is added to the system/host by calling
>> scsi_device_lookup.The logics in function megasas_aen_polling
>> is as follows:
>> case MR_EVT_PD_REMOVED:
>> if (megasas_get_pd_list(instance) == 0) {
>> for (i = 0; i < MEGASAS_MAX_PD_CHANNELS; i++) {
>> for (j = 0;
>> j < MEGASAS_MAX_DEV_PER_CHANNEL;
>> j++) {
>>
>> pd_index =
>> (i * MEGASAS_MAX_DEV_PER_CHANNEL) + j;
>>
>> sdev1 = scsi_device_lookup(host, i, j, 0);
>>
>> if (instance->pd_list[pd_index].driveState
>> == MR_PD_STATE_SYSTEM) {
>> if (sdev1)
>> scsi_device_put(sdev1);
>> } else {
>> if (sdev1) {
>> scsi_remove_device(sdev1);
>> scsi_device_put(sdev1);
>> }
>> }
>> }
>> }
>> }
>> If the previous scsi_device is not released, this will lead the
>> appearance of two scsi_devices which correspond with the same disk.
>> And when the disk is unpluged afterwards, the controller will assume
>> that this disk has never been added into the system/host. Thus it won't
>> call scsi_device_remove. When I finish this modification, this problem
>> is fixed.So far, I have successfully test PCI_DEVICE_ID_LSI_SAS0073SKINNY
>> and PCI_DEVICE_ID_LSI_FURY.
>> Thanks
>> Zhengping
>> ---
>> drivers/scsi/scsi.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
>> index 207d6a7..5251d6d 100644
>> --- a/drivers/scsi/scsi.c
>> +++ b/drivers/scsi/scsi.c
>> @@ -1118,6 +1118,8 @@ struct scsi_device *__scsi_device_lookup(struct Scsi_Host *shost,
>> struct scsi_device *sdev;
>>
>> list_for_each_entry(sdev, &shost->__devices, siblings) {
>> + if (sdev->sdev_state == SDEV_DEL)
>> + continue;
>> if (sdev->channel == channel && sdev->id == id &&
>> sdev->lun ==lun)
>> return sdev;
>>
> Ho-hum.
>
> So lookup will return NULL, which then will cause the subsequent
> functions to assume the scsi_device is not present, right?
>
> And if you're _really_ unlucky it'll continue to add this device
> (with the same LUN, target, bus, and host number!) to the list,
> resulting in us having _two_ devices with the same number on the list.
>
> Happy lookup.
>
> I guess this calls for the lock rework from Johannes ...
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke zSeries & Storage
> hare@suse.de +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/1] scsi subsystem : fix function __scsi_device_lookup
2015-10-15 8:22 ` Zhengping Zhou
@ 2015-10-16 12:08 ` Zhengping Zhou
0 siblings, 0 replies; 5+ messages in thread
From: Zhengping Zhou @ 2015-10-16 12:08 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: linux-scsi, Jej B
2015-10-15 16:22 GMT+08:00 Zhengping Zhou <johnzzpcrystal@gmail.com>:
> Yes
>
> But it's interesting ,when you continue to add this device (with the
> same LUN, target, bus, and host number!) to the list,scsi_add_device
> will call scsi_probe_and_add_lun,and it will call
> scsi_device_lookup_by_target(starget, lun) to recheck its existence
> finally.but scsi_device_lookup_by_target doesn't has the problem what
> I mentioned in this patch .So it's OK when you add this device again.
>
> Forgive my poor English!
> 2015-10-15 15:53 GMT+08:00 Hannes Reinecke <hare@suse.de>:
>> On 10/15/2015 09:38 AM, Zhengping Zhou wrote:
>>> when a scsi_device is unpluged from scsi controller, if the
>>> scsi_device is still be used by application layer,it won't be
>>> released until users release it. In this case, scsi_device_remove just set
>>> the scsi_device's state to be SDEV_DEL. But if you plug the disk
>>> just before the old scsi_device is released, then there will be two
>>> scsi_device structures in scsi_host->__devices. when the next unpluging
>>> event happens,some low-level drivers will check whether the scsi_device
>>> has been added to host (for example, the megaraid sas series controller)
>>> by calling scsi_device_lookup(call __scsi_device_lookup).
>>> __scsi_device_lookup will return the first scsi_device. Because its
>>> state is SDEV_DEL, the scsi_device_lookup will return NULL finally,
>>> making the low-level driver assume that the scsi_device has been
>>> removed,and won't call scsi_device_remove,which will lead the
>>> failure of hot swap.
>>> Signed-off-by: Zhengping Zhou <johnzzpcrystal@gmail.com>
>>> ---
>>> Hi all:
>>> I'm sorry to bother again,that's my second time to send
>>> this patch.
>>> I find a bug about the failure of hot swap when I am using
>>> megaraid sas series controller. Finally I have found that
>>> when controller receives the event of hot swap, it will firstly
>>> check whether the device is added to the system/host by calling
>>> scsi_device_lookup.The logics in function megasas_aen_polling
>>> is as follows:
>>> case MR_EVT_PD_REMOVED:
>>> if (megasas_get_pd_list(instance) == 0) {
>>> for (i = 0; i < MEGASAS_MAX_PD_CHANNELS; i++) {
>>> for (j = 0;
>>> j < MEGASAS_MAX_DEV_PER_CHANNEL;
>>> j++) {
>>>
>>> pd_index =
>>> (i * MEGASAS_MAX_DEV_PER_CHANNEL) + j;
>>>
>>> sdev1 = scsi_device_lookup(host, i, j, 0);
>>>
>>> if (instance->pd_list[pd_index].driveState
>>> == MR_PD_STATE_SYSTEM) {
>>> if (sdev1)
>>> scsi_device_put(sdev1);
>>> } else {
>>> if (sdev1) {
>>> scsi_remove_device(sdev1);
>>> scsi_device_put(sdev1);
>>> }
>>> }
>>> }
>>> }
>>> }
>>> If the previous scsi_device is not released, this will lead the
>>> appearance of two scsi_devices which correspond with the same disk.
>>> And when the disk is unpluged afterwards, the controller will assume
>>> that this disk has never been added into the system/host. Thus it won't
>>> call scsi_device_remove. When I finish this modification, this problem
>>> is fixed.So far, I have successfully test PCI_DEVICE_ID_LSI_SAS0073SKINNY
>>> and PCI_DEVICE_ID_LSI_FURY.
>>> Thanks
>>> Zhengping
>>> ---
>>> drivers/scsi/scsi.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
>>> index 207d6a7..5251d6d 100644
>>> --- a/drivers/scsi/scsi.c
>>> +++ b/drivers/scsi/scsi.c
>>> @@ -1118,6 +1118,8 @@ struct scsi_device *__scsi_device_lookup(struct Scsi_Host *shost,
>>> struct scsi_device *sdev;
>>>
>>> list_for_each_entry(sdev, &shost->__devices, siblings) {
>>> + if (sdev->sdev_state == SDEV_DEL)
>>> + continue;
>>> if (sdev->channel == channel && sdev->id == id &&
>>> sdev->lun ==lun)
>>> return sdev;
>>>
>> Ho-hum.
>>
>> So lookup will return NULL, which then will cause the subsequent
>> functions to assume the scsi_device is not present, right?
>>
>> And if you're _really_ unlucky it'll continue to add this device
>> (with the same LUN, target, bus, and host number!) to the list,
>> resulting in us having _two_ devices with the same number on the list.
>>
>> Happy lookup.
>>
>> I guess this calls for the lock rework from Johannes ...
>>
>> Cheers,
>>
>> Hannes
>> --
>> Dr. Hannes Reinecke zSeries & Storage
>> hare@suse.de +49 911 74053 688
>> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
>> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
>> HRB 21284 (AG Nürnberg)
You can compare the function __scsi_device_lookup_by_target and the
function __scsi_device_lookup.
It will be more easy to figure out this patch.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2015-10-16 12:08 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-23 22:37 [PATCH 1/1] scsi subsystem : fix function __scsi_device_lookup Zhengping Zhou
-- strict thread matches above, loose matches on Subject: below --
2015-10-15 7:38 Zhengping Zhou
2015-10-15 7:53 ` Hannes Reinecke
2015-10-15 8:22 ` Zhengping Zhou
2015-10-16 12:08 ` Zhengping Zhou
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).