* [PATCH] fc_user_scan correction
@ 2008-04-22 17:28 James Smart
2008-04-22 18:10 ` Mike Christie
0 siblings, 1 reply; 7+ messages in thread
From: James Smart @ 2008-04-22 17:28 UTC (permalink / raw)
To: linux-scsi
Way back when, when the fc_user_scan routine was created, it kept some
of its original logic that walked the rport list and kicked off a scan.
Unfortunately, it didn't keep any of the locking around the rport list,
nor did it consider the synchronous nature of the scan invoked. The result,
there are some scan requests where the rport list changes, thus a subsequent
scan is called on a bogus rport structure and the system NMI's.
The fix is to defer to the midlayer scan routine and not deal with locking
in the user scan function.
Patch was cut against scsi-misc-2.6
-- james s
Signed-off-by: James Smart <james.smart@emulex.com>
---
scsi_scan.c | 1 +
scsi_transport_fc.c | 21 +--------------------
2 files changed, 2 insertions(+), 20 deletions(-)
diff -upNr a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
--- a/drivers/scsi/scsi_scan.c 2008-04-22 13:09:23.000000000 -0400
+++ b/drivers/scsi/scsi_scan.c 2008-04-22 13:10:02.000000000 -0400
@@ -1669,6 +1669,7 @@ int scsi_scan_host_selected(struct Scsi_
return 0;
}
+EXPORT_SYMBOL(scsi_scan_host_selected);
static void scsi_sysfs_add_devices(struct Scsi_Host *shost)
{
diff -upNr a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
--- a/drivers/scsi/scsi_transport_fc.c 2008-04-22 12:51:22.000000000 -0400
+++ b/drivers/scsi/scsi_transport_fc.c 2008-04-22 12:53:30.000000000 -0400
@@ -1929,29 +1929,10 @@ fc_timed_out(struct scsi_cmnd *scmd)
return EH_NOT_HANDLED;
}
-/*
- * Must be called with shost->host_lock held
- */
static int fc_user_scan(struct Scsi_Host *shost, uint channel,
uint id, uint lun)
{
- struct fc_rport *rport;
-
- list_for_each_entry(rport, &fc_host_rports(shost), peers) {
- if (rport->scsi_target_id == -1)
- continue;
-
- if (rport->port_state != FC_PORTSTATE_ONLINE)
- continue;
-
- if ((channel == SCAN_WILD_CARD || channel == rport->channel) &&
- (id == SCAN_WILD_CARD || id == rport->scsi_target_id)) {
- scsi_scan_target(&rport->dev, rport->channel,
- rport->scsi_target_id, lun, 1);
- }
- }
-
- return 0;
+ return scsi_scan_host_selected(shost, channel, id, lun, 1);
}
static int fc_tsk_mgmt_response(struct Scsi_Host *shost, u64 nexus, u64 tm_id,
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] fc_user_scan correction
2008-04-22 17:28 [PATCH] fc_user_scan correction James Smart
@ 2008-04-22 18:10 ` Mike Christie
2008-04-22 18:11 ` Mike Christie
0 siblings, 1 reply; 7+ messages in thread
From: Mike Christie @ 2008-04-22 18:10 UTC (permalink / raw)
To: James.Smart; +Cc: linux-scsi
James Smart wrote:
> Way back when, when the fc_user_scan routine was created, it kept some
> of its original logic that walked the rport list and kicked off a scan.
> Unfortunately, it didn't keep any of the locking around the rport list,
> nor did it consider the synchronous nature of the scan invoked. The result,
> there are some scan requests where the rport list changes, thus a subsequent
> scan is called on a bogus rport structure and the system NMI's.
>
> The fix is to defer to the midlayer scan routine and not deal with locking
> in the user scan function.
>
> Patch was cut against scsi-misc-2.6
>
> -- james s
>
>
> Signed-off-by: James Smart <james.smart@emulex.com>
>
> ---
>
> scsi_scan.c | 1 +
> scsi_transport_fc.c | 21 +--------------------
> 2 files changed, 2 insertions(+), 20 deletions(-)
>
>
> diff -upNr a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> --- a/drivers/scsi/scsi_scan.c 2008-04-22 13:09:23.000000000 -0400
> +++ b/drivers/scsi/scsi_scan.c 2008-04-22 13:10:02.000000000 -0400
> @@ -1669,6 +1669,7 @@ int scsi_scan_host_selected(struct Scsi_
>
> return 0;
> }
> +EXPORT_SYMBOL(scsi_scan_host_selected);
>
> static void scsi_sysfs_add_devices(struct Scsi_Host *shost)
> {
> diff -upNr a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
> --- a/drivers/scsi/scsi_transport_fc.c 2008-04-22 12:51:22.000000000 -0400
> +++ b/drivers/scsi/scsi_transport_fc.c 2008-04-22 12:53:30.000000000 -0400
> @@ -1929,29 +1929,10 @@ fc_timed_out(struct scsi_cmnd *scmd)
> return EH_NOT_HANDLED;
> }
>
> -/*
> - * Must be called with shost->host_lock held
> - */
> static int fc_user_scan(struct Scsi_Host *shost, uint channel,
> uint id, uint lun)
> {
> - struct fc_rport *rport;
> -
> - list_for_each_entry(rport, &fc_host_rports(shost), peers) {
> - if (rport->scsi_target_id == -1)
> - continue;
> -
> - if (rport->port_state != FC_PORTSTATE_ONLINE)
> - continue;
> -
> - if ((channel == SCAN_WILD_CARD || channel == rport->channel) &&
> - (id == SCAN_WILD_CARD || id == rport->scsi_target_id)) {
> - scsi_scan_target(&rport->dev, rport->channel,
> - rport->scsi_target_id, lun, 1);
> - }
> - }
> -
> - return 0;
> + return scsi_scan_host_selected(shost, channel, id, lun, 1);
> }
Is this is the same as if you did not implement the user_scan callout?
scsi_sysfs.c will call
scsi_scan_host_selected(shost, channel, id, lun, 1);
I thought we added the user_scan callback because the transport classes
had to pass in the device struct between the host and target so we got
.../host/rport/target/scsi_device
instead of
.../host/target/scsi_device
qla4xxx has the same problem. Do not look at it for help :( It added a
mutex and does not deadlock because like the FC class it stats the
removal of the rport/session then device so the cache sync always fails
(the check ready function always returns DID_NO_CONNECT so the cache
sync fails). iscsi tcp/iser/bnx2i works because it has userspace helping
out with the removal and shutdown and does it in two stages.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] fc_user_scan correction
2008-04-22 18:10 ` Mike Christie
@ 2008-04-22 18:11 ` Mike Christie
2008-04-22 19:04 ` James Smart
0 siblings, 1 reply; 7+ messages in thread
From: Mike Christie @ 2008-04-22 18:11 UTC (permalink / raw)
To: James.Smart; +Cc: linux-scsi
Mike Christie wrote:
> James Smart wrote:
>> Way back when, when the fc_user_scan routine was created, it kept some
>> of its original logic that walked the rport list and kicked off a scan.
>> Unfortunately, it didn't keep any of the locking around the rport list,
>> nor did it consider the synchronous nature of the scan invoked. The
>> result,
>> there are some scan requests where the rport list changes, thus a
>> subsequent
>> scan is called on a bogus rport structure and the system NMI's.
>>
>> The fix is to defer to the midlayer scan routine and not deal with
>> locking
>> in the user scan function.
>>
>> Patch was cut against scsi-misc-2.6
>>
>> -- james s
>>
>>
>> Signed-off-by: James Smart <james.smart@emulex.com>
>>
>> ---
>>
>> scsi_scan.c | 1 +
>> scsi_transport_fc.c | 21 +--------------------
>> 2 files changed, 2 insertions(+), 20 deletions(-)
>>
>>
>> diff -upNr a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
>> --- a/drivers/scsi/scsi_scan.c 2008-04-22 13:09:23.000000000 -0400
>> +++ b/drivers/scsi/scsi_scan.c 2008-04-22 13:10:02.000000000 -0400
>> @@ -1669,6 +1669,7 @@ int scsi_scan_host_selected(struct Scsi_
>>
>> return 0;
>> }
>> +EXPORT_SYMBOL(scsi_scan_host_selected);
>>
>> static void scsi_sysfs_add_devices(struct Scsi_Host *shost)
>> {
>> diff -upNr a/drivers/scsi/scsi_transport_fc.c
>> b/drivers/scsi/scsi_transport_fc.c
>> --- a/drivers/scsi/scsi_transport_fc.c 2008-04-22
>> 12:51:22.000000000 -0400
>> +++ b/drivers/scsi/scsi_transport_fc.c 2008-04-22
>> 12:53:30.000000000 -0400
>> @@ -1929,29 +1929,10 @@ fc_timed_out(struct scsi_cmnd *scmd)
>> return EH_NOT_HANDLED;
>> }
>>
>> -/*
>> - * Must be called with shost->host_lock held
>> - */
>> static int fc_user_scan(struct Scsi_Host *shost, uint channel,
>> uint id, uint lun)
>> {
>> - struct fc_rport *rport;
>> -
>> - list_for_each_entry(rport, &fc_host_rports(shost), peers) {
>> - if (rport->scsi_target_id == -1)
>> - continue;
>> -
>> - if (rport->port_state != FC_PORTSTATE_ONLINE)
>> - continue;
>> -
>> - if ((channel == SCAN_WILD_CARD || channel == rport->channel) &&
>> - (id == SCAN_WILD_CARD || id == rport->scsi_target_id)) {
>> - scsi_scan_target(&rport->dev, rport->channel,
>> - rport->scsi_target_id, lun, 1);
>> - }
>> - }
>> -
>> - return 0;
>> + return scsi_scan_host_selected(shost, channel, id, lun, 1);
>> }
>
> Is this is the same as if you did not implement the user_scan callout?
> scsi_sysfs.c will call
>
> scsi_scan_host_selected(shost, channel, id, lun, 1);
>
> I thought we added the user_scan callback because the transport classes
> had to pass in the device struct between the host and target so we got
>
> .../host/rport/target/scsi_device
>
> instead of
>
> .../host/target/scsi_device
>
> qla4xxx has the same problem. Do not look at it for help :( It added a
> mutex and does not deadlock because like the FC class it stats the
> removal of the rport/session then device so the cache sync always fails
> (the check ready function always returns DID_NO_CONNECT so the cache
> sync fails). iscsi tcp/iser/bnx2i works because it has userspace helping
> out with the removal and shutdown and does it in two stages.
>
I think we need some loop + locking + refcounting similar to how the
shost_for_each_device loops over devices.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] fc_user_scan correction
2008-04-22 18:11 ` Mike Christie
@ 2008-04-22 19:04 ` James Smart
2008-04-22 19:46 ` Mike Christie
2008-04-22 19:51 ` Mike Christie
0 siblings, 2 replies; 7+ messages in thread
From: James Smart @ 2008-04-22 19:04 UTC (permalink / raw)
To: Mike Christie; +Cc: linux-scsi
Mike Christie wrote:
> Mike Christie wrote:
>> Is this is the same as if you did not implement the user_scan callout?
>> scsi_sysfs.c will call
>>
>> scsi_scan_host_selected(shost, channel, id, lun, 1);
>>
>> I thought we added the user_scan callback because the transport
>> classes had to pass in the device struct between the host and target
>> so we got
>>
>> .../host/rport/target/scsi_device
>>
>> instead of
>>
>> .../host/target/scsi_device
>>
>> qla4xxx has the same problem. Do not look at it for help :( It added a
>> mutex and does not deadlock because like the FC class it stats the
>> removal of the rport/session then device so the cache sync always
>> fails (the check ready function always returns DID_NO_CONNECT so the
>> cache sync fails). iscsi tcp/iser/bnx2i works because it has userspace
>> helping out with the removal and shutdown and does it in two stages.
>>
>
> I think we need some loop + locking + refcounting similar to how the
> shost_for_each_device loops over devices.
>
For FC, I don't believe there's any advantage to looping/locking. There's
miniscule advantages of not scanning targets that are just returned back
by the driver as not being present.
Taking another look at the user_scan sysfs routine, I can only come up with
a few reasons why it exists at all:
- some transports/LLDs, which do target enumeration and auto-scan, can't
handle directed scans to targets that don't exist. I have a hard time
believing this is true.
- There's some performance advantage for walking the transport target
list rather than cycling on the target ids. But, this interface can't
performance sensitive. This is the only reason I can see why user_scan
exists (to filter out non-existent targets).
- The "rescan" flag needs to be clean. For transports that auto-scan,
they have the best knowledge of when rescan should be 0 or 1. This
protects against a race between the user scan and the 1st-time target
discovery.
Hmm... this last point gives me concern, as I didn't fix it either.
Thoughts ? I'm happy leaving it as the default, but the rescan bothers me.
-- james s
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] fc_user_scan correction
2008-04-22 19:04 ` James Smart
@ 2008-04-22 19:46 ` Mike Christie
2008-04-22 21:30 ` James Smart
2008-04-22 19:51 ` Mike Christie
1 sibling, 1 reply; 7+ messages in thread
From: Mike Christie @ 2008-04-22 19:46 UTC (permalink / raw)
To: James.Smart; +Cc: linux-scsi
James Smart wrote:
>
>
> Mike Christie wrote:
>> Mike Christie wrote:
>>> Is this is the same as if you did not implement the user_scan
>>> callout? scsi_sysfs.c will call
>>>
>>> scsi_scan_host_selected(shost, channel, id, lun, 1);
>>>
>>> I thought we added the user_scan callback because the transport
>>> classes had to pass in the device struct between the host and target
>>> so we got
>>>
>>> .../host/rport/target/scsi_device
>>>
>>> instead of
>>>
>>> .../host/target/scsi_device
>>>
>>> qla4xxx has the same problem. Do not look at it for help :( It added
>>> a mutex and does not deadlock because like the FC class it stats the
>>> removal of the rport/session then device so the cache sync always
>>> fails (the check ready function always returns DID_NO_CONNECT so the
>>> cache sync fails). iscsi tcp/iser/bnx2i works because it has
>>> userspace helping out with the removal and shutdown and does it in
>>> two stages.
>>>
>>
>> I think we need some loop + locking + refcounting similar to how the
>> shost_for_each_device loops over devices.
>>
>
> For FC, I don't believe there's any advantage to looping/locking. There's
> miniscule advantages of not scanning targets that are just returned back
> by the driver as not being present.
I was actually just thinking of refcounting. Because we are coming in
from the host the rpot would not have a refcount from the
sysfs/userpscae reference. If there were no scsi_devices/targets on the
rport and the rport ie being removed then I thought the scsi_scan.c code
could be accessing a struct device that was freed or in the middle of
being freed.
>
> Taking another look at the user_scan sysfs routine, I can only come up with
> a few reasons why it exists at all:
> - some transports/LLDs, which do target enumeration and auto-scan, can't
> handle directed scans to targets that don't exist. I have a hard time
> believing this is true.
I am not sure what you are saying? Is this the same as my comment about
.../host/rport/target/scsi_device
vs
.../host/target/scsi_device
If you go down scsi_scan_host_selected then we go to
scsi_scan_host_selected -> scsi_scan_channel -> __scsi_scan_target and
the parent that is passed to __scsi_scan_target is the shost, so we get
.../host/target/scsi_device
For the transport classes we did scsi_scan_target and pass the rport so
we end up with
.../host/rport/target/scsi_device
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] fc_user_scan correction
2008-04-22 19:04 ` James Smart
2008-04-22 19:46 ` Mike Christie
@ 2008-04-22 19:51 ` Mike Christie
1 sibling, 0 replies; 7+ messages in thread
From: Mike Christie @ 2008-04-22 19:51 UTC (permalink / raw)
To: James.Smart; +Cc: linux-scsi
James Smart wrote:
>
>
> Mike Christie wrote:
>> Mike Christie wrote:
>>> Is this is the same as if you did not implement the user_scan
>>> callout? scsi_sysfs.c will call
>>>
>>> scsi_scan_host_selected(shost, channel, id, lun, 1);
>>>
>>> I thought we added the user_scan callback because the transport
>>> classes had to pass in the device struct between the host and target
>>> so we got
>>>
>>> .../host/rport/target/scsi_device
>>>
>>> instead of
>>>
>>> .../host/target/scsi_device
>>>
>>> qla4xxx has the same problem. Do not look at it for help :( It added
>>> a mutex and does not deadlock because like the FC class it stats the
>>> removal of the rport/session then device so the cache sync always
>>> fails (the check ready function always returns DID_NO_CONNECT so the
>>> cache sync fails). iscsi tcp/iser/bnx2i works because it has
>>> userspace helping out with the removal and shutdown and does it in
>>> two stages.
>>>
>>
>> I think we need some loop + locking + refcounting similar to how the
>> shost_for_each_device loops over devices.
>>
>
> For FC, I don't believe there's any advantage to looping/locking. There's
> miniscule advantages of not scanning targets that are just returned back
> by the driver as not being present.
>
> Taking another look at the user_scan sysfs routine, I can only come up with
> a few reasons why it exists at all:
> - some transports/LLDs, which do target enumeration and auto-scan, can't
> handle directed scans to targets that don't exist. I have a hard time
> believing this is true.
> - There's some performance advantage for walking the transport target
> list rather than cycling on the target ids. But, this interface can't
> performance sensitive. This is the only reason I can see why user_scan
> exists (to filter out non-existent targets).
> - The "rescan" flag needs to be clean. For transports that auto-scan,
> they have the best knowledge of when rescan should be 0 or 1. This
> protects against a race between the user scan and the 1st-time target
> discovery.
>
Oh yeah, the original reason is here
http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=e02f3f59225d8c3b2a0ad0dc941a09865e27da61
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] fc_user_scan correction
2008-04-22 19:46 ` Mike Christie
@ 2008-04-22 21:30 ` James Smart
0 siblings, 0 replies; 7+ messages in thread
From: James Smart @ 2008-04-22 21:30 UTC (permalink / raw)
To: Mike Christie; +Cc: linux-scsi
Mike,
You're right. The nuance is that user_scan *must* be supported if the
transport binds the target to a transport-specific device node that is
not the shost, as user_scan needs to call the scsi_scan_target() with
the parent node the target structure has to be bound to. It's not just
a sysfs interface.
I'll rework the patch and update the user_scan comment to make sure this
is better understood.
-- james s
Mike Christie wrote:
> James Smart wrote:
>>
>>
>> Mike Christie wrote:
>>> Mike Christie wrote:
>>>> Is this is the same as if you did not implement the user_scan
>>>> callout? scsi_sysfs.c will call
>>>>
>>>> scsi_scan_host_selected(shost, channel, id, lun, 1);
>>>>
>>>> I thought we added the user_scan callback because the transport
>>>> classes had to pass in the device struct between the host and target
>>>> so we got
>>>>
>>>> .../host/rport/target/scsi_device
>>>>
>>>> instead of
>>>>
>>>> .../host/target/scsi_device
>>>>
>>>> qla4xxx has the same problem. Do not look at it for help :( It added
>>>> a mutex and does not deadlock because like the FC class it stats the
>>>> removal of the rport/session then device so the cache sync always
>>>> fails (the check ready function always returns DID_NO_CONNECT so the
>>>> cache sync fails). iscsi tcp/iser/bnx2i works because it has
>>>> userspace helping out with the removal and shutdown and does it in
>>>> two stages.
>>>>
>>>
>>> I think we need some loop + locking + refcounting similar to how the
>>> shost_for_each_device loops over devices.
>>>
>>
>> For FC, I don't believe there's any advantage to looping/locking. There's
>> miniscule advantages of not scanning targets that are just returned back
>> by the driver as not being present.
>
> I was actually just thinking of refcounting. Because we are coming in
> from the host the rpot would not have a refcount from the
> sysfs/userpscae reference. If there were no scsi_devices/targets on the
> rport and the rport ie being removed then I thought the scsi_scan.c code
> could be accessing a struct device that was freed or in the middle of
> being freed.
>
>
>>
>> Taking another look at the user_scan sysfs routine, I can only come up
>> with
>> a few reasons why it exists at all:
>> - some transports/LLDs, which do target enumeration and auto-scan, can't
>> handle directed scans to targets that don't exist. I have a hard time
>> believing this is true.
>
> I am not sure what you are saying? Is this the same as my comment about
> .../host/rport/target/scsi_device
> vs
> .../host/target/scsi_device
>
> If you go down scsi_scan_host_selected then we go to
> scsi_scan_host_selected -> scsi_scan_channel -> __scsi_scan_target and
> the parent that is passed to __scsi_scan_target is the shost, so we get
> .../host/target/scsi_device
>
> For the transport classes we did scsi_scan_target and pass the rport so
> we end up with
> .../host/rport/target/scsi_device
>
>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2008-04-22 21:30 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-04-22 17:28 [PATCH] fc_user_scan correction James Smart
2008-04-22 18:10 ` Mike Christie
2008-04-22 18:11 ` Mike Christie
2008-04-22 19:04 ` James Smart
2008-04-22 19:46 ` Mike Christie
2008-04-22 21:30 ` James Smart
2008-04-22 19:51 ` Mike Christie
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).