All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/1] Fusion SAS and Fibre Channel: target missing after resetting external raid
@ 2008-03-04 16:50 Michael Reed
  2008-03-04 17:22 ` Moore, Eric
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Reed @ 2008-03-04 16:50 UTC (permalink / raw)
  To: linux-scsi, Moore, Eric, Prakash, Sathya; +Cc: Michael Reed, Jeremy Higdon

Following a hard reset of a SAS raid, one of the raid targets is occasionally
missing.  I tracked this down to a pretty obscure little bug.

The LSI fusion drivers for SAS and Fibre Channel both use their respective
transport layers.  Those transport layers increment the target number
assigned to new targets.

The routine __scsi_scan_target uses the "this_id" element of the Scsi_Host
structure to avoid scanning the scsi host adapter.  Both fusion drivers set
"this_id" from a value returned in a firmware PortFacts response.  For my
particular test case (SAS) the firmware id assigned to the initiator was
173.  After enough raid resets to cause the raid targets to go and come a
sufficient number of times, the id assigned by the transport to a raid
target would match the id assigned by the host adapter to the "this_id"
field, resulting in that target not being scanned.

static void __scsi_scan_target(struct device *parent, unsigned int channel,
                unsigned int id, unsigned int lun, int rescan)
{
        struct Scsi_Host *shost = dev_to_shost(parent);
        int bflags = 0;
        int res;
        struct scsi_target *starget;

        if (shost->this_id == id)
                /*
                 * Don't scan the host adapter
                 */
                return;

....

The fix is simple.  Fusion SAS and Fibre Channel (subject to same bug) should
just leave "this_id" initialized to "-1".

Applies to 2.6.25-rc3-git5.

Signed-off-by: Michael Reed <mdr@sgi.com>

--

--- kou/drivers/message/fusion/mptfc.c	2008-01-24 16:58:37.000000000 -0600
+++ ko/drivers/message/fusion/mptfc.c	2008-03-04 09:01:18.428176326 -0600
@@ -1238,8 +1238,6 @@ mptfc_probe(struct pci_dev *pdev, const 
 	sh->max_id = ioc->pfacts->MaxDevices;
 	sh->max_lun = max_lun;
 
-	sh->this_id = ioc->pfacts[0].PortSCSIID;
-
 	/* Required entry.
 	 */
 	sh->unique_id = ioc->id;
--- kou/drivers/message/fusion/mptsas.c	2008-03-04 08:38:58.000000000 -0600
+++ ko/drivers/message/fusion/mptsas.c	2008-03-04 09:01:04.284807301 -0600
@@ -3176,8 +3176,6 @@ mptsas_probe(struct pci_dev *pdev, const
 
 	sh->transportt = mptsas_transport_template;
 
-	sh->this_id = ioc->pfacts[0].PortSCSIID;
-
 	/* Required entry.
 	 */
 	sh->unique_id = ioc->id;


^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: [PATCH 1/1] Fusion SAS and Fibre Channel: target missing after resetting external raid
  2008-03-04 16:50 [PATCH 1/1] Fusion SAS and Fibre Channel: target missing after resetting external raid Michael Reed
@ 2008-03-04 17:22 ` Moore, Eric
  2008-03-13 19:53   ` Michael Reed
  0 siblings, 1 reply; 4+ messages in thread
From: Moore, Eric @ 2008-03-04 17:22 UTC (permalink / raw)
  To: Michael Reed, linux-scsi, Prakash, Sathya; +Cc: Jeremy Higdon

On Tuesday, March 04, 2008 9:51 AM,  Michael Reed wrote:
> Subject: [PATCH 1/1] Fusion SAS and Fibre Channel: target 
> missing after resetting external raid
> 
> Following a hard reset of a SAS raid, one of the raid targets 
> is occasionally
> missing.  I tracked this down to a pretty obscure little bug.
> 
> The LSI fusion drivers for SAS and Fibre Channel both use 
> their respective
> transport layers.  Those transport layers increment the target number
> assigned to new targets.
> 
> The routine __scsi_scan_target uses the "this_id" element of 
> the Scsi_Host
> structure to avoid scanning the scsi host adapter.  Both 
> fusion drivers set
> "this_id" from a value returned in a firmware PortFacts 
> response.  For my
> particular test case (SAS) the firmware id assigned to the 
> initiator was
> 173.  After enough raid resets to cause the raid targets to 
> go and come a
> sufficient number of times, the id assigned by the transport to a raid
> target would match the id assigned by the host adapter to the 
> "this_id"
> field, resulting in that target not being scanned.
> 
> static void __scsi_scan_target(struct device *parent, 
> unsigned int channel,
>                 unsigned int id, unsigned int lun, int rescan)
> {
>         struct Scsi_Host *shost = dev_to_shost(parent);
>         int bflags = 0;
>         int res;
>         struct scsi_target *starget;
> 
>         if (shost->this_id == id)
>                 /*
>                  * Don't scan the host adapter
>                  */
>                 return;
> 
> ....
> 
> The fix is simple.  Fusion SAS and Fibre Channel (subject to 
> same bug) should
> just leave "this_id" initialized to "-1".
> 
> Applies to 2.6.25-rc3-git5.
> 
> Signed-off-by: Michael Reed <mdr@sgi.com>
> 
> --
> 
> --- kou/drivers/message/fusion/mptfc.c	2008-01-24 
> 16:58:37.000000000 -0600
> +++ ko/drivers/message/fusion/mptfc.c	2008-03-04 
> 09:01:18.428176326 -0600
> @@ -1238,8 +1238,6 @@ mptfc_probe(struct pci_dev *pdev, const 
>  	sh->max_id = ioc->pfacts->MaxDevices;
>  	sh->max_lun = max_lun;
>  
> -	sh->this_id = ioc->pfacts[0].PortSCSIID;
> -
>  	/* Required entry.
>  	 */
>  	sh->unique_id = ioc->id;
> --- kou/drivers/message/fusion/mptsas.c	2008-03-04 
> 08:38:58.000000000 -0600
> +++ ko/drivers/message/fusion/mptsas.c	2008-03-04 
> 09:01:04.284807301 -0600
> @@ -3176,8 +3176,6 @@ mptsas_probe(struct pci_dev *pdev, const
>  
>  	sh->transportt = mptsas_transport_template;
>  
> -	sh->this_id = ioc->pfacts[0].PortSCSIID;
> -
>  	/* Required entry.
>  	 */
>  	sh->unique_id = ioc->id;
> 
>

This looks good.  I had deleted setting this_id in mptsas internal
sources long ago.    In addition to this change, we need to fix
mptscsih_slave_configure so it doesn't set the queue depth to 1 for SAS
protocal when sdev->id is greater than sh->max_id.   The sas transport
layer assigns the target ids, incrementing with each hotplug add, with
large topologies, it doesn't take long to hit this threshold.

Eric

  

 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/1] Fusion SAS and Fibre Channel: target missing after resetting external raid
  2008-03-04 17:22 ` Moore, Eric
@ 2008-03-13 19:53   ` Michael Reed
  2008-03-20 22:46     ` Moore, Eric
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Reed @ 2008-03-13 19:53 UTC (permalink / raw)
  To: Moore, Eric; +Cc: linux-scsi, Prakash, Sathya, Jeremy Higdon, Hannes Reinecke



Moore, Eric wrote:
>>On Tuesday, March 04, 2008 9:51 AM,  Michael Reed wrote:

<snip>

>>
>> The fix is simple.  Fusion SAS and Fibre Channel (subject to 
>> same bug) should just leave "this_id" initialized to "-1".
>>
>> Applies to 2.6.25-rc3-git5.
>>
>> Signed-off-by: Michael Reed <mdr@sgi.com>
>>
>> --
>>
<snip>
> 
> This looks good.  I had deleted setting this_id in mptsas internal
> sources long ago.    In addition to this change, we need to fix
> mptscsih_slave_configure so it doesn't set the queue depth to 1 for SAS
> protocal when sdev->id is greater than sh->max_id.   The sas transport
> layer assigns the target ids, incrementing with each hotplug add, with
> large topologies, it doesn't take long to hit this threshold.
> 
> Eric

Adjusting the patch to reflect Eric's comments.  The change WRT
max_id is from Hannes.  FC should also be subject to the same issue.

Signed-off-by: Michael Reed <mdr@sgi.com>

--- gitu/drivers/message/fusion/mptfc.c	2008-01-24 14:58:37.000000000 -0800
+++ git/drivers/message/fusion/mptfc.c	2008-03-13 12:37:52.176015612 -0700
@@ -1238,8 +1238,6 @@ mptfc_probe(struct pci_dev *pdev, const 
 	sh->max_id = ioc->pfacts->MaxDevices;
 	sh->max_lun = max_lun;
 
-	sh->this_id = ioc->pfacts[0].PortSCSIID;
-
 	/* Required entry.
 	 */
 	sh->unique_id = ioc->id;
--- gitu/drivers/message/fusion/mptsas.c	2008-03-10 13:23:46.000000000 -0700
+++ git/drivers/message/fusion/mptsas.c	2008-03-13 12:38:01.912440308 -0700
@@ -3181,8 +3181,6 @@ mptsas_probe(struct pci_dev *pdev, const
 
 	sh->transportt = mptsas_transport_template;
 
-	sh->this_id = ioc->pfacts[0].PortSCSIID;
-
 	/* Required entry.
 	 */
 	sh->unique_id = ioc->id;
--- gitu/drivers/message/fusion/mptscsih.c	2008-03-10 13:20:30.000000000 -0700
+++ git/drivers/message/fusion/mptscsih.c	2008-03-13 12:38:38.070017332 -0700
@@ -2442,12 +2442,6 @@ mptscsih_slave_configure(struct scsi_dev
 		    ioc->name, sdev->sdtr, sdev->wdtr,
 		    sdev->ppr, sdev->inquiry_len));
 
-	if (sdev->id > sh->max_id) {
-		/* error case, should never happen */
-		scsi_adjust_queue_depth(sdev, 0, 1);
-		goto slave_configure_exit;
-	}
-
 	vdevice->configured_lun = 1;
 	mptscsih_change_queue_depth(sdev, MPT_SCSI_CMD_PER_DEV_HIGH);
 
@@ -2461,8 +2455,6 @@ mptscsih_slave_configure(struct scsi_dev
 		    ioc->name, vtarget->negoFlags, vtarget->maxOffset,
 		    vtarget->minSyncFactor));
 
-slave_configure_exit:
-
 	dsprintk(ioc, printk(MYIOC_s_DEBUG_FMT
 		"tagged %d, simple %d, ordered %d\n",
 		ioc->name,sdev->tagged_supported, sdev->simple_tags,

^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: [PATCH 1/1] Fusion SAS and Fibre Channel: target missing after resetting external raid
  2008-03-13 19:53   ` Michael Reed
@ 2008-03-20 22:46     ` Moore, Eric
  0 siblings, 0 replies; 4+ messages in thread
From: Moore, Eric @ 2008-03-20 22:46 UTC (permalink / raw)
  To: Michael Reed; +Cc: linux-scsi, Prakash, Sathya, Jeremy Higdon, Hannes Reinecke

On Thursday, March 13, 2008 1:54 PM,  Michael Reed wrote:
> Adjusting the patch to reflect Eric's comments.  The change WRT
> max_id is from Hannes.  FC should also be subject to the same issue.
> 
> Signed-off-by: Michael Reed <mdr@sgi.com>
> 
> --- gitu/drivers/message/fusion/mptfc.c	2008-01-24 
> 14:58:37.000000000 -0800
> +++ git/drivers/message/fusion/mptfc.c	2008-03-13 
> 12:37:52.176015612 -0700
> @@ -1238,8 +1238,6 @@ mptfc_probe(struct pci_dev *pdev, const 
>  	sh->max_id = ioc->pfacts->MaxDevices;
>  	sh->max_lun = max_lun;
>  
> -	sh->this_id = ioc->pfacts[0].PortSCSIID;
> -
>  	/* Required entry.
>  	 */
>  	sh->unique_id = ioc->id;
> --- gitu/drivers/message/fusion/mptsas.c	2008-03-10 
> 13:23:46.000000000 -0700
> +++ git/drivers/message/fusion/mptsas.c	2008-03-13 
> 12:38:01.912440308 -0700
> @@ -3181,8 +3181,6 @@ mptsas_probe(struct pci_dev *pdev, const
>  
>  	sh->transportt = mptsas_transport_template;
>  
> -	sh->this_id = ioc->pfacts[0].PortSCSIID;
> -
>  	/* Required entry.
>  	 */
>  	sh->unique_id = ioc->id;
> --- gitu/drivers/message/fusion/mptscsih.c	2008-03-10 
> 13:20:30.000000000 -0700
> +++ git/drivers/message/fusion/mptscsih.c	2008-03-13 
> 12:38:38.070017332 -0700
> @@ -2442,12 +2442,6 @@ mptscsih_slave_configure(struct scsi_dev
>  		    ioc->name, sdev->sdtr, sdev->wdtr,
>  		    sdev->ppr, sdev->inquiry_len));
>  
> -	if (sdev->id > sh->max_id) {
> -		/* error case, should never happen */
> -		scsi_adjust_queue_depth(sdev, 0, 1);
> -		goto slave_configure_exit;
> -	}
> -
>  	vdevice->configured_lun = 1;
>  	mptscsih_change_queue_depth(sdev, MPT_SCSI_CMD_PER_DEV_HIGH);
>  
> @@ -2461,8 +2455,6 @@ mptscsih_slave_configure(struct scsi_dev
>  		    ioc->name, vtarget->negoFlags, vtarget->maxOffset,
>  		    vtarget->minSyncFactor));
>  
> -slave_configure_exit:
> -
>  	dsprintk(ioc, printk(MYIOC_s_DEBUG_FMT
>  		"tagged %d, simple %d, ordered %d\n",
>  		ioc->name,sdev->tagged_supported, sdev->simple_tags,
>

ACK 

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2008-03-20 22:47 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-03-04 16:50 [PATCH 1/1] Fusion SAS and Fibre Channel: target missing after resetting external raid Michael Reed
2008-03-04 17:22 ` Moore, Eric
2008-03-13 19:53   ` Michael Reed
2008-03-20 22:46     ` Moore, Eric

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.