[PATCH] mpt2sas: DIF Type 2 Protection Support

public inbox for linux-scsi@vger.kernel.org

* [PATCH] mpt2sas: DIF Type 2 Protection Support
From: Eric Moore @ 2010-04-22 16:47 UTC
  To: linux-scsi

Add DIF Type 2 protection support, enable 32-byte CDBs, and set the
additional CDB length in the SCSI_IO Control field for CDBs longer than
16 bytes. Transport Layer Retries (TLR) are not enabled for 32-byte CDBs.

Signed-off-by: Martin Petersen <martin.petersen@oracle.com>
Signed-off-by: Eric Moore <eric.moore@lsi.com>

diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h
index b4afe43..0f41fcd 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_base.h
+++ b/drivers/scsi/mpt2sas/mpt2sas_base.h
@@ -69,11 +69,11 @@
 #define MPT2SAS_DRIVER_NAME		"mpt2sas"
 #define MPT2SAS_AUTHOR	"LSI Corporation <DL-MPTFusionLinux@lsi.com>"
 #define MPT2SAS_DESCRIPTION	"LSI MPT Fusion SAS 2.0 Device Driver"
-#define MPT2SAS_DRIVER_VERSION		"05.100.00.02"
+#define MPT2SAS_DRIVER_VERSION		"05.100.00.03"
 #define MPT2SAS_MAJOR_VERSION		05
 #define MPT2SAS_MINOR_VERSION		100
 #define MPT2SAS_BUILD_VERSION		00
-#define MPT2SAS_RELEASE_VERSION		02
+#define MPT2SAS_RELEASE_VERSION		03
 
 /*
  * Set MPT2SAS_SG_DEPTH value based on user input.
diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
index c5ff26a..456ea7c 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
@@ -2858,9 +2858,7 @@ _scsih_setup_eedp(struct scsi_cmnd *scmd, Mpi2SCSIIORequest_t *mpi_request)
 	unsigned char prot_op = scsi_get_prot_op(scmd);
 	unsigned char prot_type = scsi_get_prot_type(scmd);
 
-	if (prot_type == SCSI_PROT_DIF_TYPE0 ||
-	   prot_type == SCSI_PROT_DIF_TYPE2 ||
-	   prot_op == SCSI_PROT_NORMAL)
+	if (prot_type == SCSI_PROT_DIF_TYPE0 || prot_op == SCSI_PROT_NORMAL)
 		return;
 
 	if (prot_op ==  SCSI_PROT_READ_STRIP)
@@ -2882,7 +2880,13 @@ _scsih_setup_eedp(struct scsi_cmnd *scmd, Mpi2SCSIIORequest_t *mpi_request)
 		    MPI2_SCSIIO_EEDPFLAGS_CHECK_GUARD;
 		mpi_request->CDB.EEDP32.PrimaryReferenceTag =
 		    cpu_to_be32(scsi_get_lba(scmd));
+		break;
+
+	case SCSI_PROT_DIF_TYPE2:
 
+		eedp_flags |= MPI2_SCSIIO_EEDPFLAGS_INC_PRI_REFTAG |
+		    MPI2_SCSIIO_EEDPFLAGS_CHECK_REFTAG |
+		    MPI2_SCSIIO_EEDPFLAGS_CHECK_GUARD;
 		break;
 
 	case SCSI_PROT_DIF_TYPE3:
@@ -3013,7 +3017,7 @@ _scsih_qcmd(struct scsi_cmnd *scmd, void (*done)(struct scsi_cmnd *))
 		mpi_control |= MPI2_SCSIIO_CONTROL_SIMPLEQ;
 	/* Make sure Device is not raid volume */
 	if (!_scsih_is_raid(&scmd->device->sdev_gendev) &&
-	    sas_is_tlr_enabled(scmd->device))
+	    sas_is_tlr_enabled(scmd->device) && scmd->cmd_len != 32)
 		mpi_control |= MPI2_SCSIIO_CONTROL_TLR_ON;
 
 	smid = mpt2sas_base_get_smid_scsiio(ioc, ioc->scsi_io_cb_idx, scmd);
@@ -3025,6 +3029,8 @@ _scsih_qcmd(struct scsi_cmnd *scmd, void (*done)(struct scsi_cmnd *))
 	mpi_request = mpt2sas_base_get_msg_frame(ioc, smid);
 	memset(mpi_request, 0, sizeof(Mpi2SCSIIORequest_t));
 	_scsih_setup_eedp(scmd, mpi_request);
+	if (scmd->cmd_len == 32)
+		mpi_control |= 4 << MPI2_SCSIIO_CONTROL_ADDCDBLEN_SHIFT;
 	mpi_request->Function = MPI2_FUNCTION_SCSI_IO_REQUEST;
 	if (sas_device_priv_data->sas_target->flags &
 	    MPT_TARGET_FLAGS_RAID_COMPONENT)
@@ -6567,7 +6573,7 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	INIT_LIST_HEAD(&ioc->delayed_tr_list);
 
 	/* init shost parameters */
-	shost->max_cmd_len = 16;
+	shost->max_cmd_len = 32;
 	shost->max_lun = max_lun;
 	shost->transportt = mpt2sas_transport_template;
 	shost->unique_id = ioc->id;
@@ -6580,7 +6586,7 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	}
 
 	scsi_host_set_prot(shost, SHOST_DIF_TYPE1_PROTECTION
-	    | SHOST_DIF_TYPE3_PROTECTION);
+	    | SHOST_DIF_TYPE2_PROTECTION | SHOST_DIF_TYPE3_PROTECTION);
 	scsi_host_set_guard(shost, SHOST_DIX_GUARD_CRC);
 
 	/* event thread */

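For readers following the patch, here is a minimal sketch that restates its three decisions as standalone C: which EEDP checks each DIF type gets, how a 32-byte CDB turns into the literal `4` shifted into the ADDCDBLEN field ((32 - 16) / 4 = 4 extra dwords), and why TLR is skipped for 32-byte commands. All `SK_*` names, bit values and the shift are hypothetical placeholders, not the real MPI2 constants; the Type 1 and Type 3 flag sets are assumptions based on T10 DIF semantics rather than lines shown in this diff, and the sketch ignores `prot_op`, reference-tag seeding and every other Control bit.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit values: NOT the real MPI2 constants, which may differ. */
#define SK_EEDP_INC_PRI_REFTAG     (1u << 0)
#define SK_EEDP_CHECK_REFTAG       (1u << 1)
#define SK_EEDP_CHECK_GUARD        (1u << 2)
#define SK_CONTROL_TLR_ON          (1u << 0)
#define SK_CONTROL_ADDCDBLEN_SHIFT 26 /* placeholder shift */

enum sk_dif_type { SK_DIF_TYPE0, SK_DIF_TYPE1, SK_DIF_TYPE2, SK_DIF_TYPE3 };

/* EEDP checks per DIF type. Type 2 matches the case the patch adds;
 * Type 1 and Type 3 are assumptions based on T10 DIF semantics. */
static uint32_t sk_eedp_flags(enum sk_dif_type type)
{
	switch (type) {
	case SK_DIF_TYPE1:
	case SK_DIF_TYPE2:
		/* check guard and reference tag, incrementing the tag per block */
		return SK_EEDP_INC_PRI_REFTAG | SK_EEDP_CHECK_REFTAG |
		       SK_EEDP_CHECK_GUARD;
	case SK_DIF_TYPE3:
		/* the reference tag is not checked for Type 3 */
		return SK_EEDP_CHECK_GUARD;
	default:
		/* Type 0: no protection information, no EEDP */
		return 0;
	}
}

/* Extra CDB length beyond the 16-byte base, in 32-bit words (rounded up).
 * A 32-byte CDB gives (32 - 16) / 4 = 4, the literal used by the patch. */
static uint32_t sk_add_cdb_dwords(unsigned int cmd_len)
{
	assert(cmd_len <= 32);
	return cmd_len > 16 ? (cmd_len - 16 + 3) / 4 : 0;
}

/* Only the two Control bits touched by the patch are modelled:
 * no TLR for RAID volumes or 32-byte CDBs, plus the additional CDB length. */
static uint32_t sk_control(unsigned int cmd_len, int is_raid, int tlr_enabled)
{
	uint32_t control = 0;

	if (!is_raid && tlr_enabled && cmd_len != 32)
		control |= SK_CONTROL_TLR_ON;
	control |= sk_add_cdb_dwords(cmd_len) << SK_CONTROL_ADDCDBLEN_SHIFT;
	return control;
}
```

The sketch generalizes the `cmd_len == 32` test in the patch to any CDB longer than 16 bytes; for the 6-, 10-, 12- and 16-byte CDBs in normal use it yields 0, so the Control word is unchanged for them.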

* lpfc SAN/SCSI issue
From: brem belguebli @ 2010-04-22 19:24 UTC
  Cc: linux-scsi

I have a server (RHEL 5.3) connected to 2 extended SAN fabrics (spanning
2 sites about 1 ms apart; the links are ISLs configured with 100 km
long-distance buffer credits) via 2 lpfc HBAs (LPe1105-HP FC with the
LPFC driver 8.2.0.33.3p shipped with RHEL 5.3).

Yesterday, a SAN fabric reconfiguration (DWDM ring failover from worker
to protection) occurred after an intersite telco link switch that lasted
less than 0.3 ms.

Only one fabric, named FABRIC2, was impacted.

Our server is connected to the fabrics through 2 edge switches, so it is
not directly connected to the core switches on which the link failure
occurred.

From then on, the load average of our server (which accesses the LUNs of
our 2 sites through the 2 fabrics) started to climb (up to 250 on a
dual-processor quad-core machine!) with a high percentage of iowait (up
to 50%).

We did some testing, bypassing DM-MP by issuing dd commands to the
physical /dev/sdX devices (more than 30 LUNs are presented to the
server, each seen through 4 paths, making more than 120 /dev/sd devices),
and half of our dd processes went into D state, as did some individual
scsi_id commands that we ran manually on the same physical devices.

Multipathd itself was also in D state. 

The only way to restore the whole thing was to reset the server HBA
connected to FABRIC2, after 2 hours of investigation.

No SCSI log messages of any kind appeared during the outage (~2 hours),
even though the SCSI timeout set on the physical devices is 60 s and the
HBA timeout is 14 s.

/sys/block/sdX/device/state showed the running state even though the
devices (well, half of them) were actually inaccessible.

This leads me to:

1) an assumption: it looks like, following this SAN event, the lpfc
driver goes into a black-hole mode, not returning any I/O error or
anything else to the upper SCSI layer

2) a question: why don't the SCSI timers trigger and declare the device
faulty? (the answer may lie in the above assumption)

Any idea or tip on what could cause this? Some FC SCN message not
handled properly, or something else?

Regards

Brem

* Re: lpfc SAN/SCSI issue
From: James Smart @ 2010-04-23 13:28 UTC
  To: brem belguebli; +Cc: linux-scsi@vger.kernel.org

Brem,

We're looking at the lpfc driver as to whether this matches anything we are 
aware of.

Please send me the system console log during this time frame. No messages 
whatsoever would be very odd.  Sending us the output of the shost, rport, and 
sdev sysfs parameters, as well as DM configuration values would also help. 
It won't necessarily be i/o timers that would fire, but other timers should.
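As an illustration of collecting the sdev parameters mentioned above, here is a minimal sketch that reads the `state` and `timeout` attributes under /sys/block/<disk>/device, the same `state` file that kept reporting running during the outage. The helper names `read_attr` and `dump_sdev` are hypothetical; rport and shost attributes (for example `dev_loss_tmo` under /sys/class/fc_remote_ports) could be read the same way, but their exact directory names vary per system and are not hard-coded here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one sysfs attribute into buf, stripping the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
static int read_attr(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* Print the sdev state and command timeout for one disk, e.g. "sdc". */
static void dump_sdev(const char *disk)
{
	static const char *attrs[] = { "state", "timeout" };
	char path[256], val[128];
	size_t i;

	for (i = 0; i < sizeof(attrs) / sizeof(attrs[0]); i++) {
		snprintf(path, sizeof(path), "/sys/block/%s/device/%s",
			 disk, attrs[i]);
		if (read_attr(path, val, sizeof(val)) == 0)
			printf("%s %s=%s\n", disk, attrs[i], val);
		else
			printf("%s %s=<unreadable>\n", disk, attrs[i]);
	}
}
```

Note that reading these files does not touch the device itself, so it should not block even when I/O to the LUN is hung.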

-- james s


* Re: lpfc SAN/SCSI issue
From: brem belguebli @ 2010-04-26 21:52 UTC
  To: James Smart; +Cc: linux-scsi

Hi James,

On Mon, 2010-04-26 at 13:50 -0400, James Smart wrote:
> Brem,
> 
> I'm not understanding you.
> 
> 
> brem belguebli wrote: 
> > We have sg3_utils installed, and I think we ran sg_verify on one or 2
> > unresponsive /dev/sd devices and it didn't give the hand back.
> >   
> what do you mean "give the hand back"? Was the operation
> successful or not?
> 
By "didn't give the hand back" I mean that the one or 2 processes got
stuck in D state and never returned, successfully or otherwise.
> > It was exactly:
> > cd /sys/block
> > for DEV in `ls -1d sd*`; do
> >         echo ${DEV}
> >         dd if=/dev/${DEV} of=/dev/null bs=1024 count=1 &
> >         echo
> > done
> > 
> > And yes, it really works; I've never seen DM-MP preempt direct sd
> > access. I've cc'ed dm-devel; maybe some DM guru could give an
> > opinion on this.
> > 
> > Next time, I'll use sg_dd instead of dd, to bypass any cache effect
> > (by the way, does the VFS cache anything when accessing /dev/X devices?)
> >   
> ok - by "works" means "dd successfully read 1 block from the device" -
> right ?
> 
Yes. The devices on which dd succeeded were the ones on FABRIC1; dd
completed successfully, reading the first 1024 bytes and copying them
to /dev/null.
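On the cache question above: buffered reads of a block device node such as /dev/sdX go through the page cache, so a block that is already cached could let dd "succeed" without any I/O reaching the LUN. A minimal sketch of a one-shot probe that can bypass the cache with O_DIRECT follows; `probe_read` and `PROBE_LEN` are hypothetical names, and it assumes 4096 bytes is a multiple of the device's logical block size, which O_DIRECT requires for buffer address, length and offset. It adds no timeout: a read that never completes still blocks in D state, exactly like dd did.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define PROBE_LEN 4096 /* assumed multiple of the logical block size */

/* Read the first PROBE_LEN bytes of path. With direct != 0 the file is
 * opened with O_DIRECT so the read bypasses the page cache.
 * Returns the number of bytes read, or -errno on failure. */
static long probe_read(const char *path, int direct)
{
	void *buf;
	long ret;
	int fd;

	/* O_DIRECT needs an aligned buffer */
	if (posix_memalign(&buf, PROBE_LEN, PROBE_LEN) != 0)
		return -ENOMEM;
	fd = open(path, O_RDONLY | (direct ? O_DIRECT : 0));
	if (fd < 0) {
		ret = -errno;
		free(buf);
		return ret;
	}
	ret = read(fd, buf, PROBE_LEN);
	if (ret < 0)
		ret = -errno;
	close(fd);
	free(buf);
	return ret;
}
```

Run against each /dev/sdX with `direct` set, a path that returns promptly really answered the read; sg_dd, as suggested above, achieves a similar effect by talking to the device through SG_IO.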
  
> > > The most interesting for the lpfc driver would be the lpfc module
> > > parameter "lpfc_log_verbose=4115"
> > > which turns on discovery log messages, els messages, link events, and
> > > FCP i/o error messages.
> > >     
> > 
> > As our DWDM ring is currently on the less optimal path, there will be
> > a switch back to nominal soon.
> > 
> > I'll activate this log level on the HBAs and check the firmware
> > versions you gave me.
> >   
> ok. I believe that the shosts for the adapters in question have a
> sysfs variable for lpfc_log_verbose that sets the log level on the
> individual adapter. This would not require you to unload/reload the
> driver to set the option.
> 
I'll tell you tomorrow (I was off today) whether the parameter exists
for these HBAs.
> > Hopefully, we will be able to provide you something deeper to
> > investigate.
> > 
> > Brem
> >   
> 
> ok.
> 
> -- james
> 
> 
Thanks



* Re: lpfc SAN/SCSI issue
From: brem belguebli @ 2010-04-27 17:37 UTC
  To: James Smart; +Cc: linux-scsi

Hi James,

I was able to set lpfc_log_verbose to 4115 on both HBAs; I hope it'll be
verbose enough to capture interesting traces.
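The value 4115 is 0x1013, a bit mask. A minimal sketch that decodes it follows; the bit assignments (ELS 0x1, DISCOVERY 0x2, LINK_EVENT 0x10, FCP_ERROR 0x1000) are my assumption about the lpfc_logmsg.h of that driver generation and should be checked against the installed driver's header, though they do add up to exactly the four message classes James listed. The table and `decode_log_mask` are hypothetical, and only these four bits are named.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed bit assignments; verify against the driver's lpfc_logmsg.h.
 * Only the bits that make up 4115 (0x1013) are listed. */
static const struct {
	unsigned int bit;
	const char *name;
} lpfc_log_bits[] = {
	{ 0x0001, "ELS" },
	{ 0x0002, "DISCOVERY" },
	{ 0x0010, "LINK_EVENT" },
	{ 0x1000, "FCP_ERROR" },
};

/* Store up to max matching names in names[] and return how many table
 * bits were set in mask; bits not in the table are left in *unknown. */
static size_t decode_log_mask(unsigned int mask, const char **names,
			      size_t max, unsigned int *unknown)
{
	size_t n = 0, i;

	for (i = 0; i < sizeof(lpfc_log_bits) / sizeof(lpfc_log_bits[0]); i++) {
		if (mask & lpfc_log_bits[i].bit) {
			if (n < max)
				names[n] = lpfc_log_bits[i].name;
			n++;
			mask &= ~lpfc_log_bits[i].bit;
		}
	}
	*unknown = mask;
	return n;
}
```

For 4115 this reports the four named classes and no leftover bits; for 0xffff it reports the same four and leaves 0xefec unnamed, which shows how much beyond the suggested value a full mask turns on.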

* Re: lpfc SAN/SCSI issue
From: brem belguebli @ 2010-05-03 16:39 UTC
  To: James Smart; +Cc: linux-scsi

Hi James,

We haven't yet been able to get our telco to switch the DWDM
links back to their original configuration.

However, since logging was enabled on the server, I'm seeing a lot of
messages like this:

lpfc 0000:10:00.1: 1:(0):0730 FCP command x26 failed: x2 SNS x70000500
x20000000 Data: xa x200 x10 x0 x0

for which I couldn't find any explanation
(http://www-dl.emulex.com/support/linux/820482p/linux.pdf).

Do you have any information on this ?

There are also other lpfc parameters that could be tuned, if I
understand their meaning correctly:

lpfc_hba_queue_depth, currently set to 1024: does it represent the
number of I/Os (exchanges) the HBA will queue until the remote port
acknowledges them, or until the port is considered down?

lpfc_max_scsicmpl_time, set to 0: does 0 mean an infinite value,
i.e. the driver will never time out an I/O for which it has not
received a completion?

Thanks

Brem




* Re: lpfc SAN/SCSI issue
From: James Smart @ 2010-05-05 14:01 UTC
  To: brem belguebli; +Cc: linux-scsi@vger.kernel.org

brem belguebli wrote:
> Hi James,
> 
> We haven't yet been able to get our telco to switch the DWDM
> links back to their original configuration.
> 
> However, since logging was enabled on the server, I'm seeing a lot of
> messages like this:
> 
> lpfc 0000:10:00.1: 1:(0):0730 FCP command x26 failed: x2 SNS x70000500
> x20000000 Data: xa x200 x10 x0 x0
> 
> for which I couldn't find any explanation
> (http://www-dl.emulex.com/support/linux/820482p/linux.pdf).
> 
> Do you have any information on this ?

This is saying that SCSI command opcode 0x26 (Vendor-specific opcode ??)
failed with status code x2 (Check Condition), followed by the SCSI sense
data with Sense Key 5 (ILLEGAL REQUEST).

I don't know what would be issuing this command (opcode 0x26), most likely
some utility/daemon using SG_IO, but the target is rejecting the command
(not valid for the vendor). Very reasonable.
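The SNS words in the 0730 line can be decoded by hand. A minimal sketch, assuming (from my reading of the lpfc source, not something stated in this thread) that the two SNS values are fixed-format sense bytes 0-3 and 12-15 printed as big-endian 32-bit words: x70000500 then gives response code 0x70 (fixed format, current error) and sense key 5, and x20000000 gives ASC 0x20 / ASCQ 0x00, which SPC defines as INVALID COMMAND OPERATION CODE, consistent with the target rejecting opcode 0x26. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of fixed-format sense data needed to read the 0730 message. */
struct sense_fields {
	uint8_t response_code; /* byte 0, bits 6..0 (0x70 = current, fixed) */
	uint8_t sense_key;     /* byte 2, bits 3..0 */
	uint8_t asc;           /* byte 12: additional sense code */
	uint8_t ascq;          /* byte 13: additional sense code qualifier */
};

/* w0, w3: the two "SNS" values from the log line, assumed to be sense
 * bytes 0-3 and 12-15 printed as big-endian words. */
static struct sense_fields decode_sns(uint32_t w0, uint32_t w3)
{
	struct sense_fields s;

	s.response_code = (uint8_t)((w0 >> 24) & 0x7f);
	s.sense_key = (uint8_t)((w0 >> 8) & 0x0f);
	s.asc = (uint8_t)(w3 >> 24);
	s.ascq = (uint8_t)((w3 >> 16) & 0xff);
	return s;
}

/* Sense key names as listed in SPC. */
static const char *sense_key_name(uint8_t key)
{
	static const char *names[16] = {
		"NO SENSE", "RECOVERED ERROR", "NOT READY", "MEDIUM ERROR",
		"HARDWARE ERROR", "ILLEGAL REQUEST", "UNIT ATTENTION",
		"DATA PROTECT", "BLANK CHECK", "VENDOR SPECIFIC",
		"COPY ABORTED", "ABORTED COMMAND", "OBSOLETE (0xC)",
		"VOLUME OVERFLOW", "MISCOMPARE", "COMPLETED",
	};

	return names[key & 0x0f];
}
```

By the same reading, the later "x0 SNS x0 x0" lines carry a GOOD status (x0) and no sense data at all.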

> There are also other lpfc parameters that could be tuned, if I
> understand their meaning correctly:
> 
> lpfc_hba_queue_depth, currently set to 1024: does it represent the
> number of I/Os (exchanges) the HBA will queue until the remote port
> acknowledges them, or until the port is considered down?

This is the total number of I/Os outstanding on the wire, to all
targets/LUNs, at any point in time. It is typically the capacity of the
adapter, which is consumed on a FIFO basis as I/O is received from the
midlayer. The default value of the attribute is the adapter's maximum: on
your adapter that is 1024; on most newer adapters it is 2x this or more.
The only time I've seen this value tuned is when our adapter is connected
to a single target (array) and overruns or fully utilizes the target's
capacity, causing the target to work harder and actually accomplish less
than it would at, say, an 80% utilization level (note: the capacity level
is target-specific). That is another reason per-target queue depth
handling was put in; see the next comment.
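To make the distinction from a timeout concrete, here is a minimal sketch of adapter-wide admission control as described above: the limit caps how many I/Os are in flight at once and says nothing about how long any of them may take. The struct and function names are hypothetical, and the sketch is not lpfc's actual code; a real driver would hand the refused command back to the midlayer to be retried later.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical adapter-wide accounting: hba_queue_depth is the number of
 * exchanges that may be outstanding on the wire across all targets/LUNs. */
struct hba {
	unsigned int hba_queue_depth; /* e.g. 1024 on the adapter in this thread */
	unsigned int outstanding;     /* I/Os currently on the wire */
};

/* Returns true if a command may be sent now; false means the adapter is
 * full and the command must be retried later. No timer is involved. */
static bool hba_try_issue(struct hba *h)
{
	if (h->outstanding >= h->hba_queue_depth)
		return false;
	h->outstanding++;
	return true;
}

/* A completion (good or bad) frees one slot. */
static void hba_complete(struct hba *h)
{
	assert(h->outstanding > 0);
	h->outstanding--;
}
```

With a depth of 2, two commands are admitted, a third is refused until one completes, and then one more is admitted.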

> 
> lpfc_max_scsicmpl_time, set to 0: does 0 mean an infinite value,
> i.e. the driver will never time out an I/O for which it has not
> received a completion?

No, it's unrelated to I/O timeouts. It relates to target queue depth
management. The midlayer doesn't do queue depth management per target,
only per sdev (LUN). Our driver does, though. Target queue depth is the
sum of all I/O to all LUNs on the same target, with a threshold that may
or may not be capped based on the array type, and which ramps down to the
current outstanding I/O count when the target reports
QUEUE_FULL/TASK_SET_FULL. This behavior is valid only on targets that have
a shared I/O queue for all LUNs. This value controls the per-target
ramp-up processing. If 0, we use a constant compiled-in interval which
ramps our target queue depth back up by x%. When non-zero, it specifies a
shost-specific time interval for the ramp-up (it's actually a little
trickier than this, as it's tailored for some arrays that really depend
upon not being overrun beyond their capacity levels).
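The ramp-down/ramp-up behaviour described above can be sketched as follows. This is a simplified model under stated assumptions, not lpfc's implementation: the struct and functions are hypothetical, `RAMP_UP_PCT` is a placeholder for the unspecified "x%", and the interval that would drive `tgt_ramp_up` is left to the caller.

```c
#include <assert.h>

/* Hypothetical per-target queue depth state. */
struct tgt_qdepth {
	unsigned int max_depth;   /* cap, possibly array-specific */
	unsigned int cur_depth;   /* currently allowed outstanding I/Os */
	unsigned int outstanding; /* I/Os in flight to all LUNs on the target */
};

#define RAMP_UP_PCT 10 /* placeholder for the unspecified "x%" */

/* Target reported QUEUE_FULL/TASK_SET_FULL: ramp down to what it held. */
static void tgt_on_queue_full(struct tgt_qdepth *t)
{
	t->cur_depth = t->outstanding > 0 ? t->outstanding : 1;
}

/* Called once per ramp-up interval (compiled-in default when the
 * parameter is 0, otherwise the configured interval, as described). */
static void tgt_ramp_up(struct tgt_qdepth *t)
{
	unsigned int step = t->cur_depth * RAMP_UP_PCT / 100;

	if (step == 0)
		step = 1; /* always make progress at small depths */
	t->cur_depth += step;
	if (t->cur_depth > t->max_depth)
		t->cur_depth = t->max_depth;
}

/* May another command be sent to this target right now? */
static int tgt_can_issue(const struct tgt_qdepth *t)
{
	return t->outstanding < t->cur_depth;
}
```

For example, with 37 I/Os in flight when QUEUE_FULL arrives, the allowed depth drops to 37, then climbs to 40 and 44 on the next two intervals and eventually back to the cap.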

-- james s

* Re: lpfc SAN/SCSI issue
From: brem belguebli @ 2010-05-06 11:06 UTC
  To: James Smart; +Cc: linux-scsi@vger.kernel.org

Hi James,


2010/5/5 James Smart <james.smart@emulex.com>:
> This is saying that SCSI command opcode 0x26 (Vendor-specific opcode ??)
> failed with status code x2 (Check Condition), followed by the SCSI sense
> data with Sense Key 5 (ILLEGAL REQUEST).
>
> I don't know what would be issuing this command (opcode 0x26), most likely
> some utility/daemon using SG_IO, but the target is rejecting the command
> (not valid for the vendor). Very reasonable.
>
I finally found the explanation of the 0730 messages in your docs, and we
have tracked down the offending program.
It is hpasm, shipped with the ProLiant Support Pack, which we use to
monitor the hardware RAID of the servers.
The same program runs on similar machines (same OS, HBAs, etc.) without
issuing opcode 0x26; only on 2 servers does it issue it.
Further investigation showed that on these 2 servers we had installed
extra Emulex packages (elxocmlibhbaapi, elxocmlibhbaapi-32bit and
elxocmcore) that install various libraries (/usr/lib/libemsdm.so,
/usr/lib/libdfc.so, /usr/lib/libnl.so.1). These libraries presumably
provide symbols that hpasm picks up at run time, which makes it issue
opcode 0x26 to all the storage controllers on the system.
Thanks for the explanation.

We no longer get the x26 opcode error messages. Since I wasn't sure this
was the root cause of the problem we had during the DWDM ring failover,
I increased the logging (0xffff) on the HBAs of all the nodes (4 nodes in
total: the 2 that were reporting the x26 opcode error, say Group A, and
the 2 that never did, say Group B).
These 4 nodes form a cluster accessing the same LUNs through the same
controllers in exactly the same way, yet on Group A I get errors related
to INQUIRY:

lpfc 0000:10:00.1: 1:(0):0730 FCP command x12 failed: x0 SNS x0 x0
Data: x8 x3c x0 x0 x0
lpfc 0000:10:00.1: 1:(0):0716 FCP Read Underrun, expected 96, residual
60 Data: x3c x12 x0
lpfc 0000:10:00.1: 1:0336 Rsp Ring 0 error: IOCB Data: xff000018
xe99fc48 x0 x0 x3c x0 x1d70c8e xa29b16
lpfc 0000:10:00.1: 1:0729 FCP cmd x12 failed <0/0> status: x1 result:
x3c Data: x1d7 xc8e
lpfc 0000:10:00.0: 0:(0):0730 FCP command x12 failed: x0 SNS x0 x0
Data: x8 x3c x0 x0 x0
lpfc 0000:10:00.0: 0:(0):0716 FCP Read Underrun, expected 96, residual
60 Data: x3c x12 x0
lpfc 0000:10:00.1: 1:0336 Rsp Ring 0 error: IOCB Data: xff000018
xe9960c0 x0 x0 x3c x0 x3360c67 xa29b16
lpfc 0000:10:00.1: 1:0729 FCP cmd x12 failed <0/0> status: x1 result:
x3c Data: x336 xc67

This happens on both HBAs, for the 13 paths seen through target 0 (<0/0>, <0/1>...).

Group B doesn't show any errors.

I'm going to have an HBA changed on one of the Group B nodes to make
sure it is not a hardware issue, and I'll keep you informed.


Regards

Brem

* Re: lpfc SAN/SCSI issue
From: James Smart @ 2010-05-06 13:39 UTC
  To: brem belguebli; +Cc: linux-scsi@vger.kernel.org

brem belguebli wrote:
> We no longer get the x26 opcode error messages. Since I wasn't sure this
> was the root cause of the problem we had during the DWDM ring failover,

It most likely wasn't - although error handlers on some arrays, when 
overloaded or going through failovers, sometimes react oddly.

> I increased the logging (0xffff) on the HBAs of all the nodes (4 nodes in
> total: the 2 that were reporting the x26 opcode error, say Group A, and
> the 2 that never did, say Group B).

I did not recommend 0xFFFF as it turns on everything - whether error or not. 
The value I gave should have filtered out non-errors.

> These 4 nodes form a cluster accessing the same LUNs through the same
> controllers in exactly the same way, yet on Group A I get errors related
> to INQUIRY:
> 
> lpfc 0000:10:00.1: 1:(0):0730 FCP command x12 failed: x0 SNS x0 x0
> Data: x8 x3c x0 x0 x0
> lpfc 0000:10:00.1: 1:(0):0716 FCP Read Underrun, expected 96, residual
> 60 Data: x3c x12 x0
> lpfc 0000:10:00.1: 1:0336 Rsp Ring 0 error: IOCB Data: xff000018
> xe99fc48 x0 x0 x3c x0 x1d70c8e xa29b16

Yes - this is a normal response for SCSI commands where the command allows
variable-length data from the target; INQUIRY is such a case. We report any
SCSI completion error, such as this underrun (the target returned less data
than the size of the buffer the host gave it). This is not an error.
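The numbers in the 0716 message show why this underrun is benign: 96 bytes expected minus a residual of 60 leaves 36 bytes, and SPC defines standard INQUIRY data as at least 36 bytes long. A minimal sketch of the decision, modelled on the midlayer's `scsi_cmnd->underflow` field (the minimum transfer below which a short transfer counts as an error); the helper names are hypothetical and this is not lpfc's exact logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Bytes actually transferred for a read that reported a residual. */
static unsigned int bytes_transferred(unsigned int expected,
				      unsigned int residual)
{
	assert(residual <= expected);
	return expected - residual;
}

/* An underrun only matters if fewer bytes arrived than the issuer said
 * it needs (the "underflow" threshold); otherwise it is informational. */
static bool underrun_is_error(unsigned int expected, unsigned int residual,
			      unsigned int underflow)
{
	return bytes_transferred(expected, residual) < underflow;
}
```

For the logged INQUIRY, 36 bytes arrived, so unless the issuer demanded more than 36 bytes, the command completed normally with a shorter-than-buffer response.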

> Group B doesn't show any errors.

If you're not seeing the underrun error, there isn't any I/O being
performed. And if INQUIRY isn't being seen, the midlayer isn't attempting
to scan the device. Most likely the HBA isn't even seeing the target,
which should be visible from the lpfc log messages on FC discovery.
Please send me the log messages for the Group B hosts and I'll help
interpret them. However, don't spam linux-scsi with this huge log
(especially at 0xffff; the older log value should have been good enough).
Send it to me off-list.

-- james s
