All of lore.kernel.org
 help / color / mirror / Atom feed
* [BUG][RFC][PATCH] dpt_i2o driver in 2.4
@ 2005-01-18  5:41 Sakurai Hiroomi
  0 siblings, 0 replies; 3+ messages in thread
From: Sakurai Hiroomi @ 2005-01-18  5:41 UTC (permalink / raw)
  To: linux-scsi; +Cc: hiroomi sakurai

[-- Attachment #1: Type: text/plain, Size: 558 bytes --]

Dear Mark,

Our Linux server uses adaptec ASR-2010S and we are using dpt_i2o driver.
To raise the reliability of the server, we reviewd the dpt_i2o driver.
# The version of driver is 2.4 Build 5 in linux-2.4.28.

We found some problems and questions.

Please see the attached the dpt_i2o_problem_document.txt about the problems. 
And also I made a patch agains the driver(dpt_i2o.diff).

I'm not participated in the linux-scsi mailing list.
Please reply to the following addresses. 

    E-Mail : sakurai_hiro@soft.fujitsu.com

Best Regards
Hiroomi Sakurai


[-- Attachment #2: dpt_i2o_problem_document.txt --]
[-- Type: text/plain, Size: 10793 bytes --]



Please see the dpt_i2o.diff about the correction of each problem.


Problem

1. Problem in adpt_detect() function

   When adpt_install_hba() returns error status, adpt_detect() returns hba_count-1.
   But it's necessary to call adpt_i2o_sys_shutdown() and return 0.

   adpt_detect() : 181 - 191
   ----------------------------------------------------------------------------
   /* search for all Adatpec I2O RAID cards */
   while ((pDev = pci_find_device( PCI_DPT_VENDOR_ID, PCI_ANY_ID, pDev))) {
           if(pDev->device == PCI_DPT_DEVICE_ID ||
              pDev->device == PCI_DPT_RAPTOR_DEVICE_ID){
                   if(adpt_install_hba(sht, pDev) ){
                           PERROR("Could not Init an I2O RAID device\n");
                           PERROR("Will not try to detect others.\n");
                           return hba_count-1;
                   }
           }
   }
   ----------------------------------------------------------------------------


2. Problem in adpt_detect() and adpt_inquiry() function

   adpt_inquiry() doesn't return any value even if it detects any errors.
   Please add error status as a return value.
   When caller receives error from adpt_inquiry(), caller should check the
   return value.
   For example, when adpt_detect() receives error status, it should check
   the value.

   adpt_detect() : 238 - 242
   ----------------------------------------------------------------------------
   if (adpt_i2o_parse_lct(pHba) < 0){
           adpt_i2o_delete_hba(pHba);
           continue;
   }
   adpt_inquiry(pHba);
   ----------------------------------------------------------------------------

   adpt_inquiry() : 277 - 355
   ----------------------------------------------------------------------------
   static void adpt_inquiry(adpt_hba* pHba)
   {

   (omission)

   }
   ----------------------------------------------------------------------------


3. Problem in adpt_proc_info() function

   adpt_proc_info() doesn't check 'd->pScsi_dev' is null or not.
   It cause panic when 'd->pScsi_dev' is null.
   It is necessary to check it.

   adpt_proc_info() : 577 - 583
   ----------------------------------------------------------------------------
   for(chan = 0; chan < MAX_CHANNEL; chan++) {
           for(id = 0; id < MAX_ID; id++) {
                   d = pHba->channel[chan].device[id];
                   while(d){
                           len += sprintf(buffer+len,"\t%-24.24s", d->pScsi_dev->vendor);
                           len += sprintf(buffer+len," Rev: %-8.8s\n", d->pScsi_dev->rev);
                           pos = begin + len;
   ----------------------------------------------------------------------------


4. Problem in adpt_install_hba() function

   adpt_install_hba() find the emplty hbas[i] by 'for (i=0;i<DPTI_MAX_HBA;i++)'.
   But even if there are no emply hbas[], it continues the operaiton.
   This causes device name 'dpti16' for all devices exceeds 17 cards.
   You should remove the limitation of card number(DPTI_MAX_HBA=16).
   (At least you should add the error sequence to fail exceeds 17.)

   adpt_install_hba() : 929 - 934
   ----------------------------------------------------------------------------
   for(i=0;i<DPTI_MAX_HBA;i++) {
           if(hbas[i]==NULL) {
                   hbas[i]=pHba;
                   break;
           }
   }
   ----------------------------------------------------------------------------


5. Problem in adpt_find_device() function

   In adpt_find_device(), you should check 'id' is smaller than 'MAX_ID'.

   adpt_find_device() : 1087 - 1088
   ----------------------------------------------------------------------------
   if(chan < 0 || chan >= MAX_CHANNEL)
           return NULL;
   ----------------------------------------------------------------------------


6. Problem in adpt_i2o_reparse_lct() function

   In adpt_i2o_reparse_lct(), you check 'bus_no' is smaller than 'MAX_CHANNEL',
   but you use 'bus_no' before checking.
   You should move the sequence of checking before using.
   Also you should check the check 'scsi_id' is smaller than 'MAX_ID'.

   adpt_i2o_reparse_lct() : 2410 - 2442
   ----------------------------------------------------------------------------
   bus_no = buf[0]>>16;
   scsi_id = buf[1];
   scsi_lun = (buf[2]>>8 )&0xff;
   pDev = pHba->channel[bus_no].device[scsi_id];

   (omission)

   if(bus_no >= MAX_CHANNEL) {     // Something wrong skip it
           printk(KERN_WARNING"%s: Channel number %d out of range \n", pHba->name, bus_no);
           continue;
   }
   pDev = pHba->channel[bus_no].device[scsi_id];
   ----------------------------------------------------------------------------


7. Problem in adpt_i2o_reparse_lct() function

   In adpt_i2o_reparse_lct(), you check 'pDev->pScsi_dev' is null or not, 
   but you use 'pDev->pScsi_dev' before checking.
   You shoud move the sequence ot checking before using.

   adpt_i2o_reparse_lct() : 2480 - 2486
   ----------------------------------------------------------------------------
   if(pDev->pScsi_dev->online == FALSE) {
           printk(KERN_WARNING"%s: Setting device (%d,%d,%d) back online\n",
                           pHba->name,bus_no,scsi_id,scsi_lun);
           if (pDev->pScsi_dev) {
                   pDev->pScsi_dev->online = TRUE;
           }
   }
   ----------------------------------------------------------------------------


8. Problem in adpt_i2o_lct_get() function

   adpt_i2o_lct_get() continues the operation even of adpt_i2o_query_scalar()
   returns with error status.
   It's necessary return -1 when error detects.

   adpt_i2o_lct_get() : 2924 - 2936
   ----------------------------------------------------------------------------
   // I2O_DPT_EXEC_IOP_BUFFERS_GROUP_NO;
   if(adpt_i2o_query_scalar(pHba, 0 , 0x8000, -1, buf, sizeof(buf))>=0) {
           pHba->FwDebugBufferSize = buf[1];
           pHba->FwDebugBuffer_P    = pHba->base_addr_virt + buf[0];
           pHba->FwDebugFlags_P     = pHba->FwDebugBuffer_P + FW_DEBUG_FLAGS_OFFSET;
           pHba->FwDebugBLEDvalue_P = pHba->FwDebugBuffer_P + FW_DEBUG_BLED_OFFSET;
           pHba->FwDebugBLEDflag_P  = pHba->FwDebugBLEDvalue_P + 1;
           pHba->FwDebugStrLength_P = pHba->FwDebugBuffer_P + FW_DEBUG_STR_LENGTH_OFFSET;
           pHba->FwDebugBuffer_P += buf[2];
           pHba->FwDebugFlags = 0;
   }

   return 0;
   ----------------------------------------------------------------------------


Question

1. Question in adpt_queue() function

   In adpt_queue(), you check 'cmd->device->hostdata' is null or not every
   time. But this sequence seems redundant.
   I think this operation can move in adpt_detect().
   'pDev->pScsi_dev=cmd->device' aslo seems redundant.


   adpt_queue() : 422 - 439
   ----------------------------------------------------------------------------
   // TODO if the cmd->device if offline then I may need to issue a bus rescan
   // followed by a get_lct to see if the device is there anymore
   if((pDev = (struct adpt_device*) (cmd->device->hostdata)) == NULL) {
           /*
            * First command request for this device.  Set up a pointer
            * to the device structure.  This should be a TEST_UNIT_READY
            * command from scan_scsis_single.
            */
           if ((pDev = adpt_find_device(pHba, (u32)cmd->channel, (u32)cmd->target, (u32)cmd-> lun)) == NULL) {
                   // TODO: if any luns are at this bus, scsi id then fake a TEST_UNIT_READY and INQUIRY response
                   // with type 7F (for all luns less than the max for this bus,id) so the lun scan will continue.
                   cmd->result = (DID_NO_CONNECT << 16);
                   cmd->scsi_done(cmd);
                   return 0;
           }
           (struct adpt_device*)(cmd->device->hostdata) = pDev;
   }
   pDev->pScsi_dev = cmd->device;
   ----------------------------------------------------------------------------


2. Question in adpt_i2o_to_scsi() function

   Why adpt_i2o_to_scsi() returns DID_TIMEOUT when sense code is
   DATA_PROTECT and 'class 7'?


   adpt_i2o_to_scsi() : 2306 - 2322
   ----------------------------------------------------------------------------
   // copy over the request sense data if it was a check
   // condition status
   if(dev_status == 0x02 /*CHECK_CONDITION*/) {
           u32 len = sizeof(cmd->sense_buffer);
           len = (len > 40) ?  40 : len;
           // Copy over the sense data
           memcpy(cmd->sense_buffer, (void*)(reply+28) , len);
           if(cmd->sense_buffer[0] == 0x70 /* class 7 */ &&
              cmd->sense_buffer[2] == DATA_PROTECT ){
                   /* This is to handle an array failed */
                   cmd->result = (DID_TIME_OUT << 16);
                   printk(KERN_WARNING"%s: SCSI Data Protect-Device (%d,%d,%d) hba_status=0x%x, dev_status=0x%x, cmd=0x%x\n",
                           pHba->name, (u32)cmd->channel, (u32)cmd->target, (u32)cmd->lun,
                           hba_status, dev_status, cmd->cmnd[0]);

            }
   }
   ----------------------------------------------------------------------------


3. Question in adpt_i2o_build_sys_table() function

   adpt_i2o_build_systable() continues the operation even if adpt_i2o_status_get()
   returns the error. This cause the diferrence between the number of actual HBAs(hba_count) and
   and the number of sys_tbl entry(sys_tbl->num_entries).
   Does this cause some kind of problem?

   adpt_i2o_build_sys_table() : 2961 - 2981
   ----------------------------------------------------------------------------
   for(pHba = hba_chain; pHba; pHba = pHba->next) {
           // Get updated Status Block so we have the latest information
           if (adpt_i2o_status_get(pHba)) {
                   sys_tbl->num_entries--;
                   continue; // try next one
           }

           sys_tbl->iops[count].org_id = pHba->status_block->org_id;
           sys_tbl->iops[count].iop_id = pHba->unit + 2;
           sys_tbl->iops[count].seg_num = 0;
           sys_tbl->iops[count].i2o_version = pHba->status_block->i2o_version;
           sys_tbl->iops[count].iop_state = pHba->status_block->iop_state;
           sys_tbl->iops[count].msg_type = pHba->status_block->msg_type;
           sys_tbl->iops[count].frame_size = pHba->status_block->inbound_frame_size;
           sys_tbl->iops[count].last_changed = sys_tbl_ind - 1; // ??
           sys_tbl->iops[count].iop_capabilities = pHba->status_block->iop_capabilities;
           sys_tbl->iops[count].inbound_low = (u32)virt_to_bus((void*)pHba->post_port);
           sys_tbl->iops[count].inbound_high = (u32)((u64)virt_to_bus((void*)pHba->post_port)>>32);

           count++;
   }
   ----------------------------------------------------------------------------



[-- Attachment #3: dpt_i2o.diff --]
[-- Type: application/octet-stream, Size: 6299 bytes --]

diff -Nur linux-2.4.28.org/drivers/scsi/dpt_i2o.c linux-2.4.28/drivers/scsi/dpt_i2o.c
--- linux-2.4.28.org/drivers/scsi/dpt_i2o.c	2004-11-17 20:54:21.000000000 +0900
+++ linux-2.4.28/drivers/scsi/dpt_i2o.c	2005-01-18 11:42:56.000000000 +0900

/*                  */
/* Problem Number 1 */
/*                  */
************************************************************************************
@@ -185,7 +185,8 @@
 			if(adpt_install_hba(sht, pDev) ){
 				PERROR("Could not Init an I2O RAID device\n");
 				PERROR("Will not try to detect others.\n");
-				return hba_count-1;
+				adpt_i2o_sys_shutdown();
+				return 0;
 			}
 		}
 	}
************************************************************************************

/*                  */
/* Problem Number 2 */
/*                  */
************************************************************************************
@@ -239,7 +240,11 @@
 			adpt_i2o_delete_hba(pHba);
 			continue;
 		}
-		adpt_inquiry(pHba);
+
+		if (adpt_inquiry(pHba) < 0) {
+			adpt_i2o_sys_shutdown();
+			return 0;
+		}
 	}
 
 	for (pHba = hba_chain; pHba; pHba = pHba->next) {
@@ -274,7 +279,7 @@
 }
 
 
-static void adpt_inquiry(adpt_hba* pHba)
+static int adpt_inquiry(adpt_hba* pHba)
 {
 	u32 msg[14]; 
 	u32 *mptr;
@@ -291,7 +296,7 @@
 	buf = (u8*)kmalloc(80,GFP_KERNEL|ADDR32);
 	if(!buf){
 		printk(KERN_ERR"%s: Could not allocate buffer\n",pHba->name);
-		return;
+		return -1;
 	}
 	memset((void*)buf, 0, 36);
 	
@@ -351,7 +356,7 @@
 	}
 	kfree(buf);
 	adpt_i2o_status_get(pHba);
-	return ;
+	return 0;
 }
************************************************************************************
 
/*                  */
/* Problem Number 3 */
/*                  */
************************************************************************************
@@ -578,6 +583,11 @@
 		for(id = 0; id < MAX_ID; id++) {
 			d = pHba->channel[chan].device[id];
 			while(d){
+				if (d->pScsi_dev == NULL) {
+					d = d->next_lun;
+					continue;
+				}
+
 				len += sprintf(buffer+len,"\t%-24.24s", d->pScsi_dev->vendor);
 				len += sprintf(buffer+len," Rev: %-8.8s\n", d->pScsi_dev->rev);
 				pos = begin + len;
************************************************************************************

/*                  */
/* Problem Number 4 */
/*                  */
************************************************************************************
@@ -933,6 +943,11 @@
 		}
 	}
 
+	if(i >= DPTI_MAX_HBA){
+		printk(KERN_ERR"Couldn't register for hbas[]\n");
+		return -EINVAL;
+	}
+
 	if(hba_chain != NULL){
 		for(p = hba_chain; p->next; p = p->next);
 		p->next = pHba;
************************************************************************************

/*                  */
/* Problem Number 5 */
/*                  */
************************************************************************************
@@ -1092,6 +1107,9 @@
 		return NULL;
 	}
 
+	if (id < 0 || id >= MAX_ID)
+		return NULL;
+
 	d = pHba->channel[chan].device[id];
 	if(!d || d->tid == 0) {
 		return NULL;
************************************************************************************

/*                  */
/* Problem Number 6 */
/*                  */
************************************************************************************
@@ -2410,6 +2428,14 @@
 			bus_no = buf[0]>>16;
 			scsi_id = buf[1];
 			scsi_lun = (buf[2]>>8 )&0xff;
+			if(bus_no >= MAX_CHANNEL) {     // Something wrong skip it
+				printk(KERN_WARNING"%s: Channel number %d out of range \n", pHba->name, bus_no);
+				continue;
+			}
+			if(scsi_id >= MAX_ID) {
+				printk(KERN_WARNING"%s: Id number %d out of range \n", pHba->name, scsi_id);
+				continue;
+			}
 			pDev = pHba->channel[bus_no].device[scsi_id];
 			/* da lun */
 			while(pDev) {
@@ -2435,10 +2461,6 @@
 				adpt_i2o_report_hba_unit(pHba, d);
 				adpt_i2o_install_device(pHba, d);
 	
-				if(bus_no >= MAX_CHANNEL) {	// Something wrong skip it
-					printk(KERN_WARNING"%s: Channel number %d out of range \n", pHba->name, bus_no);
-					continue;
-				}
 				pDev = pHba->channel[bus_no].device[scsi_id];	
 				if( pDev == NULL){
 					pDev =  kmalloc(sizeof(struct adpt_device),GFP_KERNEL);
************************************************************************************

/*                  */
/* Problem Number 7 */
/*                  */
************************************************************************************
@@ -2477,6 +2499,10 @@
 			// We found an old device - check it
 			while(pDev) {
 				if(pDev->scsi_lun == scsi_lun) {
+					if(pDev->pScsi_dev == NULL) {
+						pDev = pDev->next_lun;
+						continue;
+					}
 					if(pDev->pScsi_dev->online == FALSE) {
 						printk(KERN_WARNING"%s: Setting device (%d,%d,%d) back online\n",
 								pHba->name,bus_no,scsi_id,scsi_lun);
************************************************************************************

/*                  */
/* Problem Number 8 */
/*                  */
************************************************************************************
@@ -2931,6 +2957,8 @@
 		pHba->FwDebugStrLength_P = pHba->FwDebugBuffer_P + FW_DEBUG_STR_LENGTH_OFFSET;
 		pHba->FwDebugBuffer_P += buf[2]; 
 		pHba->FwDebugFlags = 0;
+	} else {
+		return -1;
 	}
 
 	return 0;
************************************************************************************

diff -Nur linux-2.4.28.org/drivers/scsi/dpti.h linux-2.4.28/drivers/scsi/dpti.h
--- linux-2.4.28.org/drivers/scsi/dpti.h	2001-09-08 01:28:38.000000000 +0900
+++ linux-2.4.28/drivers/scsi/dpti.h	2005-01-18 11:43:46.000000000 +0900

/*                  */
/* Problem Number 2 */
/*                  */
************************************************************************************
@@ -348,7 +348,7 @@
 static s32 adpt_send_nop(adpt_hba*pHba,u32 m);
 static void adpt_i2o_delete_hba(adpt_hba* pHba);
 static void adpt_select_queue_depths(struct Scsi_Host *host, Scsi_Device * devicelist);
-static void adpt_inquiry(adpt_hba* pHba);
+static int adpt_inquiry(adpt_hba* pHba);
 static void adpt_fail_posted_scbs(adpt_hba* pHba);
 static struct adpt_device* adpt_find_device(adpt_hba* pHba, u32 chan, u32 id, u32 lun);
 static int adpt_install_hba(Scsi_Host_Template* sht, struct pci_dev* pDev) ;
************************************************************************************

^ permalink raw reply	[flat|nested] 3+ messages in thread
* RE: [BUG][RFC][PATCH] dpt_i2o driver in 2.4
@ 2005-01-18 15:11 Salyzyn, Mark
  2005-01-20  8:21 ` Sakurai Hiroomi
  0 siblings, 1 reply; 3+ messages in thread
From: Salyzyn, Mark @ 2005-01-18 15:11 UTC (permalink / raw)
  To: Sakurai Hiroomi; +Cc: linux-scsi

[-- Attachment #1: Type: text/plain, Size: 3761 bytes --]

I am pouring through the inspection. The Adaptec branch (which works in
both 2.4 and 2.6 kernels, and in 64 bit architectures) of the driver has
been patched to some of these suggestions and has been included for your
information.

1) The driver is capable of instantiating a subset of the available
adapters, doing the adpt_i2o_sys_shutdown and/or returning 0 on a single
failure to instantiate would break that feature. The Adapter has not
gone through startup until after adpt_i2o_activate_hba, there is no
reason to do an adpt_i2o_sys_shutdown. The linux community may not wish
that a driver instantiate a subset of the hardware without failing, so
your point may be entirely valid; I am hoping for some community
comment.

2) A valid concern, but the architectural decision at this point was
that adpt_inquiry would never fail. And if it did, it still did not
represent a reason to fail the adapter in the absence of all other
failures. It was an `informational' request for an adapter name. We had
versions of (internal) firmware that failed this call and still
functioned properly in all other respects.

3) Our latest driver has this patch in another form regarding the check
for pScsi_dev null. This patch is a requirement; we have seen this
failure mode.

4) Good catch, I have added your patch that prevents falling through to
the instantiating code when exceeding the maximum number of adapters to
the Adaptec branch. I agree, a better choice is to have no limits, but I
was bit once by a pci bus that had *multitudes* of the adapter
replicated :-) The management applications have the same hard-coded
limit (that is no reason for the driver to match it though)

5) Good catch, I had added your patch that checks the id parameter.

6) Good catch. We have never experienced this possible data corruption
from the adapter, but one can never be too paranoid in this case since
it is a subsystem that could misbehave. Checking the bus_no and scsi_id
is highly recommended and has been added to the Adaptec Branch of the
driver (except that I switched to the C style comment).

7) Out latest driver already has this patch in another form regarding
the check for pScsi_dev null. This patch is a requirement; we have seen
this failure mode.

8) It is not a failure to be unable to acquire the blinkLED or debug
buffers from the adapter. By itself, this failure does not justify
failing the adapter, especially since there are older (DPT PM2554)
adapters that did not have this scalar.

Sincerely -- Mark Salyzyn

-----Original Message-----
From: Salyzyn, Mark 
Sent: Tuesday, January 18, 2005 8:34 AM
To: 'Sakurai Hiroomi'
Subject: RE: [BUG][RFC][PATCH] dpt_i2o driver in 2.4

The driver Adaptec has in its maintained branch is enclosed. I will take
the inspection report and apply it to this driver.

Sincerely -- Mark Salyzyn

-----Original Message-----
From: linux-scsi-owner@vger.kernel.org
[mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Sakurai Hiroomi
Sent: Tuesday, January 18, 2005 12:41 AM
To: linux-scsi@vger.kernel.org
Cc: hiroomi sakurai
Subject: [BUG][RFC][PATCH] dpt_i2o driver in 2.4

Dear Mark,

Our Linux server uses adaptec ASR-2010S and we are using dpt_i2o driver.
To raise the reliability of the server, we reviewd the dpt_i2o driver.
# The version of driver is 2.4 Build 5 in linux-2.4.28.

We found some problems and questions.

Please see the attached the dpt_i2o_problem_document.txt about the
problems. 
And also I made a patch agains the driver(dpt_i2o.diff).

I'm not participated in the linux-scsi mailing list.
Please reply to the following addresses. 

    E-Mail : sakurai_hiro@soft.fujitsu.com

Best Regards
Hiroomi Sakurai


[-- Attachment #2: dpt_i2o-2.5.0-2380.tgz --]
[-- Type: application/x-compressed, Size: 59305 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2005-01-20  8:18 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-01-18  5:41 [BUG][RFC][PATCH] dpt_i2o driver in 2.4 Sakurai Hiroomi
  -- strict thread matches above, loose matches on Subject: below --
2005-01-18 15:11 Salyzyn, Mark
2005-01-20  8:21 ` Sakurai Hiroomi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.