Re: Adaptec 29320 [aic79xx] fails on power cycle of LUN

From: Hannes Reinecke <hare@suse.de>
To: Sean Bruno <sean.bruno@dsl-only.net>
Cc: linux-scsi@vger.kernel.org
Subject: Re: Adaptec 29320 [aic79xx] fails on power cycle of LUN
Date: Fri, 20 Oct 2006 09:01:54 +0200
Message-ID: <45387462.10300@suse.de>
In-Reply-To: <1161274711.3204.41.camel@home-desk>

[-- Attachment #1: Type: text/plain, Size: 2649 bytes --]

Sean Bruno wrote:
> On Thu, 2006-10-19 at 16:10 +0200, Hannes Reinecke wrote:
>> Sean Bruno wrote:
>>> On Thu, 2006-10-19 at 01:52 -0400, Mike Christie wrote:
>>>> On Wed, 2006-10-18 at 15:32 -0700, Sean Bruno wrote:
>>>>> On Wed, 2006-10-18 at 15:24 -0700, Sean Bruno wrote:
>>>>>> I have had a tough time tracking this one down, however I can say for
>>>>>> certain that the 29320 is really having trouble if a LUN is power
>>>>>> cycled.
>>>>>>
>>>>>> I don't have access to a BUS analyzer right now, but here is my
>>>>>> regression.
>>>>>>
>>>>>> 1.  Hook an external SCSI array/disk to a 29320.
>>>>>> 2.  Power up SCSI array/disk
>>>>>> 3.  Power up PC with 29320.
>>>>>> 4.  When PC has booted, login and test device by creating a file
>>>>>>     system, eg. mkfs /dev/sda (or whatever disk the array is called on
>>>>>>     ur machine).
>>>>>> 5.  Power cycle array/disk
>>>>>> 6.  Retest device with another 'mkfs /dev/sda' ... panic/crash/lock-up
>>>>>> ensues.
>>>>>>
>>>>>>
>>>>>>
>>>>>> This did not happen in 2.6.15.7 but did appear in 2.6.16 and higher.
>>>>>>
>>>> Does this only occur with sg or is that the only way you got a trace? In
>>>> the original bug report you mentioned it occurring with mkfs, but the
>>>> bug oops is from a sg request. Is tdg_2 run while the mkfs is running?
>>> Snippets from 'dmesg' during step 6:
>>>
>>> scsi0: Someone reset channel A
>>> sd 0:0:4:0: Attempting to queue an ABORT message:CDB: 0x28 0x0 0x0 0x0
>>> 0x0 0x80 0x0 0x0 0x80 0x0
>>> Infinite interrupt loop, INTSTAT = 8scsi0: At time of recovery, card was
>>> paused
>> Ah. Hmm. Infinite SCSI interrupt.
>>
>> Maybe someone forgot to clear the status ...
>>
>> Can you try the attached patch?
>>
>> Cheers,
>>
>> Hannes
> 
> Better.  The patch allows me to cycle power on the array exactly once.
> So the new regression is:
> 
> 1.  Hook an external SCSI array/disk to a 29320.
> 2.  Power up SCSI array/disk
> 3.  Power up PC with 29320.
> 4.  When PC has booted, login and test device by creating a file
>     system, eg. mkfs /dev/sda (or whatever disk the array is called on
>     ur machine).
> 5.  Power cycle array/disk
> 6.  Retest device with another 'mkfs /dev/sda'  <-- works just fine!
> 7.  Power cycle array/disk
> 8.  No need to do anything, card dump in dmesg/messages appears and
> device is not usable:
> 
Ok. Not bad. So we have to switch to non-packetized commands after a reset.
Makes sense. Care to try the updated patch?
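
For anyone following along, here is a minimal stand-alone sketch of the
command the patch hand-crafts. TEST UNIT READY is a 6-byte CDB with
opcode 0x00 and all remaining bytes zero. The register file below is
just an array; fake_outb(), FAKE_CDB_STORE and FAKE_CDB_LEN are
hypothetical stand-ins and not the driver's real ahd_outb() or its
register offsets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical offsets into a simulated register file (not aic79xx values). */
enum { FAKE_CDB_STORE = 0, FAKE_CDB_LEN = 16, FAKE_REGS_SIZE = 32 };

uint8_t fake_regs[FAKE_REGS_SIZE];

/* Stand-in for a byte-wide register write. */
void fake_outb(unsigned reg, uint8_t val)
{
	assert(reg < FAKE_REGS_SIZE);
	fake_regs[reg] = val;
}

/* Write a TEST UNIT READY CDB: six zero bytes, then the CDB length. */
void craft_tur(void)
{
	for (unsigned i = 0; i < 6; i++)
		fake_outb(FAKE_CDB_STORE + i, 0);
	fake_outb(FAKE_CDB_LEN, 6);
}
```

Calling craft_tur() overwrites only the six CDB bytes and the length
slot, which roughly mirrors the SCB_CDB_STORE / SCB_CDB_LEN writes in
the attached diff.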

Thanks for all the testing!

Cheers,

Hannes
-- 
Dr. Hannes Reinecke			hare@suse.de
SuSE Linux Products GmbH		S390 & zSeries
Maxfeldstraße 5				+49 911 74053 688
90409 Nürnberg				http://www.suse.de

[-- Attachment #2: aic79xx-external-device-reset --]
[-- Type: text/plain, Size: 3984 bytes --]

diff --git a/drivers/scsi/aic7xxx/aic79xx_core.c b/drivers/scsi/aic7xxx/aic79xx_core.c
index 653818d..555920a 100644
--- a/drivers/scsi/aic7xxx/aic79xx_core.c
+++ b/drivers/scsi/aic7xxx/aic79xx_core.c
@@ -1053,10 +1053,12 @@ #endif
 			 * If a target takes us into the command phase
 			 * assume that it has been externally reset and
 			 * has thus lost our previous packetized negotiation
-			 * agreement.
-			 * Revert to async/narrow transfers until we
-			 * can renegotiate with the device and notify
-			 * the OSM about the reset.
+			 * agreement.  Since we have not sent an identify
+			 * message and may not have fully qualified the
+			 * connection, we change our command to TUR, assert
+			 * ATN and ABORT the task when we go to message in
+			 * phase.  The OSM will see the REQUEUE_REQUEST
+			 * status and retry the command.
 			 */
 			scbid = ahd_get_scbptr(ahd);
 			scb = ahd_lookup_scb(ahd, scbid);
@@ -1083,7 +1085,28 @@ #endif
 			ahd_set_syncrate(ahd, &devinfo, /*period*/0,
 					 /*offset*/0, /*ppr_options*/0,
 					 AHD_TRANS_ACTIVE, /*paused*/TRUE);
-			scb->flags |= SCB_EXTERNAL_RESET;
+			/* Hand-craft TUR command */
+			ahd_outb(ahd, SCB_CDB_STORE, 0);
+			ahd_outb(ahd, SCB_CDB_STORE+1, 0);
+			ahd_outb(ahd, SCB_CDB_STORE+2, 0);
+			ahd_outb(ahd, SCB_CDB_STORE+3, 0);
+			ahd_outb(ahd, SCB_CDB_STORE+4, 0);
+			ahd_outb(ahd, SCB_CDB_STORE+5, 0);
+			ahd_outb(ahd, SCB_CDB_LEN, 6);
+			scb->hscb->control &= ~(TAG_ENB|SCB_TAG_TYPE);
+			scb->hscb->control |= MK_MESSAGE;
+			ahd_outb(ahd, SCB_CONTROL, scb->hscb->control);
+			ahd_outb(ahd, MSG_OUT, HOST_MSG);
+			ahd_outb(ahd, SAVED_SCSIID, scb->hscb->scsiid);
+			/*
+			 * The lun is 0, regardless of the SCB's lun
+			 * as we have not sent an identify message.
+			 */
+			ahd_outb(ahd, SAVED_LUN, 0);
+			ahd_outb(ahd, SEQ_FLAGS, 0);
+			ahd_assert_atn(ahd);
+			scb->flags &= ~SCB_PACKETIZED;
+			scb->flags |= SCB_ABORT|SCB_EXTERNAL_RESET;
 			ahd_freeze_devq(ahd, scb);
 			ahd_set_transaction_status(scb, CAM_REQUEUE_REQ);
 			ahd_freeze_scb(scb);
@@ -1519,8 +1542,10 @@ ahd_handle_scsiint(struct ahd_softc *ahd
 	/*
 	 * Ignore external resets after a bus reset.
 	 */
-	if (((status & SCSIRSTI) != 0) && (ahd->flags & AHD_BUS_RESET_ACTIVE))
+	if (((status & SCSIRSTI) != 0) && (ahd->flags & AHD_BUS_RESET_ACTIVE)) {
+		ahd_outb(ahd, CLRSINT1, CLRSCSIRSTI);
 		return;
+	}
 
 	/*
 	 * Clear bus reset flag
@@ -2200,6 +2225,22 @@ ahd_handle_nonpkt_busfree(struct ahd_sof
 			if (sent_msg == MSG_ABORT_TAG)
 				tag = SCB_GET_TAG(scb);
 
+			if ((scb->flags & SCB_EXTERNAL_RESET) != 0) {
+				/*
+				 * This abort is in response to an
+				 * unexpected switch to command phase
+				 * for a packetized connection.  Since
+				 * the identify message was never sent,
+				 * "saved lun" is 0.  We really want to
+				 * abort only the SCB that encountered
+				 * this error, which could have a different
+				 * lun.  The SCB will be retried so the OS
+				 * will see the UA after renegotiating to
+				 * packetized.
+				 */
+				tag = SCB_GET_TAG(scb);
+				saved_lun = scb->hscb->lun;
+			}
 			found = ahd_abort_scbs(ahd, target, 'A', saved_lun,
 					       tag, ROLE_INITIATOR,
 					       CAM_REQ_ABORTED);
@@ -7920,6 +7961,11 @@ #endif
 	ahd_clear_fifo(ahd, 1);
 
 	/*
+	 * Clear SCSI interrupt status
+	 */
+	ahd_outb(ahd, CLRSINT1, CLRSCSIRSTI);
+
+	/*
 	 * Reenable selections
 	 */
 	ahd_outb(ahd, SIMODE1, ahd_inb(ahd, SIMODE1) | ENSCSIRST);
@@ -7952,10 +7998,6 @@ #ifdef AHD_TARGET_MODE
 		}
 	}
 #endif
-	/* Notify the XPT that a bus reset occurred */
-	ahd_send_async(ahd, devinfo.channel, CAM_TARGET_WILDCARD,
-		       CAM_LUN_WILDCARD, AC_BUS_RESET);
-
 	/*
 	 * Revert to async/narrow transfers until we renegotiate.
 	 */
@@ -7977,6 +8019,10 @@ #endif
 		}
 	}
 
+	/* Notify the XPT that a bus reset occurred */
+	ahd_send_async(ahd, devinfo.channel, CAM_TARGET_WILDCARD,
+		       CAM_LUN_WILDCARD, AC_BUS_RESET);
+
 	ahd_restart(ahd);
 
 	return (found);
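
The CLRSINT1 hunks above appear aimed at the "Infinite interrupt loop"
seen in the quoted dmesg output. A rough simulation of that failure
mode, assuming a latched status bit that stays set until it is
explicitly cleared: every name here (struct fake_hba, FAKE_SCSIRSTI,
fake_handle_scsiint, fake_service_irq) is hypothetical, and the bit
value and clear semantics are assumptions of the sketch, not taken from
the hardware documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FAKE_SCSIRSTI 0x20u	/* hypothetical "reset seen" status bit */

struct fake_hba {
	uint8_t sstat1;		/* latched interrupt status */
	bool bus_reset_active;	/* we issued a bus reset ourselves */
};

/* Assumed semantics: writing a bit clears the matching status bit. */
void fake_clrsint1(struct fake_hba *h, uint8_t bits)
{
	h->sstat1 &= (uint8_t)~bits;
}

/* Handler with the early "ignore external reset" return. */
void fake_handle_scsiint(struct fake_hba *h, bool clear_on_ignore)
{
	if ((h->sstat1 & FAKE_SCSIRSTI) != 0 && h->bus_reset_active) {
		if (clear_on_ignore)
			fake_clrsint1(h, FAKE_SCSIRSTI);
		return;
	}
	/* Normal path: acknowledge everything that is pending. */
	fake_clrsint1(h, h->sstat1);
}

/* Re-enter the handler while status is latched, giving up at limit. */
unsigned fake_service_irq(struct fake_hba *h, bool clear_on_ignore,
			  unsigned limit)
{
	unsigned calls = 0;

	while (h->sstat1 != 0 && calls < limit) {
		fake_handle_scsiint(h, clear_on_ignore);
		calls++;
	}
	return calls;
}
```

With clear_on_ignore false the loop runs until the cap is hit because
the early return leaves the bit latched; with it true the handler is
entered once and the status drops.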

Thread overview: 11+ messages
2006-10-18 22:24 Adaptec 29320 [aic79xx] fails on power cycle of LUN Sean Bruno
2006-10-18 22:27 ` James Bottomley
2006-10-18 22:32 ` Sean Bruno
2006-10-19  5:52   ` Mike Christie
2006-10-19 12:23     ` Sean Bruno
2006-10-19 12:25     ` Sean Bruno
2006-10-19 14:10       ` Hannes Reinecke
2006-10-19 16:18         ` Sean Bruno
2006-10-20  7:01           ` Hannes Reinecke [this message]
2006-10-21 20:48             ` Sean Bruno
2006-10-22  4:45               ` Sean Bruno
