public inbox for linux-scsi@vger.kernel.org
* [PATCH 1/4] aic79xx: Fixup external device reset
@ 2006-10-23 13:22 Hannes Reinecke
  2006-10-24 23:37 ` James Bottomley
  0 siblings, 1 reply; 5+ messages in thread
From: Hannes Reinecke @ 2006-10-23 13:22 UTC (permalink / raw)
  To: James Bottomley; +Cc: SCSI Mailing List

[-- Attachment #1: Type: text/plain, Size: 465 bytes --]

Hi all,

Whenever we detect an external device reset we have to take care not to
confuse the SCSI bus. We already handle the ML retry, but we have to
switch to non-packetized command mode as well, and notify the SCSI
engine about the command abort.
This code is actually from the original Adaptec driver.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke			hare@suse.de
SuSE Linux Products GmbH		S390 & zSeries
Maxfeldstraße 5				+49 911 74053 688
90409 Nürnberg				http://www.suse.de

[-- Attachment #2: 0001-aic79xx-Fixup-external-device-reset.txt --]
[-- Type: text/plain, Size: 4863 bytes --]

From: Hannes Reinecke <hare@suse.de>
Date: Mon Oct 23 09:46:17 2006 +0200
Subject: aic79xx: Fixup external device reset

Whenever an external device is reset we really have to take
care to keep the channel in sync. Just notifying SCSI-ML and retrying
is not enough as we have to make sure the SCSI bus does not get
confused, either.
So whenever we detect an external reset we rewrite the command to
TUR, disable packetized commands and notify the internal engine
that an abort has happened. This way we trigger a proper bus
reset sequence and all devices will be renegotiated properly.
Kudos to Justin Gibbs and Luben Tuikov for this idea.

Signed-off-by: Hannes Reinecke <hare@suse.de>

---

 drivers/scsi/aic7xxx/aic79xx_core.c |   66 ++++++++++++++++++++++++++++++-----
 1 files changed, 56 insertions(+), 10 deletions(-)

f23c5c586615db1224e5cc7926b2731c3246011f
diff --git a/drivers/scsi/aic7xxx/aic79xx_core.c b/drivers/scsi/aic7xxx/aic79xx_core.c
index 653818d..555920a 100644
--- a/drivers/scsi/aic7xxx/aic79xx_core.c
+++ b/drivers/scsi/aic7xxx/aic79xx_core.c
@@ -1053,10 +1053,12 @@ #endif
 			 * If a target takes us into the command phase
 			 * assume that it has been externally reset and
 			 * has thus lost our previous packetized negotiation
-			 * agreement.
-			 * Revert to async/narrow transfers until we
-			 * can renegotiate with the device and notify
-			 * the OSM about the reset.
+			 * agreement.  Since we have not sent an identify
+			 * message and may not have fully qualified the
+			 * connection, we change our command to TUR, assert
+			 * ATN and ABORT the task when we go to message in
+			 * phase.  The OSM will see the REQUEUE_REQUEST
+			 * status and retry the command.
 			 */
 			scbid = ahd_get_scbptr(ahd);
 			scb = ahd_lookup_scb(ahd, scbid);
@@ -1083,7 +1085,28 @@ #endif
 			ahd_set_syncrate(ahd, &devinfo, /*period*/0,
 					 /*offset*/0, /*ppr_options*/0,
 					 AHD_TRANS_ACTIVE, /*paused*/TRUE);
-			scb->flags |= SCB_EXTERNAL_RESET;
+			/* Hand-craft TUR command */
+			ahd_outb(ahd, SCB_CDB_STORE, 0);
+			ahd_outb(ahd, SCB_CDB_STORE+1, 0);
+			ahd_outb(ahd, SCB_CDB_STORE+2, 0);
+			ahd_outb(ahd, SCB_CDB_STORE+3, 0);
+			ahd_outb(ahd, SCB_CDB_STORE+4, 0);
+			ahd_outb(ahd, SCB_CDB_STORE+5, 0);
+			ahd_outb(ahd, SCB_CDB_LEN, 6);
+			scb->hscb->control &= ~(TAG_ENB|SCB_TAG_TYPE);
+			scb->hscb->control |= MK_MESSAGE;
+			ahd_outb(ahd, SCB_CONTROL, scb->hscb->control);
+			ahd_outb(ahd, MSG_OUT, HOST_MSG);
+			ahd_outb(ahd, SAVED_SCSIID, scb->hscb->scsiid);
+			/*
+			 * The lun is 0, regardless of the SCB's lun
+			 * as we have not sent an identify message.
+			 */
+			ahd_outb(ahd, SAVED_LUN, 0);
+			ahd_outb(ahd, SEQ_FLAGS, 0);
+			ahd_assert_atn(ahd);
+			scb->flags &= ~SCB_PACKETIZED;
+			scb->flags |= SCB_ABORT|SCB_EXTERNAL_RESET;
 			ahd_freeze_devq(ahd, scb);
 			ahd_set_transaction_status(scb, CAM_REQUEUE_REQ);
 			ahd_freeze_scb(scb);
@@ -1519,8 +1542,10 @@ ahd_handle_scsiint(struct ahd_softc *ahd
 	/*
 	 * Ignore external resets after a bus reset.
 	 */
-	if (((status & SCSIRSTI) != 0) && (ahd->flags & AHD_BUS_RESET_ACTIVE))
+	if (((status & SCSIRSTI) != 0) && (ahd->flags & AHD_BUS_RESET_ACTIVE)) {
+		ahd_outb(ahd, CLRSINT1, CLRSCSIRSTI);
 		return;
+	}
 
 	/*
 	 * Clear bus reset flag
@@ -2200,6 +2225,22 @@ ahd_handle_nonpkt_busfree(struct ahd_sof
 			if (sent_msg == MSG_ABORT_TAG)
 				tag = SCB_GET_TAG(scb);
 
+			if ((scb->flags & SCB_EXTERNAL_RESET) != 0) {
+				/*
+				 * This abort is in response to an
+				 * unexpected switch to command phase
+				 * for a packetized connection.  Since
+				 * the identify message was never sent,
+				 * "saved lun" is 0.  We really want to
+				 * abort only the SCB that encountered
+				 * this error, which could have a different
+				 * lun.  The SCB will be retried so the OS
+				 * will see the UA after renegotiating to
+				 * packetized.
+				 */
+				tag = SCB_GET_TAG(scb);
+				saved_lun = scb->hscb->lun;
+			}
 			found = ahd_abort_scbs(ahd, target, 'A', saved_lun,
 					       tag, ROLE_INITIATOR,
 					       CAM_REQ_ABORTED);
@@ -7920,6 +7961,11 @@ #endif
 	ahd_clear_fifo(ahd, 1);
 
 	/*
+	 * Clear SCSI interrupt status
+	 */
+	ahd_outb(ahd, CLRSINT1, CLRSCSIRSTI);
+
+	/*
 	 * Reenable selections
 	 */
 	ahd_outb(ahd, SIMODE1, ahd_inb(ahd, SIMODE1) | ENSCSIRST);
@@ -7952,10 +7998,6 @@ #ifdef AHD_TARGET_MODE
 		}
 	}
 #endif
-	/* Notify the XPT that a bus reset occurred */
-	ahd_send_async(ahd, devinfo.channel, CAM_TARGET_WILDCARD,
-		       CAM_LUN_WILDCARD, AC_BUS_RESET);
-
 	/*
 	 * Revert to async/narrow transfers until we renegotiate.
 	 */
@@ -7977,6 +8019,10 @@ #endif
 		}
 	}
 
+	/* Notify the XPT that a bus reset occurred */
+	ahd_send_async(ahd, devinfo.channel, CAM_TARGET_WILDCARD,
+		       CAM_LUN_WILDCARD, AC_BUS_RESET);
+
 	ahd_restart(ahd);
 
 	return (found);
-- 
1.3.1



* Re: [PATCH 1/4] aic79xx: Fixup external device reset
  2006-10-23 13:22 [PATCH 1/4] aic79xx: Fixup external device reset Hannes Reinecke
@ 2006-10-24 23:37 ` James Bottomley
  2006-10-25  2:49   ` Sean Bruno
  2006-10-25  7:01   ` Hannes Reinecke
  0 siblings, 2 replies; 5+ messages in thread
From: James Bottomley @ 2006-10-24 23:37 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: SCSI Mailing List

On Mon, 2006-10-23 at 15:22 +0200, Hannes Reinecke wrote:
> +                       /* Hand-craft TUR command */
> +                       ahd_outb(ahd, SCB_CDB_STORE, 0);
> +                       ahd_outb(ahd, SCB_CDB_STORE+1, 0);
> +                       ahd_outb(ahd, SCB_CDB_STORE+2, 0);
> +                       ahd_outb(ahd, SCB_CDB_STORE+3, 0);
> +                       ahd_outb(ahd, SCB_CDB_STORE+4, 0);
> +                       ahd_outb(ahd, SCB_CDB_STORE+5, 0);
> +                       ahd_outb(ahd, SCB_CDB_LEN, 6);
> +                       scb->hscb->control &= ~(TAG_ENB|SCB_TAG_TYPE);
> +                       scb->hscb->control |= MK_MESSAGE;
> +                       ahd_outb(ahd, SCB_CONTROL, scb->hscb->control);
> +                       ahd_outb(ahd, MSG_OUT, HOST_MSG);
> +                       ahd_outb(ahd, SAVED_SCSIID, scb->hscb->scsiid);

What's the reason for having to have this hand crafted test unit ready?

James




* Re: [PATCH 1/4] aic79xx: Fixup external device reset
  2006-10-24 23:37 ` James Bottomley
@ 2006-10-25  2:49   ` Sean Bruno
  2006-10-25  7:01   ` Hannes Reinecke
  1 sibling, 0 replies; 5+ messages in thread
From: Sean Bruno @ 2006-10-25  2:49 UTC (permalink / raw)
  To: James Bottomley; +Cc: Hannes Reinecke, SCSI Mailing List

On Tue, 2006-10-24 at 16:37 -0700, James Bottomley wrote:

This was in response to my reporting a fault when a LUN (either an array
or a disk) goes off-line and then is powered back on.  The error would
not occur in kernels up to 2.6.15.7 but would fail in 2.6.16 and higher.

Prior to this patch, my 29320 would lock up the box with errors like the
following:

------------[ cut here ]------------
kernel BUG at mm/slab.c:594!
invalid opcode: 0000 [#1]
SMP
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc iscsi_tcp
libiscsi scsi_transport_iscsi ipv6 video sbs i2c_ec i2c_core button
battery asus_acpi ac parport_pc lp parport snd_intel8x0 snd_ac97_codec
snd_ac97_bus sg snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm floppy snd_timer snd
soundcore snd_page_alloc serio_raw ide_cd skge cdrom pcspkr dm_snapshot
dm_zero dm_mirror dm_mod aic79xx scsi_transport_spi sd_mod scsi_mod ext3
jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0060:[<c0169562>]    Not tainted VLI
EFLAGS: 00010246   (2.6.19-rc2 #1)
EIP is at kmem_cache_free+0x29/0x6d
eax: 00000000   ebx: dffae300   ecx: dff91b80   edx: c1a00000
esi: dffaaf80   edi: 00000000   ebp: d3f324c0   esp: d3fb9dd0
ds: 007b   es: 007b   ss: 0068
Process tdg_2 (pid: 2362, ti=d3fb9000 task=dfd6cd50 task.ti=d3fb9000)
Stack: dffae300 dffaaf80 00000000 c0154448 00000000 d3e09a80 dffaaf80
d3e09a80
       c018bafc 00001000 00000000 c018b822 e088efa0 00001000 00000000
0000000a
       d3fb9ef0 d43f76c8 00003000 00000000 00000001 c130cac8 00008000
00000000
Call Trace:
 [<c0154448>] mempool_free+0x66/0x6b
 [<c018bafc>] bio_free+0x25/0x30
 [<c018b822>] bio_put+0x28/0x29
 [<e088efa0>] scsi_execute_async+0x15f/0x33d [scsi_mod]
 [<e09c9913>] sg_common_write+0x704/0x772 [sg]
 [<e09c9ba6>] sg_new_write+0x225/0x248 [sg]
 [<e09cae45>] sg_write+0x106/0x33a [sg]
 [<c016dae7>] vfs_write+0xa8/0x159
 [<c016e114>] sys_write+0x41/0x67
 [<c0103dc9>] sysenter_past_esp+0x56/0x79
DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x79

Leftover inexact backtrace:

 [<c031007b>] sleep_on+0x1e/0x6c
 =======================
Code: 5f c3 89 c1 8d 82 00 00 00 40 c1 e8 0c 57 89 d7 6b d0 28 03 15 00
d6 50 c0 56 53 8b 02 f6 c4 40 74 03 8b 52 0c 8b 02 84 c0 78 08 <0f> 0b
52 02 e6 6b 33 c0 39 4a 20 74 08 0f 0b ca 0d e6 6b 33 c0
EIP: [<c0169562>] kmem_cache_free+0x29/0x6d SS:ESP 0068:d3fb9dd0


You can review my comments on testing in the thread titled:
"Adaptec 29320 [aic79xx] fails on power cycle of LUN"

Hope this clarifies things.

Sean



* Re: [PATCH 1/4] aic79xx: Fixup external device reset
  2006-10-24 23:37 ` James Bottomley
  2006-10-25  2:49   ` Sean Bruno
@ 2006-10-25  7:01   ` Hannes Reinecke
  2006-10-25 15:14     ` James Bottomley
  1 sibling, 1 reply; 5+ messages in thread
From: Hannes Reinecke @ 2006-10-25  7:01 UTC (permalink / raw)
  To: James Bottomley; +Cc: SCSI Mailing List, Tarte, Robert

James Bottomley wrote:
> On Mon, 2006-10-23 at 15:22 +0200, Hannes Reinecke wrote:
>> +                       /* Hand-craft TUR command */
>> +                       ahd_outb(ahd, SCB_CDB_STORE, 0);
>> +                       ahd_outb(ahd, SCB_CDB_STORE+1, 0);
>> +                       ahd_outb(ahd, SCB_CDB_STORE+2, 0);
>> +                       ahd_outb(ahd, SCB_CDB_STORE+3, 0);
>> +                       ahd_outb(ahd, SCB_CDB_STORE+4, 0);
>> +                       ahd_outb(ahd, SCB_CDB_STORE+5, 0);
>> +                       ahd_outb(ahd, SCB_CDB_LEN, 6);
>> +                       scb->hscb->control &= ~(TAG_ENB|SCB_TAG_TYPE);
>> +                       scb->hscb->control |= MK_MESSAGE;
>> +                       ahd_outb(ahd, SCB_CONTROL, scb->hscb->control);
>> +                       ahd_outb(ahd, MSG_OUT, HOST_MSG);
>> +                       ahd_outb(ahd, SAVED_SCSIID, scb->hscb->scsiid);
> 
> What's the reason for having to have this hand crafted test unit ready?
> 
I asked myself the same question. It's actually from the original
Adaptec sources, and I couldn't figure out why it was needed, either.
That's why I removed it initially when doing the first round of external
device reset patches.

The thing is, whenever we send an SCB with the MK_MESSAGE flag set the
sequencer will interrupt normal SCB delivery and ensure that the
MK_MESSAGE SCB is sent immediately. Plus we can force the connection to
non-packetized transfers as the reset target will start out with
normal transfers, too. And as we have already invalidated the
negotiation settings for all targets a renegotiation will happen when
this TUR is completed.
So the normal flow of operation can continue and only one command will
have to be requeued from the midlayer.

I think. Or that's what I've gleaned from the sequencer code. Maybe Rob
can give some more insight here.

Admittedly, this is really a nasty tweak. In theory we should have a
proper error handler which handles this sort of thing. But the entire
code is littered with such tweaks so that's quite a vain hope.
Unless someone passes me a theory of operation document for it.
Having a register description is quite pointless.
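
For illustration, the SCB rewrite discussed above can be modelled outside
the driver roughly as follows. This is only a sketch under stated
assumptions: the sk_* names, the struct layout and the bit values are
hypothetical stand-ins and not the driver's real definitions, and plain
memory stores stand in for the ahd_outb() register writes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit values; the real ones come from the driver's headers. */
#define SK_TAG_ENB             0x20u
#define SK_TAG_TYPE            0x03u
#define SK_MK_MESSAGE          0x10u

#define SK_FLAG_PACKETIZED     0x01u
#define SK_FLAG_ABORT          0x02u
#define SK_FLAG_EXTERNAL_RESET 0x04u

#define SK_CDB_MAX 16

/* Simplified stand-in for the per-command state the patch touches. */
struct sk_scb {
	uint8_t  cdb[SK_CDB_MAX];
	uint8_t  cdb_len;
	uint8_t  control;
	uint8_t  lun;        /* lun the original command addressed */
	uint8_t  saved_lun;  /* lun of the current connection */
	uint32_t flags;
};

/*
 * Turn the command into an untagged TEST UNIT READY (six zero bytes),
 * request a message-out phase, and mark the SCB as a non-packetized
 * abort caused by an external reset.
 */
static void sk_rewrite_as_tur(struct sk_scb *scb)
{
	assert(scb != NULL);
	memset(scb->cdb, 0, 6);
	scb->cdb_len = 6;
	scb->control &= (uint8_t)~(SK_TAG_ENB | SK_TAG_TYPE);
	scb->control |= SK_MK_MESSAGE;
	/* No identify message was sent, so the connection's lun is 0. */
	scb->saved_lun = 0;
	scb->flags &= ~SK_FLAG_PACKETIZED;
	scb->flags |= SK_FLAG_ABORT | SK_FLAG_EXTERNAL_RESET;
}

/*
 * On bus free, abort against the SCB's own lun when the abort was
 * caused by an external reset; otherwise use the connection's lun.
 */
static uint8_t sk_abort_lun(const struct sk_scb *scb)
{
	if ((scb->flags & SK_FLAG_EXTERNAL_RESET) != 0)
		return scb->lun;
	return scb->saved_lun;
}
```

The second helper mirrors the ahd_handle_nonpkt_busfree() hunk: the
saved lun is 0 after the rewrite, so the abort has to fall back to the
lun stored in the SCB to hit only the command that saw the error.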

Cheers,

Hannes
-- 
Dr. Hannes Reinecke			hare@suse.de
SuSE Linux Products GmbH		S390 & zSeries
Maxfeldstraße 5				+49 911 74053 688
90409 Nürnberg				http://www.suse.de


* Re: [PATCH 1/4] aic79xx: Fixup external device reset
  2006-10-25  7:01   ` Hannes Reinecke
@ 2006-10-25 15:14     ` James Bottomley
  0 siblings, 0 replies; 5+ messages in thread
From: James Bottomley @ 2006-10-25 15:14 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: SCSI Mailing List, Tarte, Robert

On Wed, 2006-10-25 at 09:01 +0200, Hannes Reinecke wrote:
> > What's the reason for having to have this hand crafted test unit ready?
> > 
> I asked myself the same question. It's actually from the original 
> Adaptec sources, and I couldn't figure out why it was needed, either.
> That's why I removed it initially when doing the first round of external 
> device reset patches.
> 
> The thing is, whenever we send an SCB with the MK_MESSAGE flag set the 
> sequencer will interrupt normal SCB delivery and ensure that the 
> MK_MESSAGE SCB is sent immediately. Plus we can force the connection to 
> non-packetized transfers as the reset target will start out with 
> normal transfers, too. And as we have already invalidated the 
> negotiation settings for all targets a renegotiation will happen when 
> this TUR is completed.
> So the normal flow of operation can continue and only one command will 
> have to be requeued from the midlayer.
> 
> I think. Or that's what I've gleaned from the sequencer code. Maybe Rob 
> can give some more insight here.
> 
> Admittedly, this is really a nasty tweak. In theory we should have a 
> proper error handler which handles this sort of thing. But the entire 
> code is littered with such tweaks so that's quite a vain hope.
> Unless someone passes me a theory of operation document for it.
> Having a register description is quite pointless.

Yes ... that's what I was wondering.  However, we can go with the "it's
magic" for the time being.

James


