* [PATCH 1/4] aic79xx: Fixup external device reset
@ 2006-10-23 13:22 Hannes Reinecke
2006-10-24 23:37 ` James Bottomley
0 siblings, 1 reply; 5+ messages in thread
From: Hannes Reinecke @ 2006-10-23 13:22 UTC (permalink / raw)
To: James Bottomley; +Cc: SCSI Mailing List
[-- Attachment #1: Type: text/plain, Size: 465 bytes --]
Hi all,
Whenever we detect an external device reset we have to take care to not
confuse the SCSI bus. We already handle ML retry, but we have to switch to
non-packetized command mode as well, and notify the SCSI engine about
the command abort.
This code is actually from the original Adaptec driver.
Cheers,
Hannes
--
Dr. Hannes Reinecke hare@suse.de
SuSE Linux Products GmbH S390 & zSeries
Maxfeldstraße 5 +49 911 74053 688
90409 Nürnberg http://www.suse.de
[-- Attachment #2: 0001-aic79xx-Fixup-external-device-reset.txt --]
[-- Type: text/plain, Size: 4863 bytes --]
From: Hannes Reinecke <hare@suse.de>
Date: Mon Oct 23 09:46:17 2006 +0200
Subject: aic79xx: Fixup external device reset
Whenever an external device is reset we really have to take
care to keep the channel in sync. Just notifying SCSI-ML and retry
is not enough as we have to make sure the SCSI bus is not getting
confused, either.
So whenever we detect an external reset we rewrite the command to
TUR, disable packetized command and notify the internal engine
that an abort has happened. This way we trigger a proper bus
reset sequence and all devices will be renegotiated properly.
Kudos to Justin Gibbs and Luben Tuikov for this idea.
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
drivers/scsi/aic7xxx/aic79xx_core.c | 66 ++++++++++++++++++++++++++++++-----
1 files changed, 56 insertions(+), 10 deletions(-)
f23c5c586615db1224e5cc7926b2731c3246011f
diff --git a/drivers/scsi/aic7xxx/aic79xx_core.c b/drivers/scsi/aic7xxx/aic79xx_core.c
index 653818d..555920a 100644
--- a/drivers/scsi/aic7xxx/aic79xx_core.c
+++ b/drivers/scsi/aic7xxx/aic79xx_core.c
@@ -1053,10 +1053,12 @@ #endif
* If a target takes us into the command phase
* assume that it has been externally reset and
* has thus lost our previous packetized negotiation
- * agreement.
- * Revert to async/narrow transfers until we
- * can renegotiate with the device and notify
- * the OSM about the reset.
+ * agreement. Since we have not sent an identify
+ * message and may not have fully qualified the
+ * connection, we change our command to TUR, assert
+ * ATN and ABORT the task when we go to message in
+ * phase. The OSM will see the REQUEUE_REQUEST
+ * status and retry the command.
*/
scbid = ahd_get_scbptr(ahd);
scb = ahd_lookup_scb(ahd, scbid);
@@ -1083,7 +1085,28 @@ #endif
ahd_set_syncrate(ahd, &devinfo, /*period*/0,
/*offset*/0, /*ppr_options*/0,
AHD_TRANS_ACTIVE, /*paused*/TRUE);
- scb->flags |= SCB_EXTERNAL_RESET;
+ /* Hand-craft TUR command */
+ ahd_outb(ahd, SCB_CDB_STORE, 0);
+ ahd_outb(ahd, SCB_CDB_STORE+1, 0);
+ ahd_outb(ahd, SCB_CDB_STORE+2, 0);
+ ahd_outb(ahd, SCB_CDB_STORE+3, 0);
+ ahd_outb(ahd, SCB_CDB_STORE+4, 0);
+ ahd_outb(ahd, SCB_CDB_STORE+5, 0);
+ ahd_outb(ahd, SCB_CDB_LEN, 6);
+ scb->hscb->control &= ~(TAG_ENB|SCB_TAG_TYPE);
+ scb->hscb->control |= MK_MESSAGE;
+ ahd_outb(ahd, SCB_CONTROL, scb->hscb->control);
+ ahd_outb(ahd, MSG_OUT, HOST_MSG);
+ ahd_outb(ahd, SAVED_SCSIID, scb->hscb->scsiid);
+ /*
+ * The lun is 0, regardless of the SCB's lun
+ * as we have not sent an identify message.
+ */
+ ahd_outb(ahd, SAVED_LUN, 0);
+ ahd_outb(ahd, SEQ_FLAGS, 0);
+ ahd_assert_atn(ahd);
+ scb->flags &= ~SCB_PACKETIZED;
+ scb->flags |= SCB_ABORT|SCB_EXTERNAL_RESET;
ahd_freeze_devq(ahd, scb);
ahd_set_transaction_status(scb, CAM_REQUEUE_REQ);
ahd_freeze_scb(scb);
@@ -1519,8 +1542,10 @@ ahd_handle_scsiint(struct ahd_softc *ahd
/*
* Ignore external resets after a bus reset.
*/
- if (((status & SCSIRSTI) != 0) && (ahd->flags & AHD_BUS_RESET_ACTIVE))
+ if (((status & SCSIRSTI) != 0) && (ahd->flags & AHD_BUS_RESET_ACTIVE)) {
+ ahd_outb(ahd, CLRSINT1, CLRSCSIRSTI);
return;
+ }
/*
* Clear bus reset flag
@@ -2200,6 +2225,22 @@ ahd_handle_nonpkt_busfree(struct ahd_sof
if (sent_msg == MSG_ABORT_TAG)
tag = SCB_GET_TAG(scb);
+ if ((scb->flags & SCB_EXTERNAL_RESET) != 0) {
+ /*
+ * This abort is in response to an
+ * unexpected switch to command phase
+ * for a packetized connection. Since
+ * the identify message was never sent,
+ * "saved lun" is 0. We really want to
+ * abort only the SCB that encountered
+ * this error, which could have a different
+ * lun. The SCB will be retried so the OS
+ * will see the UA after renegotiating to
+ * packetized.
+ */
+ tag = SCB_GET_TAG(scb);
+ saved_lun = scb->hscb->lun;
+ }
found = ahd_abort_scbs(ahd, target, 'A', saved_lun,
tag, ROLE_INITIATOR,
CAM_REQ_ABORTED);
@@ -7920,6 +7961,11 @@ #endif
ahd_clear_fifo(ahd, 1);
/*
+ * Clear SCSI interrupt status
+ */
+ ahd_outb(ahd, CLRSINT1, CLRSCSIRSTI);
+
+ /*
* Reenable selections
*/
ahd_outb(ahd, SIMODE1, ahd_inb(ahd, SIMODE1) | ENSCSIRST);
@@ -7952,10 +7998,6 @@ #ifdef AHD_TARGET_MODE
}
}
#endif
- /* Notify the XPT that a bus reset occurred */
- ahd_send_async(ahd, devinfo.channel, CAM_TARGET_WILDCARD,
- CAM_LUN_WILDCARD, AC_BUS_RESET);
-
/*
* Revert to async/narrow transfers until we renegotiate.
*/
@@ -7977,6 +8019,10 @@ #endif
}
}
+ /* Notify the XPT that a bus reset occurred */
+ ahd_send_async(ahd, devinfo.channel, CAM_TARGET_WILDCARD,
+ CAM_LUN_WILDCARD, AC_BUS_RESET);
+
ahd_restart(ahd);
return (found);
--
1.3.1
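The scoping decision made in the ahd_handle_nonpkt_busfree() hunk can be sketched in isolation. This is a minimal sketch, not the driver's code: `struct sketch_scb`, `struct abort_scope`, `pick_abort_scope()`, and both `SKETCH_*` constants are hypothetical stand-ins invented for illustration. It only shows the idea from the comment in the patch: since no identify message was sent, the saved lun is 0, so an SCB flagged as hit by an external reset is aborted by its own tag and its own lun rather than by the saved values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants; the driver uses its own definitions. */
#define SKETCH_TAG_WILDCARD   0xFFFFu /* "match any tag" */
#define SKETCH_EXTERNAL_RESET 0x1u    /* stand-in for SCB_EXTERNAL_RESET */

/* Hypothetical, stripped-down view of an SCB. */
struct sketch_scb {
    uint16_t tag;
    uint8_t  lun;
    unsigned flags;
};

/* What the abort routine would be asked to match on. */
struct abort_scope {
    uint16_t tag;
    uint8_t  lun;
};

static struct abort_scope
pick_abort_scope(const struct sketch_scb *scb, bool sent_abort_tag,
                 uint8_t saved_lun)
{
    /* Default: abort everything on the saved lun. */
    struct abort_scope scope = { SKETCH_TAG_WILDCARD, saved_lun };

    /* A tagged abort only targets the one SCB. */
    if (sent_abort_tag)
        scope.tag = scb->tag;

    /*
     * External reset case: saved_lun is 0 because no identify
     * message went out, so take both tag and lun from the SCB.
     */
    if ((scb->flags & SKETCH_EXTERNAL_RESET) != 0) {
        scope.tag = scb->tag;
        scope.lun = scb->lun;
    }
    return scope;
}
```

With the flag set on an SCB for lun 3 and a saved lun of 0, the sketch returns the SCB's tag and lun 3, so commands queued to other luns on the same target are left alone.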
* Re: [PATCH 1/4] aic79xx: Fixup external device reset
2006-10-23 13:22 [PATCH 1/4] aic79xx: Fixup external device reset Hannes Reinecke
@ 2006-10-24 23:37 ` James Bottomley
2006-10-25 2:49 ` Sean Bruno
2006-10-25 7:01 ` Hannes Reinecke
0 siblings, 2 replies; 5+ messages in thread
From: James Bottomley @ 2006-10-24 23:37 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: SCSI Mailing List
On Mon, 2006-10-23 at 15:22 +0200, Hannes Reinecke wrote:
> + /* Hand-craft TUR command */
> + ahd_outb(ahd, SCB_CDB_STORE, 0);
> + ahd_outb(ahd, SCB_CDB_STORE+1, 0);
> + ahd_outb(ahd, SCB_CDB_STORE+2, 0);
> + ahd_outb(ahd, SCB_CDB_STORE+3, 0);
> + ahd_outb(ahd, SCB_CDB_STORE+4, 0);
> + ahd_outb(ahd, SCB_CDB_STORE+5, 0);
> + ahd_outb(ahd, SCB_CDB_LEN, 6);
> + scb->hscb->control &= ~(TAG_ENB|SCB_TAG_TYPE);
> + scb->hscb->control |= MK_MESSAGE;
> + ahd_outb(ahd, SCB_CONTROL, scb->hscb->control);
> + ahd_outb(ahd, MSG_OUT, HOST_MSG);
> + ahd_outb(ahd, SAVED_SCSIID, scb->hscb->scsiid);
What's the reason for having to have this hand crafted test unit ready?
James
* Re: [PATCH 1/4] aic79xx: Fixup external device reset
2006-10-24 23:37 ` James Bottomley
@ 2006-10-25 2:49 ` Sean Bruno
2006-10-25 7:01 ` Hannes Reinecke
1 sibling, 0 replies; 5+ messages in thread
From: Sean Bruno @ 2006-10-25 2:49 UTC (permalink / raw)
To: James Bottomley; +Cc: Hannes Reinecke, SCSI Mailing List
On Tue, 2006-10-24 at 16:37 -0700, James Bottomley wrote:
This was in response to my reporting a fault when a LUN (either an array
or a disk) goes off-line and then is powered back on. The error would
not occur in kernels up to 2.6.15.7 but would fail in 2.6.16 and higher.
Prior to this patch, my 29320 would lockup the box with errors like the
following:
------------[ cut here ]------------
kernel BUG at mm/slab.c:594!
invalid opcode: 0000 [#1]
SMP
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc iscsi_tcp
libiscsi scsi_transport_iscsi ipv6 video sbs i2c_ec i2c_core button
battery asus_acpi ac parport_pc lp parport snd_intel8x0 snd_ac97_codec
snd_ac97_bus sg snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm floppy snd_timer snd
soundcore snd_page_alloc serio_raw ide_cd skge cdrom pcspkr dm_snapshot
dm_zero dm_mirror dm_mod aic79xx scsi_transport_spi sd_mod scsi_mod ext3
jbd ehci_hcd ohci_hcd uhci_hcd
CPU: 0
EIP: 0060:[<c0169562>] Not tainted VLI
EFLAGS: 00010246 (2.6.19-rc2 #1)
EIP is at kmem_cache_free+0x29/0x6d
eax: 00000000 ebx: dffae300 ecx: dff91b80 edx: c1a00000
esi: dffaaf80 edi: 00000000 ebp: d3f324c0 esp: d3fb9dd0
ds: 007b es: 007b ss: 0068
Process tdg_2 (pid: 2362, ti=d3fb9000 task=dfd6cd50 task.ti=d3fb9000)
Stack: dffae300 dffaaf80 00000000 c0154448 00000000 d3e09a80 dffaaf80
d3e09a80
c018bafc 00001000 00000000 c018b822 e088efa0 00001000 00000000
0000000a
d3fb9ef0 d43f76c8 00003000 00000000 00000001 c130cac8 00008000
00000000
Call Trace:
[<c0154448>] mempool_free+0x66/0x6b
[<c018bafc>] bio_free+0x25/0x30
[<c018b822>] bio_put+0x28/0x29
[<e088efa0>] scsi_execute_async+0x15f/0x33d [scsi_mod]
[<e09c9913>] sg_common_write+0x704/0x772 [sg]
[<e09c9ba6>] sg_new_write+0x225/0x248 [sg]
[<e09cae45>] sg_write+0x106/0x33a [sg]
[<c016dae7>] vfs_write+0xa8/0x159
[<c016e114>] sys_write+0x41/0x67
[<c0103dc9>] sysenter_past_esp+0x56/0x79
DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x79
Leftover inexact backtrace:
[<c031007b>] sleep_on+0x1e/0x6c
=======================
Code: 5f c3 89 c1 8d 82 00 00 00 40 c1 e8 0c 57 89 d7 6b d0 28 03 15 00
d6 50 c0 56 53 8b 02 f6 c4 40 74 03 8b 52 0c 8b 02 84 c0 78 08 <0f> 0b
52 02 e6 6b 33 c0 39 4a 20 74 08 0f 0b ca 0d e6 6b 33 c0
EIP: [<c0169562>] kmem_cache_free+0x29/0x6d SS:ESP 0068:d3fb9dd0
You can review my comments on testing in the thread titled:
"Adaptec 29320 [aic79xx] fails on power cycle of LUN"
Hope this clarifies things.
Sean
* Re: [PATCH 1/4] aic79xx: Fixup external device reset
2006-10-24 23:37 ` James Bottomley
2006-10-25 2:49 ` Sean Bruno
@ 2006-10-25 7:01 ` Hannes Reinecke
2006-10-25 15:14 ` James Bottomley
1 sibling, 1 reply; 5+ messages in thread
From: Hannes Reinecke @ 2006-10-25 7:01 UTC (permalink / raw)
To: James Bottomley; +Cc: SCSI Mailing List, Tarte, Robert
James Bottomley wrote:
> On Mon, 2006-10-23 at 15:22 +0200, Hannes Reinecke wrote:
>> + /* Hand-craft TUR command */
>> + ahd_outb(ahd, SCB_CDB_STORE, 0);
>> + ahd_outb(ahd, SCB_CDB_STORE+1, 0);
>> + ahd_outb(ahd, SCB_CDB_STORE+2, 0);
>> + ahd_outb(ahd, SCB_CDB_STORE+3, 0);
>> + ahd_outb(ahd, SCB_CDB_STORE+4, 0);
>> + ahd_outb(ahd, SCB_CDB_STORE+5, 0);
>> + ahd_outb(ahd, SCB_CDB_LEN, 6);
>> + scb->hscb->control &= ~(TAG_ENB|SCB_TAG_TYPE);
>> + scb->hscb->control |= MK_MESSAGE;
>> + ahd_outb(ahd, SCB_CONTROL, scb->hscb->control);
>> + ahd_outb(ahd, MSG_OUT, HOST_MSG);
>> + ahd_outb(ahd, SAVED_SCSIID, scb->hscb->scsiid);
>
> What's the reason for having to have this hand crafted test unit ready?
>
I asked myself the same question. It's actually from the original
Adaptec sources, and I couldn't figure out why it was needed, either.
That's why I removed it initially when doing the first round of external
device reset patches.
The thing is, whenever we send an SCB with the MK_MESSAGE flag set, the
sequencer will interrupt normal SCB delivery and ensure that the
MK_MESSAGE SCB is sent immediately. Plus we can force the connection to
non-packetized transfer, as the reset target will start out with
normal transfers, too. And as we have already invalidated the
negotiation settings for all targets a renegotiation will happen when
this TUR is completed.
So the normal flow of operation can continue and only one command will
have to be requeued from the midlayer.
I think. Or that's what I've gleaned from the sequencer code. Maybe Rob
can give some more insight here.
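The rewrite described above can be sketched outside the driver. This is a minimal sketch under stated assumptions, not the driver's implementation: `fake_regs`, `fake_outb()`, and the `FAKE_*` offsets are hypothetical stand-ins for the chip's register window and ahd_outb(), and the bit values given for TAG_ENB, MK_MESSAGE, and SCB_TAG_TYPE are assumed for illustration. What it does show is standard: a TEST UNIT READY CDB is six bytes, all zero (opcode 0x00), and the control byte drops its tag bits and gains MK_MESSAGE.

```c
#include <stdint.h>

/* Hypothetical register offsets; the real ones are in the driver. */
enum {
    FAKE_CDB_STORE = 0,
    FAKE_CDB_LEN   = 16,
    FAKE_CONTROL   = 17,
    FAKE_REG_COUNT = 32
};

/* Assumed bit values, for illustration only. */
#define TAG_ENB      0x20
#define MK_MESSAGE   0x10
#define SCB_TAG_TYPE 0x03

#define TUR_CDB_LEN  6 /* TEST UNIT READY is a 6-byte CDB */

/* Stand-in for the chip's register window. */
static uint8_t fake_regs[FAKE_REG_COUNT];

/* Stand-in for ahd_outb(): one byte written to one register. */
static void fake_outb(unsigned reg, uint8_t val)
{
    fake_regs[reg] = val;
}

/*
 * Overwrite the stored CDB with TEST UNIT READY and turn the SCB
 * into an untagged command that asks for a message-out phase.
 * Returns the updated control byte.
 */
static uint8_t rewrite_to_tur(uint8_t control)
{
    unsigned i;

    /* Opcode 0x00 followed by five zero bytes. */
    for (i = 0; i < TUR_CDB_LEN; i++)
        fake_outb(FAKE_CDB_STORE + i, 0);
    fake_outb(FAKE_CDB_LEN, TUR_CDB_LEN);

    /* Drop the tag bits, request a message. */
    control &= (uint8_t)~(TAG_ENB | SCB_TAG_TYPE);
    control |= MK_MESSAGE;
    fake_outb(FAKE_CONTROL, control);
    return control;
}
```

Under these assumed bit values, calling rewrite_to_tur() with a control byte of 0x62 yields 0x50: the tag-enable and tag-type bits are cleared, the unrelated 0x40 bit is preserved, and MK_MESSAGE is set.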
Admittedly, this is really a nasty tweak. In theory we should have a
proper error handler which handles this sort of thing. But the entire
code is littered with such tweaks so that's quite a vain hope.
Unless someone passes me a theory of operation document for it.
Having a register description is quite pointless.
Cheers,
Hannes
--
Dr. Hannes Reinecke hare@suse.de
SuSE Linux Products GmbH S390 & zSeries
Maxfeldstraße 5 +49 911 74053 688
90409 Nürnberg http://www.suse.de
* Re: [PATCH 1/4] aic79xx: Fixup external device reset
2006-10-25 7:01 ` Hannes Reinecke
@ 2006-10-25 15:14 ` James Bottomley
0 siblings, 0 replies; 5+ messages in thread
From: James Bottomley @ 2006-10-25 15:14 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: SCSI Mailing List, Tarte, Robert
On Wed, 2006-10-25 at 09:01 +0200, Hannes Reinecke wrote:
> > What's the reason for having to have this hand crafted test unit ready?
> >
> I asked myself the same question. It's actually from the original
> Adaptec sources, and I couldn't figure out why it was needed, either.
> That's why I removed it initially when doing the first round of external
> device reset patches.
>
> The thing is, whenever we send an SCB with the MK_MESSAGE flag set, the
> sequencer will interrupt normal SCB delivery and ensure that the
> MK_MESSAGE SCB is sent immediately. Plus we can force the connection to
> non-packetized transfer, as the reset target will start out with
> normal transfers, too. And as we have already invalidated the
> negotiation settings for all targets a renegotiation will happen when
> this TUR is completed.
> So the normal flow of operation can continue and only one command will
> have to be requeued from the midlayer.
>
> I think. Or that's what I've gleaned from the sequencer code. Maybe Rob
> can give some more insight here.
>
> Admittedly, this is really a nasty tweak. In theory we should have a
> proper error handler which handles this sort of thing. But the entire
> code is littered with such tweaks so that's quite a vain hope.
> Unless someone passes me a theory of operation document for it.
> Having a register description is quite pointless.
Yes ... that's what I was wondering. However, we can go with the "it's
magic" for the time being.
James