public inbox for linux-scsi@vger.kernel.org
* Overlapped command error handling
@ 2006-02-02 13:00 Hannes Reinecke
  2006-02-02 15:12 ` James Bottomley
  0 siblings, 1 reply; 7+ messages in thread
From: Hannes Reinecke @ 2006-02-02 13:00 UTC
  To: SCSI Mailing List

Hi all,

and here is something for the SCSI Gods out there:

My IOMEGA Zip drive is going into error handling upon access.
Surprisingly enough, the device recovered so it's not really critical.
However, I wonder whether the error recovery is really due to a hardware
error on the drive itself or whether it's again this blasted aic79xx driver.

What happens is the following:

sd 4:0:6:0: send 0xffff810076858dc0  sd 4:0:6:0:
        command: Read (10): 28 00 00 00 00 00 00 00 08 00
-> Read some bytes, ok.

sd 4:0:6:0: send 0xffff810076858488  sd 4:0:6:0:
        command: Read (10): 28 00 00 00 00 08 00 00 38 00
-> send request to read some more bytes, ok. Note that the previous
   request has not returned (yet).

sd 4:0:6:0: done 0xffff810076858488 RETRY    8000002 sd 4:0:6:0:
        command: Read (10): 28 00 00 00 00 08 00 00 38 00
sdb: Current: sense key: Aborted Command
    Additional sense: Overlapped commands attempted
-> this _looks_ like an error on the drive as the commands do not really
   overlap. It looks more as if the device can't handle more than one
   command at a time. Do we have a blacklist flag for such things?
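
For reference, a minimal sketch that decodes the two Read (10) CDBs from
the trace and checks that the addressed block ranges do not overlap. The
struct and function names are made up for illustration; only the CDB
layout (opcode 0x28, LBA in bytes 2-5, transfer length in bytes 7-8,
both big-endian) is taken from the command definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Block range addressed by a READ(10) command: [lba, lba + blocks). */
struct read10_range {
	uint32_t lba;    /* CDB bytes 2..5, big-endian */
	uint16_t blocks; /* CDB bytes 7..8, big-endian */
};

/* Decode a 10-byte READ(10) CDB (opcode 0x28). */
struct read10_range read10_decode(const uint8_t cdb[10])
{
	struct read10_range r;

	assert(cdb[0] == 0x28);
	r.lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
		((uint32_t)cdb[4] << 8) | (uint32_t)cdb[5];
	r.blocks = (uint16_t)((cdb[7] << 8) | cdb[8]);
	return r;
}

/* True if the two block ranges share at least one block. */
bool read10_ranges_overlap(struct read10_range a, struct read10_range b)
{
	uint64_t a_end = (uint64_t)a.lba + a.blocks;
	uint64_t b_end = (uint64_t)b.lba + b.blocks;

	return a.lba < b_end && b.lba < a_end;
}

/* The two CDBs exactly as they appear in the trace. */
const uint8_t first_cdb[10]  = { 0x28, 0, 0, 0, 0, 0x00, 0, 0, 0x08, 0 };
const uint8_t second_cdb[10] = { 0x28, 0, 0, 0, 0, 0x08, 0, 0, 0x38, 0 };
```

The first command reads 8 blocks starting at LBA 0, the second reads
0x38 blocks starting at LBA 8, so the ranges are adjacent and
read10_ranges_overlap() returns false for them.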

sd 4:0:6:0: send 0xffff810076858488                  sd 4:0:6:0:
        command: Read (10): 28 00 00 00 00 08 00 00 38 00
-> Command requeued, okay.

sd 4:0:6:0: done 0xffff810076858488 SUCCESS        0 sd 4:0:6:0:
        command: Read (10): 28 00 00 00 00 08 00 00 38 00
-> command finished, okay.

sd 4:0:6:0: done 0xffff810076858dc0 TIMEOUT        0 sd 4:0:6:0:
        command: Read (10): 28 00 00 00 00 00 00 00 08 00
-> First command was still out there, but apparently never returned.
   So of course error recovery started for this command.

It looks to me as if the device has aborted _both_ commands when
returning the 'overlapped commands attempted' sense.
Is that conformant behaviour?

SCSI-2 states in section 7.5.2:
A target that detects an incorrect initiator connection shall abort all
I/O processes for the initiator on the logical unit or target routine
and shall return CHECK CONDITION status. The sense key shall be set to
ABORTED COMMAND and the additional sense code shall be set to OVERLAPPED
COMMANDS ATTEMPTED.

So the behaviour of the drive seems to be okay; however, the scsi stack
doesn't abort all commands (yet).
Might that be an error in the scsi stack?
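
As an illustration of the condition quoted above, here is a small sketch
that recognizes it in fixed-format sense data. The function name is
hypothetical and the trace does not show the raw sense bytes, so any
buffer passed in is an assumption; the values used are sense key 0x0B
(ABORTED COMMAND) and additional sense code 0x4E / qualifier 0x00
(OVERLAPPED COMMANDS ATTEMPTED).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SENSE_KEY_ABORTED_COMMAND 0x0b
#define ASC_OVERLAPPED_COMMANDS   0x4e
#define ASCQ_OVERLAPPED_COMMANDS  0x00

/*
 * Check fixed-format sense data (response code 0x70 or 0x71) for
 * ABORTED COMMAND / OVERLAPPED COMMANDS ATTEMPTED.
 * Layout used: sense key in the low nibble of byte 2,
 * ASC in byte 12, ASCQ in byte 13.
 */
bool sense_is_overlapped_commands(const uint8_t *sense, size_t len)
{
	uint8_t response_code;

	assert(sense != NULL);
	if (len < 14)
		return false; /* too short to carry ASC/ASCQ */

	response_code = sense[0] & 0x7f;
	if (response_code != 0x70 && response_code != 0x71)
		return false; /* not fixed-format sense data */

	return (sense[2] & 0x0f) == SENSE_KEY_ABORTED_COMMAND &&
	       sense[12] == ASC_OVERLAPPED_COMMANDS &&
	       sense[13] == ASCQ_OVERLAPPED_COMMANDS;
}
```

An initiator that sees this condition would, per the quoted text, have
to treat every outstanding command on that logical unit as aborted, not
only the one that returned CHECK CONDITION.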

Curious,

Hannes
-- 
Dr. Hannes Reinecke			hare@suse.de
SuSE Linux Products GmbH		S390 & zSeries
Maxfeldstraße 5				+49 911 74053 688
90409 Nürnberg				http://www.suse.de

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Overlapped command error handling
  2006-02-02 13:00 Overlapped command error handling Hannes Reinecke
@ 2006-02-02 15:12 ` James Bottomley
  2006-02-03  9:34   ` Hannes Reinecke
  0 siblings, 1 reply; 7+ messages in thread
From: James Bottomley @ 2006-02-02 15:12 UTC
  To: Hannes Reinecke; +Cc: SCSI Mailing List

On Thu, 2006-02-02 at 14:00 +0100, Hannes Reinecke wrote:
> It looks to me as if the device has aborted _both_ commands when
> returning the 'overlapped commands attempted' sense.
> Is that conformant behaviour?
> 
> SCSI-2 states in section 7.5.2:
> A target that detects an incorrect initiator connection shall abort all
> I/O processes for the initiator on the logical unit or target routine
> and shall return CHECK CONDITION status. The sense key shall be set to
> ABORTED COMMAND and the additional sense code shall be set to OVERLAPPED
> COMMANDS ATTEMPTED.

Actually, I think it's probably complaining about tag reuse in the sense
of SAM-3 section 5.9.3.  You return this sense if an initiator attempts
to use a duplicate tag.  Since the aic79xx manages its own tags,
it's probably a driver error.  You could verify this by having the
aic79xx print out the tag as it sends the command out on the wire.  (The
device does support TCQ, doesn't it?)
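
A rough sketch of the kind of check suggested here: track which tags are
outstanding and flag any attempt to send a tag that is already in
flight. This is not aic79xx code; the tracker type, the function names
and the 256-entry tag space are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TAG_COUNT 256 /* hypothetical tag space: one byte of tag */

struct tag_tracker {
	bool in_flight[TAG_COUNT];
};

void tag_tracker_init(struct tag_tracker *t)
{
	memset(t, 0, sizeof(*t));
}

/*
 * Record that a command with this tag is going out on the wire.
 * Returns false if the tag is already outstanding, which is the
 * situation a target would report as overlapped commands.
 */
bool tag_tracker_send(struct tag_tracker *t, uint8_t tag)
{
	if (t->in_flight[tag])
		return false;
	t->in_flight[tag] = true;
	return true;
}

/* Record that the command with this tag has completed. */
void tag_tracker_complete(struct tag_tracker *t, uint8_t tag)
{
	assert(t->in_flight[tag]);
	t->in_flight[tag] = false;
}
```

Calling tag_tracker_send() where the driver hands a command to the
controller, and logging the tag whenever it returns false, would show
whether a tag is ever reused before its command completes.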

James




* Re: Overlapped command error handling
  2006-02-02 15:12 ` James Bottomley
@ 2006-02-03  9:34   ` Hannes Reinecke
  2006-02-03 15:31     ` James Bottomley
  0 siblings, 1 reply; 7+ messages in thread
From: Hannes Reinecke @ 2006-02-03  9:34 UTC
  To: SCSI Mailing List

[-- Attachment #1: Type: text/plain, Size: 2089 bytes --]

James Bottomley wrote:
> On Thu, 2006-02-02 at 14:00 +0100, Hannes Reinecke wrote:
>> It looks to me as if the device has aborted _both_ commands when
>> returning the 'overlapped commands attempted' sense.
>> Is that conformant behaviour?
>>
>> SCSI-2 states in section 7.5.2:
>> A target that detects an incorrect initiator connection shall abort all
>> I/O processes for the initiator on the logical unit or target routine
>> and shall return CHECK CONDITION status. The sense key shall be set to
>> ABORTED COMMAND and the additional sense code shall be set to OVERLAPPED
>> COMMANDS ATTEMPTED.
> 
> Actually, I think it's probably complaining about tag reuse in the sense
> of SAM-3 section 5.9.3.  You return this sense if an initiator attempts
> to use a duplicate tag.  Since the aic79xx manages its own tags,
> it's probably a driver error.  You could verify this by having the
> aic79xx print out the tag as it sends the command out on the wire.  (The
> device does support TCQ, doesn't it?)
> 
Argl. Been bitten by the classical 'I know better than thou' attitude.

drivers/scsi/aic7xxx/aic79xx_osm.c:ahd_platform_set_tags()
	default:
		/*
		 * We allow the OS to queue 2 untagged transactions to
		 * us at any time even though we can only execute them
		 * serially on the controller/device.  This should
		 * remove some latency.
		 */

but as we also removed the aic7xxx internal timeout handling this means
that the queue_depth is always '2', hence the midlayer might attempt to
send two commands simultaneously to the device.
Which is what it did in my case. And, surprisingly enough, setting the
queue depth for the non-TCQ case to '1' fixed the issue.
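
To make the effect of the queue depth concrete, here is a deliberately
simplified model of depth-limited dispatch. It is not the midlayer code;
the struct and function names are invented, and the only assumption is
that a command is sent whenever fewer than queue_depth commands are
outstanding.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a device as seen by a depth-limited dispatcher. */
struct model_dev {
	unsigned int queue_depth; /* limit configured by the driver */
	unsigned int busy;        /* commands currently outstanding */
};

/* Send one more command if the queue depth allows it. */
bool model_dispatch(struct model_dev *d)
{
	if (d->busy >= d->queue_depth)
		return false; /* command stays queued in the upper layer */
	d->busy++;
	return true;
}

/* One outstanding command has completed. */
void model_complete(struct model_dev *d)
{
	assert(d->busy > 0);
	d->busy--;
}
```

With queue_depth set to 2 the model sends a second command while the
first is still outstanding; with queue_depth set to 1 the second
command waits until model_complete() has been called.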

Can someone explain to me how having a queue depth of '2' in the non-TCQ
case can be correct?
If not, please apply the following patch. Oh, and aic7xxx suffers from
the same problem, naturally.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke			hare@suse.de
SuSE Linux Products GmbH		S390 & zSeries
Maxfeldstraße 5				+49 911 74053 688
90409 Nürnberg				http://www.suse.de

[-- Attachment #2: aic79xx-disable-tcq-fix --]
[-- Type: text/plain, Size: 1138 bytes --]

diff --git a/drivers/scsi/aic7xxx/aic79xx_osm.c b/drivers/scsi/aic7xxx/aic79xx_osm.c
index 78c8338..b9e3776 100644
--- a/drivers/scsi/aic7xxx/aic79xx_osm.c
+++ b/drivers/scsi/aic7xxx/aic79xx_osm.c
@@ -1346,25 +1346,15 @@ ahd_platform_set_tags(struct ahd_softc *
 
 	switch ((dev->flags & (AHD_DEV_Q_BASIC|AHD_DEV_Q_TAGGED))) {
 	case AHD_DEV_Q_BASIC:
-		scsi_adjust_queue_depth(sdev,
-					MSG_SIMPLE_TASK,
-					dev->openings + dev->active);
+		scsi_set_tag_type(sdev, MSG_SIMPLE_TASK);
+		scsi_activate_tcq(sdev, dev->openings + dev->active);
 		break;
 	case AHD_DEV_Q_TAGGED:
-		scsi_adjust_queue_depth(sdev,
-					MSG_ORDERED_TASK,
-					dev->openings + dev->active);
+		scsi_set_tag_type(sdev, MSG_ORDERED_TASK);
+		scsi_activate_tcq(sdev, dev->openings + dev->active);
 		break;
 	default:
-		/*
-		 * We allow the OS to queue 2 untagged transactions to
-		 * us at any time even though we can only execute them
-		 * serially on the controller/device.  This should
-		 * remove some latency.
-		 */
-		scsi_adjust_queue_depth(sdev,
-					/*NON-TAGGED*/0,
-					/*queue depth*/2);
+		scsi_deactivate_tcq(sdev, 1);
 		break;
 	}
 }


* Re: Overlapped command error handling
@ 2006-02-03 11:00 Emmanuel Fusté
  0 siblings, 0 replies; 7+ messages in thread
From: Emmanuel Fusté @ 2006-02-03 11:00 UTC
  To: linux-scsi; +Cc: Hannes Reinecke, James Bottomley

Hello,

Now compiling, results this evening.
The patch for aic7xxx :

--- linux-source-2.6.15-orig/drivers/scsi/aic7xxx/aic7xxx_osm.c	2006-02-03 10:56:04.000000000 +0100
+++ linux-source-2.6.15/drivers/scsi/aic7xxx/aic7xxx_osm.c	2006-02-03 11:00:41.000000000 +0100
@@ -1327,13 +1327,7 @@ ahc_platform_set_tags(struct ahc_softc *
 		scsi_activate_tcq(sdev, dev->openings + dev->active);
 		break;
 	default:
-		/*
-		 * We allow the OS to queue 2 untagged transactions to
-		 * us at any time even though we can only execute them
-		 * serially on the controller/device.  This should
-		 * remove some latency.
-		 */
-		scsi_deactivate_tcq(sdev, 2);
+		scsi_deactivate_tcq(sdev, 1);
 		break;
 	}
 }

---

Access La Poste e-mail: www.laposte.net ;
3615 LAPOSTENET (0.34 €/min) ; tel: 08 92 68 13 50 (0.34 €/min)




* Re: Overlapped command error handling
@ 2006-02-03 11:13 Emmanuel Fusté
  0 siblings, 0 replies; 7+ messages in thread
From: Emmanuel Fusté @ 2006-02-03 11:13 UTC
  To: linux-scsi; +Cc: hare, James.Bottomley, emmanuel.fuste

[-- Attachment #1: Type: text/plain, Size: 266 bytes --]

> Hello,
>
> Now compiling, results this evening.
> The patch for aic7xxx :
>
New try as an attachment...

Emmanuel.



[-- Attachment #2: tcq7xxx.patch --]
[-- Type: text/x-patch; name="tcq7xxx.patch", Size: 649 bytes --]

--- linux-source-2.6.15-orig/drivers/scsi/aic7xxx/aic7xxx_osm.c	2006-02-03 10:56:04.000000000 +0100
+++ linux-source-2.6.15/drivers/scsi/aic7xxx/aic7xxx_osm.c	2006-02-03 11:00:41.000000000 +0100
@@ -1327,13 +1327,7 @@ ahc_platform_set_tags(struct ahc_softc *
 		scsi_activate_tcq(sdev, dev->openings + dev->active);
 		break;
 	default:
-		/*
-		 * We allow the OS to queue 2 untagged transactions to
-		 * us at any time even though we can only execute them
-		 * serially on the controller/device.  This should
-		 * remove some latency.
-		 */
-		scsi_deactivate_tcq(sdev, 2);
+		scsi_deactivate_tcq(sdev, 1);
 		break;
 	}
 }



* Re: Overlapped command error handling
  2006-02-03  9:34   ` Hannes Reinecke
@ 2006-02-03 15:31     ` James Bottomley
  0 siblings, 0 replies; 7+ messages in thread
From: James Bottomley @ 2006-02-03 15:31 UTC
  To: Hannes Reinecke; +Cc: SCSI Mailing List

On Fri, 2006-02-03 at 10:34 +0100, Hannes Reinecke wrote:

> Can someone explain to me how having a queue depth of '2' in the non-TCQ
> case can be correct?
> If not, please apply the following patch. Oh, and aic7xxx suffers from
> the same problem, naturally.

Yes, the driver is supposed to be counting both the tagged and untagged
commands outstanding and return SCSI_MLQUEUE_DEVICE_BUSY if the
mid-layer tries to queue over that.

The premise of setting this to 2 is that we always have one command in
progress and one completely prepared ready to fire as soon as the
in-progress one returns.  Jens has postulated that this is no longer
necessary; we could simply reduce the queue depth to 1 for the non-TCQ
case.  What this would do is transfer the setup work from the sending
process to the block softirq, so it might increase latency in the
non-TCQ case ... but no-one knows for sure.
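
The scheme described above can be sketched as a toy model: the driver
accepts up to 'limit' commands from the mid-layer, but in the untagged
case puts only one of them on the wire and holds the other fully
prepared. This is an illustration under those assumptions, not the
aic79xx implementation; all names are invented, and
MODEL_QUEUE_DEVICE_BUSY merely stands in for SCSI_MLQUEUE_DEVICE_BUSY.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_QUEUE_OK          0
#define MODEL_QUEUE_DEVICE_BUSY 1 /* stand-in for SCSI_MLQUEUE_DEVICE_BUSY */

struct hba_dev {
	bool tagged;           /* device uses tagged queuing */
	unsigned int limit;    /* commands the driver will accept */
	unsigned int accepted; /* accepted from the mid-layer, not yet done */
	unsigned int on_wire;  /* actually issued to the device */
};

/* An untagged device can only execute one command at a time. */
static unsigned int hba_wire_limit(const struct hba_dev *d)
{
	return d->tagged ? d->limit : 1;
}

/* Mid-layer hands a command to the driver. */
int hba_queuecommand(struct hba_dev *d)
{
	if (d->accepted >= d->limit)
		return MODEL_QUEUE_DEVICE_BUSY; /* push back on the mid-layer */
	d->accepted++;
	if (d->on_wire < hba_wire_limit(d))
		d->on_wire++; /* issue immediately */
	/* otherwise the command is held, prepared, inside the driver */
	return MODEL_QUEUE_OK;
}

/* The device finished one command. */
void hba_complete(struct hba_dev *d)
{
	assert(d->on_wire > 0 && d->accepted >= d->on_wire);
	d->on_wire--;
	d->accepted--;
	/* Fire a held command, if any, now that the device is free. */
	if (d->accepted > d->on_wire && d->on_wire < hba_wire_limit(d))
		d->on_wire++;
}
```

In this model an untagged device with limit 2 never sees more than one
command at a time, which is the behaviour the depth of 2 presupposes. If
the driver passes every accepted command straight to the device instead,
the depth of 2 produces exactly the overlap reported in this thread.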

James




* Re: Overlapped command error handling
@ 2006-02-04 17:01 Emmanuel Fusté
  0 siblings, 0 replies; 7+ messages in thread
From: Emmanuel Fusté @ 2006-02-04 17:01 UTC
  To: emmanuel.fuste; +Cc: linux-scsi, hare, James.Bottomley

> Hello,
> 
> Now compiling, results this evening.
> The patch for aic7xxx :
...
> ---

Works, but cdrwtool -d /dev/sr0 -q completely freezes the
machine; no recovery is possible this time, only the power button.
Will retry later...

Cheers,
Emmanuel.



