* Re: 'Device not ready' issue on mpt2sas since 3.1.10
From: James Bottomley @ 2012-07-25 14:19 UTC
To: Tejun Heo
Cc: Matthias Prager, Robert Trace, linux-scsi, Jens Axboe, Eric Moore,
Alan, Darrick J. Wong, linux-ide
On Sun, 2012-07-22 at 10:31 -0700, Tejun Heo wrote:
> Hello,
>
> On Sat, Jul 21, 2012 at 02:15:56PM +0200, Matthias Prager wrote:
> > Now I'm not sure this isn't taping over another bug. Which leads me to
> > my question: What is the correct behavior?
> >
> > #1 Issuing a separate spin-up command (START UNIT?) prior to sending i/o
> > by setting allow_restart=1 for sata disks on sas controllers
> >
> > or
> >
> > #2 Teaching the sas drivers they do not need spin-up commands and can
> > simply start issuing i/o to sata disks
>
> I haven't consulted SAT but it seems like a bug in SAS driver or
> firmware. If it's a driver bug, we better fix it there. If a
> firmware bug, working around those is one of major roles of drivers,
> so I think setting allow_restart is fine.
Actually, I don't think so. SAT-2 section 8.12.2 does say
if the device is in the stopped state as the result of
processing a START STOP UNIT command (see 9.11), then the SATL
shall terminate the TEST UNIT READY command with CHECK CONDITION
status with the sense key set to NOT READY and the additional
sense code of LOGICAL UNIT NOT READY, INITIALIZING COMMAND
REQUIRED;
START STOP UNIT (with START=0) translates to STANDBY IMMEDIATE, and
that's what hdparm -y issues. We don't see this in /drivers/ata because
TEST UNIT READY always returns success.
So it looks like the mpt2sas SAT is doing the correct thing and we only
don't see this problem in normal SATA devices because of a bug in the
libata-scsi SAT.
However, the kernel log
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj] Device not ready
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj] Sense Key : Not Ready [current]
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj] Add. Sense: Logical unit not ready, initializing command required
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj] CDB: Write(10): 2a 00 57 54 52 3f 00 00 08 00
indicates we got NOT READY in response to a non-TUR command, so I suspect what's
happening is that sending the TUR causes the SAT to remember the standby
state and respond NOT READY to all subsequent commands. However, if we
just send an ordinary command, not a TUR, it quietly wakes the drive and
we don't see any problems.
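For reference, the CDB bytes in that log line can be decoded by hand. The following is a minimal sketch, assuming the standard WRITE(10) layout (opcode 0x2a, big-endian LBA in bytes 2..5, big-endian transfer length in bytes 7..8). The struct and function names are made up for illustration and are not kernel APIs.

```c
#include <stdint.h>

/* Decoded fields of a 10-byte WRITE(10) CDB (hypothetical helper type). */
struct write10_fields {
	uint32_t lba;     /* logical block address, CDB bytes 2..5 */
	uint16_t nblocks; /* transfer length in blocks, CDB bytes 7..8 */
};

/*
 * Decode a WRITE(10) CDB. Returns 0 on success, or -1 if the opcode
 * in byte 0 is not WRITE(10) (0x2a).
 */
static int decode_write10(const uint8_t cdb[10], struct write10_fields *out)
{
	if (cdb[0] != 0x2a)
		return -1;

	out->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
		   ((uint32_t)cdb[4] << 8) | (uint32_t)cdb[5];
	out->nblocks = (uint16_t)(((uint16_t)cdb[7] << 8) | cdb[8]);
	return 0;
}
```

Fed the bytes 2a 00 57 54 52 3f 00 00 08 00 from the log, this gives LBA 0x5754523f and a length of 8 blocks, i.e. an ordinary media write rather than a TUR.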
There is support in SAT for this behaviour because there's a note on the
START STOP UNIT command saying
After returning GOOD status for a START STOP UNIT command with
the START bit set to zero, the SATL shall consider the ATA
device to be in the Stopped power state (see SBC-2)
which in SCSI terms would mean returning NOT READY to any subsequent
commands.
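To make the suspected behaviour concrete, here is a toy user-space model of it. This is only a sketch of the hypothesis above, not of the actual mpt2sas firmware: the type and function names are invented, and the assumption that a TUR is what makes the SATL latch the stopped state is exactly the part that still needs verifying.

```c
#include <stdbool.h>
#include <stdint.h>

enum { OP_TEST_UNIT_READY = 0x00, OP_START_STOP_UNIT = 0x1b };

enum satl_status { SATL_GOOD, SATL_NOT_READY };

/* Hypothetical SATL state: what the drive is doing vs. what the SATL remembers. */
struct satl_model {
	bool drive_in_standby; /* drive received STANDBY IMMEDIATE */
	bool latched_stopped;  /* SATL considers the unit stopped */
};

/* Models an ATA pass-through STANDBY IMMEDIATE, as issued by hdparm -y. */
static void satl_model_ata_standby(struct satl_model *s)
{
	s->drive_in_standby = true;
}

/* Models the SATL's answer to one SCSI command under the hypothesis. */
static enum satl_status satl_model_submit(struct satl_model *s, const uint8_t *cdb)
{
	if (cdb[0] == OP_START_STOP_UNIT) {
		bool start = cdb[4] & 0x01; /* START is bit 0 of byte 4 */

		s->drive_in_standby = !start;
		s->latched_stopped = !start;
		return SATL_GOOD;
	}

	/* Assumption: a TUR that finds the drive in standby latches "stopped". */
	if (cdb[0] == OP_TEST_UNIT_READY && s->drive_in_standby)
		s->latched_stopped = true;

	if (s->latched_stopped)
		return SATL_NOT_READY; /* initializing command required */

	/* An ordinary command reaches the drive and quietly wakes it. */
	s->drive_in_standby = false;
	return SATL_GOOD;
}
```

In this model a write sent straight after the standby succeeds, while a TUR followed by the same write fails until a START STOP UNIT with START=1 arrives, which matches the log if the hypothesis holds.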
Can someone verify this is indeed what the mpt2sas HBA is doing?
James
* Re: 'Device not ready' issue on mpt2sas since 3.1.10
From: Tejun Heo @ 2012-07-25 17:17 UTC
To: James Bottomley
Cc: Matthias Prager, Robert Trace, linux-scsi, Jens Axboe, Eric Moore,
Alan, Darrick J. Wong, linux-ide
Hello, James.
On Wed, Jul 25, 2012 at 06:19:13PM +0400, James Bottomley wrote:
> > I haven't consulted SAT but it seems like a bug in SAS driver or
> > firmware. If it's a driver bug, we better fix it there. If a
> > firmware bug, working around those is one of major roles of drivers,
> > so I think setting allow_restart is fine.
>
> Actually, I don't think so. SAT-2 section 8.12.2 does say
>
> if the device is in the stopped state as the result of
> processing a START STOP UNIT command (see 9.11), then the SATL
> shall terminate the TEST UNIT READY command with CHECK CONDITION
> status with the sense key set to NOT READY and the additional
> sense code of LOGICAL UNIT NOT READY, INITIALIZING COMMAND
> REQUIRED;
>
> START STOP UNIT (with START=0) translates to STANDBY IMMEDIATE, and
> that's what hdparm -y issues. We don't see this in /drivers/ata because
> TEST UNIT READY always returns success.
Urgh... ATA device in standby mode is ready for any command and
definitely doesn't need an "initializing command". Oh, well...
> So it looks like the mpt2sas SAT is doing the correct thing and we only
> don't see this problem in normal SATA devices because of a bug in the
> libata-scsi SAT.
libata is inconsistent with the standard but I think the standard is
wrong here. :(
Thanks.
--
tejun
* Re: 'Device not ready' issue on mpt2sas since 3.1.10
From: James Bottomley @ 2012-07-25 19:55 UTC
To: Tejun Heo
Cc: Matthias Prager, Robert Trace, linux-scsi, Jens Axboe, Eric Moore,
Alan, Darrick J. Wong, linux-ide
On Wed, 2012-07-25 at 10:17 -0700, Tejun Heo wrote:
> Hello, James.
>
> On Wed, Jul 25, 2012 at 06:19:13PM +0400, James Bottomley wrote:
> > > I haven't consulted SAT but it seems like a bug in SAS driver or
> > > firmware. If it's a driver bug, we better fix it there. If a
> > > firmware bug, working around those is one of major roles of drivers,
> > > so I think setting allow_restart is fine.
> >
> > Actually, I don't think so. SAT-2 section 8.12.2 does say
> >
> > if the device is in the stopped state as the result of
> > processing a START STOP UNIT command (see 9.11), then the SATL
> > shall terminate the TEST UNIT READY command with CHECK CONDITION
> > status with the sense key set to NOT READY and the additional
> > sense code of LOGICAL UNIT NOT READY, INITIALIZING COMMAND
> > REQUIRED;
> >
> > START STOP UNIT (with START=0) translates to STANDBY IMMEDIATE, and
> > that's what hdparm -y issues. We don't see this in /drivers/ata because
> > TEST UNIT READY always returns success.
>
> Urgh... ATA device in standby mode is ready for any command and
> definitely doesn't need an "initializing command". Oh, well...
Well, it does in sleep mode ... which seems to most closely map to what
SCSI thinks of as a stopped unit. I checked the specs just in case there
was an error ... they all say STANDBY not SLEEP.
> > So it looks like the mpt2sas SAT is doing the correct thing and we only
> > don't see this problem in normal SATA devices because of a bug in the
> > libata-scsi SAT.
>
> libata is inconsistent with the standard but I think the standard is
> wrong here. :(
Well, reading it, so do I. Unfortunately, we get to deal with the world
as it is rather than as we would wish it to be. We likely have this
problem with a lot of USB SATLs as well ...
It looks like a hack like this might be needed.
James
---
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 4a6381c..7e59a7f 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -42,6 +42,8 @@
 #include <trace/events/scsi.h>
+static void scsi_eh_done(struct scsi_cmnd *scmd);
+
 #define SENSE_TIMEOUT (10*HZ)
 /*
@@ -241,6 +243,14 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
 	if (! scsi_command_normalize_sense(scmd, &sshdr))
 		return FAILED;	/* no valid sense data */
+	if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done)
+		/*
+		 * nasty: for mid-layer issued TURs, we need to return the
+		 * actual sense data without any recovery attempt.  For eh
+		 * issued ones, we need to try to recover and interpret
+		 */
+		return SUCCESS;
+
 	if (scsi_sense_is_deferred(&sshdr))
 		return NEEDS_RETRY;
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 56a9379..91d3366 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -764,6 +764,16 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
 	sdev->model = (char *) (sdev->inquiry + 16);
 	sdev->rev = (char *) (sdev->inquiry + 32);
+	if (strncmp(sdev->vendor, "ATA     ", 8) == 0) {
+		/*
+		 * sata emulation layer device.  This is a hack to work around
+		 * the SATL power management specifications which state that
+		 * when the SATL detects the device has gone into standby
+		 * mode, it shall respond with NOT READY.
+		 */
+		sdev->allow_restart = 1;
+	}
+
 	if (*bflags & BLIST_ISROM) {
 		sdev->type = TYPE_ROM;
 		sdev->removable = 1;
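The two checks the scsi_scan.c hunk relies on can be tried out in user space. The sketch below is an illustration only: the helper names are invented, and the description of allow_restart (the error handler trying a START UNIT when it sees sense 04h/02h) is my reading of the mid-layer rather than something stated in this thread. What is certain is that the INQUIRY vendor field is eight bytes, space padded, so the literal being compared has to carry the padding ("ATA" followed by five spaces).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SENSE_KEY_NOT_READY 0x02

/*
 * vendor points at the 8-byte, space-padded vendor field of the
 * INQUIRY data (bytes 8..15). The field is not NUL terminated, so
 * compare exactly eight bytes against the padded literal.
 */
static bool vendor_is_ata(const char *vendor)
{
	return strncmp(vendor, "ATA     ", 8) == 0;
}

/*
 * True for NOT READY with ASC/ASCQ 04h/02h, i.e. LOGICAL UNIT NOT
 * READY, INITIALIZING COMMAND REQUIRED, as seen in the kernel log.
 */
static bool needs_start_unit(uint8_t sense_key, uint8_t asc, uint8_t ascq)
{
	return sense_key == SENSE_KEY_NOT_READY && asc == 0x04 && ascq == 0x02;
}

/* Hypothetical decision helper: only restart devices that opted in. */
static bool should_try_restart(bool allow_restart, uint8_t sense_key,
			       uint8_t asc, uint8_t ascq)
{
	return allow_restart && needs_start_unit(sense_key, asc, ascq);
}
```

Note that a literal with a single trailing space would never match a real vendor field, because strncmp() stops at the literal's terminating NUL at offset 4 and reports a difference there.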
* Re: 'Device not ready' issue on mpt2sas since 3.1.10
From: Matthias Prager @ 2012-07-25 23:56 UTC
To: James Bottomley
Cc: Tejun Heo, Robert Trace, linux-scsi, Jens Axboe, Eric Moore, Alan,
Darrick J. Wong, linux-ide, Matthias Prager
Hello James,
On 25.07.2012 21:55, James Bottomley wrote:
> It looks like a hack like this might be needed.
>
> James
>
<SNIP>
I don't yet understand all the code but I'm following your discussion
with Tejun: I've set up a minimal vm running gentoo with a mpt2sas
driven controller in passthrough mode. I've applied your proposed patch
against the vanilla 3.5.0 kernel (which includes Tejun's commit), and
I'm happy to report the problem does seem to get fixed by it.
Well, at least putting the SATA drive into standby using 'hdparm -y' now
works (according to 'hdparm -C') without these nasty i/o errors on later
i/o. That is to say, the drive wakes up again (e.g. from a 'fdisk -l
/dev/sda' command) and returns data.
--
Matthias
* Re: 'Device not ready' issue on mpt2sas since 3.1.10
From: Robert Trace @ 2012-07-26 19:16 UTC
To: Matthias Prager
Cc: James Bottomley, Tejun Heo, linux-scsi, Jens Axboe, Eric Moore,
Alan, Darrick J. Wong, linux-ide
On 07/25/2012 07:56 PM, Matthias Prager wrote:
>
> I don't yet understand all the code but I'm following your discussion
> with Tejun: I've set up a minimal vm running gentoo with a mpt2sas
> driven controller in passthrough mode. I've applied your proposed patch
> against the vanilla 3.5.0 kernel (which includes Tejun's commit), and
> I'm happy to report the problem does seem to get fixed by it.
I can confirm this on my hardware as well with both 3.4.4 and 3.5.0.
Without James' patch the kernels will immediately drop the I/O and with
the patch both kernels will wake the SATA disks and then complete the
I/O successfully.
-- Robert
* Re: 'Device not ready' issue on mpt2sas since 3.1.10
From: Robert Trace @ 2012-08-16 18:26 UTC
To: James Bottomley
Cc: Tejun Heo, Matthias Prager, linux-scsi, Jens Axboe, Eric Moore,
Alan, Darrick J. Wong, linux-ide
On 07/25/2012 03:55 PM, James Bottomley wrote:
>
> Well, reading it, so do I. Unfortunately, we get to deal with the world
> as it is rather than as we would wish it to be. We likely have this
> problem with a lot of USB SATLs as well ...
Has this patch made it into the main git trees yet?
I haven't seen anything about it in nearly a month, but I've been using
James' patch since he posted it and the sleep/wakeup behavior seems
improved/correct.
-- Robert
* Re: 'Device not ready' issue on mpt2sas since 3.1.10
From: Matthias Prager @ 2012-08-16 20:24 UTC
To: Robert Trace
Cc: James Bottomley, Tejun Heo, linux-scsi, Jens Axboe, Eric Moore,
Alan, Darrick J. Wong, linux-ide, Matthias Prager
On 16.08.2012 20:26, Robert Trace wrote:
> On 07/25/2012 03:55 PM, James Bottomley wrote:
>>
>> Well, reading it, so do I. Unfortunately, we get to deal with the world
>> as it is rather than as we would wish it to be. We likely have this
>> problem with a lot of USB SATLs as well ...
>
> Has this patch made it into the main git trees yet?
Not yet, but it is in James' scsi misc tree and, last I heard, was
scheduled for inclusion in the 3.6 kernel.
Anyway, here is his commit:
<http://git.kernel.org/?p=linux/kernel/git/jejb/scsi.git;a=commit;h=98dc81b0d6c483a3eb256764ae10f156ccefdbbb>
>
> I haven't seen anything about it in nearly a month, but I've been using
> James' patch since he posted it and the sleep/wakeup behavior seems
> improved/correct.
I have been running smoothly with the patch too - problem solved I'd say :-)
>
> -- Robert
>
* Re: 'Device not ready' issue on mpt2sas since 3.1.10
From: Robert Trace @ 2012-08-16 20:33 UTC
To: Matthias Prager
Cc: James Bottomley, Tejun Heo, linux-scsi, Jens Axboe, Eric Moore,
Alan, Darrick J. Wong, linux-ide
On 08/16/2012 04:24 PM, Matthias Prager wrote:
>
> Not yet, but it is in James' scsi misc tree and last I heard was
> scheduled for inclusion in the 3.6 kernel.
Close enough. :-) I didn't track the changes on the SCSI tree and I
just wanted to make sure that it didn't slip through the cracks.
Thanks to all involved for all of the help and a speedy fix!
-- Robert