From: Hannes Reinecke <hare@suse.de>
To: James Bottomley <jbottomley@parallels.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"hch@infradead.org" <hch@infradead.org>,
"elliot@hp.com" <elliot@hp.com>
Subject: Re: [PATCH 1/2] scsi_scan: Send TEST UNIT READY to the LUN before scanning
Date: Wed, 11 Jun 2014 17:13:38 +0200
Message-ID: <53987222.6060600@suse.de>
In-Reply-To: <1402498000.2523.11.camel@dabdike.int.hansenpartnership.com>
On 06/11/2014 04:46 PM, James Bottomley wrote:
> On Wed, 2014-06-11 at 16:33 +0200, Hannes Reinecke wrote:
>> On 06/11/2014 04:24 PM, James Bottomley wrote:
>>> On Thu, 2014-06-05 at 09:26 +0200, Hannes Reinecke wrote:
>>>> REPORT LUNS does not report any outstanding unit attention
>>>> condition as per SAM. However, the target might not be fully
>>>> initialized at that time, so we might end up getting a
>>>> default entry (or even a partially filled one).
>>>> But as we're not able to process the REPORTED LUNS DATA HAS CHANGED
>>>> unit attention correctly, we'll miss some LUNs during
>>>> startup.
>>>> So it's better to send a TEST UNIT READY for modern implementations
>>>> and wait until the unit attention condition goes away.
>>>
>>> Are you sure this is a good idea: we just spent ages tuning SCSI init so
>>> we don't slow systems down. This patch, in the event the array is
>>> having a power on problem, takes us right back to waiting for init
>>> again ... basically the busy wait in scsi_test_lun.
>>>
>>> Since the array should send us a UA anyway when it's got itself sorted
>>> out, what's wrong with just processing the report luns data has changed
>>> condition?
>>>
>> Because we can't.
>>
>> _If_ we were attempting this we'd run into several issues:
>> a) Boot will fail, as REPORT LUNS will return 0 LUNs (or just LUN 0).
>> So the scanning code will assume everything's fine. Booting will
>> continue, only to figure out that no LUNs are present.
>> As there is _no_ indication that REPORT LUNS should indeed have
>> returned an error (only it can't, due to SAM), we wouldn't even
>> know that there _is_ an issue.
>> (In fact, that's what triggered the patchset in the first place.)
>> b) Even _if_ we're able to somehow recover from that, we will have
>> to rescan the host and any attached devices.
>> The only way to do this currently is to _remove_ all devices
>> from that host and then do a full rescan.
>> Trying this with any devices which are already part of some
>> complex setup will become ... interesting.
>
> OK, go back to first principles and tell us what the actual problem is,
> with traces and details. Is this some weird SCSI-3 device with a single
> LUN that's screwing up report luns ... in which case we can just
> blacklist it. Or is it boot from an array?
>
The problem is as follows:
> Right after the "inquiry" the scsi subsystem sends a "report luns"
> to the RAID array.
> The RAID answers the "report luns" with only the 8-byte header
> and an empty (i.e. non-existent) LUN list after this header,
> because the LUNs are still in their initialization phase and
> have not yet reached their ready state.
> The RAID manufacturer describes this behaviour as an indication
> of "there are no LUNs available".
>
> Then immediately follows a "test unit ready" command from the
> scsi subsystem to LUN 0 which is answered by the RAID firmware
> with a "check condition" "not ready, initialisation in progress".
>
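To illustrate why the trace above is so unhelpful: the header-only reply is a perfectly well-formed REPORT LUNS response. A minimal C sketch of parsing the parameter data (an 8-byte header whose first four bytes hold the big-endian LUN LIST LENGTH, followed by 8-byte LUN entries) shows that it simply decodes to zero LUNs, with nothing marking it as "not ready yet". The helper names are hypothetical and this is not the scsi_scan.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* REPORT LUNS parameter data (SPC):
 *   bytes 0-3  LUN LIST LENGTH, big-endian, in bytes (header not included)
 *   bytes 4-7  reserved
 *   bytes 8..  one 8-byte entry per reported LUN
 */
#define RL_HEADER_LEN 8u
#define RL_ENTRY_LEN  8u

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: number of LUN entries in a REPORT LUNS reply, or -1
 * if not even the 8-byte header was transferred. A header-only reply with
 * LUN LIST LENGTH 0 is well-formed and simply yields 0. */
static long report_luns_count(const uint8_t *buf, size_t xfer_len)
{
    assert(buf != NULL);
    if (xfer_len < RL_HEADER_LEN)
        return -1;

    size_t list_len = get_be32(buf);
    size_t avail = xfer_len - RL_HEADER_LEN;

    /* The device may report more LUNs than fit in the buffer; this sketch
     * only counts what arrived (a real scanner would reissue the command
     * with a larger buffer). */
    if (list_len > avail)
        list_len = avail;

    return (long)(list_len / RL_ENTRY_LEN);
}
```

With the array from the trace, `report_luns_count()` returns 0 for the 8-byte reply, exactly as it would for a target that really has no LUNs configured.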
As per SPC, REPORT LUNS cannot return a CHECK CONDITION to signal
that the logical unit inventory is not ready yet.
So by evaluating the REPORT LUNS response alone we cannot tell
whether it is the real inventory or just a default one.
Hence my approach of sending a TEST UNIT READY prior to REPORT LUNS,
as this will report any outstanding unit attention condition and
we can wait until the initialisation is finished.
Plus we're sending a TEST UNIT READY anyway when we're scanning
the LUN from sd.c:sd_spinup_disk(), so in effect we're just
moving the call.
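The TEST UNIT READY-first approach could be sketched roughly as follows. This is a hypothetical, self-contained C illustration, not the actual patch: `classify_tur()`, `wait_for_lun_ready()` and the simulated `fake_send_tur()` are invented names. Treating only NOT READY with ASC/ASCQ 04h/01h (LOGICAL UNIT IS IN PROCESS OF BECOMING READY) as "still initializing" is an assumption, since the trace doesn't give the exact sense codes. The retry limit reflects James's objection to waiting indefinitely in the init thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STATUS_GOOD            0x00   /* SAM status codes */
#define STATUS_CHECK_CONDITION 0x02
#define SENSE_NOT_READY        0x02   /* SPC sense keys */
#define SENSE_UNIT_ATTENTION   0x06

enum tur_verdict { TUR_READY, TUR_RETRY, TUR_WAIT, TUR_GIVE_UP };

/* Classify a TEST UNIT READY completion (assumed policy, see above). */
static enum tur_verdict classify_tur(uint8_t status, const uint8_t *sense,
                                     size_t len)
{
    uint8_t key, asc, ascq;

    if (status == STATUS_GOOD)
        return TUR_READY;
    if (status != STATUS_CHECK_CONDITION || len < 4)
        return TUR_GIVE_UP;

    switch (sense[0] & 0x7f) {
    case 0x70: case 0x71:            /* fixed format sense data */
        if (len < 14)
            return TUR_GIVE_UP;
        key = sense[2] & 0x0f; asc = sense[12]; ascq = sense[13];
        break;
    case 0x72: case 0x73:            /* descriptor format sense data */
        key = sense[1] & 0x0f; asc = sense[2]; ascq = sense[3];
        break;
    default:
        return TUR_GIVE_UP;
    }

    if (key == SENSE_UNIT_ATTENTION)
        return TUR_RETRY;            /* UA is normally cleared once reported */
    if (key == SENSE_NOT_READY && asc == 0x04 && ascq == 0x01)
        return TUR_WAIT;
    return TUR_GIVE_UP;
}

/* Hypothetical transport hook: issues TUR, fills sense, returns SAM status. */
typedef uint8_t (*send_tur_fn)(void *ctx, uint8_t *sense, size_t len);

/* Poll until the LU is ready, giving up after max_tries so the scan cannot
 * block forever. A real implementation would sleep between TUR_WAIT
 * attempts rather than spin. */
static bool wait_for_lun_ready(send_tur_fn send_tur, void *ctx,
                               unsigned max_tries)
{
    uint8_t sense[18];

    assert(send_tur != NULL);
    for (unsigned i = 0; i < max_tries; i++) {
        memset(sense, 0, sizeof sense);
        uint8_t status = send_tur(ctx, sense, sizeof sense);
        switch (classify_tur(status, sense, sizeof sense)) {
        case TUR_READY:
            return true;             /* REPORT LUNS should now be trustworthy */
        case TUR_RETRY:
        case TUR_WAIT:
            continue;
        case TUR_GIVE_UP:
            return false;
        }
    }
    return false;
}

/* Simulated LU, for illustration only: one power-on UA (29h/00h), then
 * 'not_ready' NOT READY (04h/01h) replies, then GOOD. Needs len >= 14. */
struct fake_lu { bool ua_pending; unsigned not_ready; };

static uint8_t fake_send_tur(void *ctx, uint8_t *sense, size_t len)
{
    struct fake_lu *lu = ctx;
    bool ua = lu->ua_pending;

    if (ua)
        lu->ua_pending = false;
    else if (lu->not_ready > 0)
        lu->not_ready--;
    else
        return STATUS_GOOD;

    memset(sense, 0, len);
    sense[0] = 0x70;                 /* fixed format, current error */
    sense[2] = ua ? SENSE_UNIT_ATTENTION : SENSE_NOT_READY;
    sense[7] = 10;                   /* additional sense length */
    sense[12] = ua ? 0x29 : 0x04;
    sense[13] = ua ? 0x00 : 0x01;
    return STATUS_CHECK_CONDITION;
}
```

In the simulated run with `struct fake_lu lu = { true, 3 }`, one UA and three NOT READY replies are consumed before GOOD, so `wait_for_lun_ready(fake_send_tur, &lu, 10)` returns true after five TURs; with a lower `max_tries` it gives up instead of stalling the scan.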
>> So the easy way out here is indeed just to send a TEST UNIT READY.
>> And as we're checking for a reasonable level of SCSI compliance, we
>> should be catching most of the oddballs.
>
> I don't object hugely to TUR ... except it binds us to spin up because
> most devices will respond not ready. I do object to busy waiting in the
> init thread until we get the right answer.
>
The problem is indeed in SPC, which states:
The REPORT LUNS parameter data should be returned even though the
device server is not ready for other commands. The report of the
logical unit inventory should be available without incurring any
media access delays. If the device server is not ready with the
logical unit inventory or if the inventory list is null for the
requesting I_T nexus and the SELECT REPORT field set to 02h, then
the device server shall provide a default logical unit inventory
that contains at least LUN 0 or the REPORT LUNS well known logical
unit (see 8.2). A non-empty peripheral device logical unit inventory
that does not contain either LUN 0 or the REPORT LUNS
well known logical unit is valid.
So the above array is perfectly within spec.
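Following the SPC text above, the most a scanner could do is flag a reply that *might* be the default inventory: one that is empty or lists only LUN 0 and/or the REPORT LUNS well known logical unit (W-LUN 01h, encoded as C1h 01h followed by six zero bytes). The sketch below is a hypothetical heuristic with invented names, not something the patch does. A genuinely single-LUN device produces exactly the same data, which is why the reply alone cannot settle the question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const uint8_t LUN0[8] = {0};
/* REPORT LUNS well known logical unit: W-LUN 01h, encoded C1 01 00 .. 00 */
static const uint8_t WLUN_REPORT_LUNS[8] = {0xc1, 0x01, 0, 0, 0, 0, 0, 0};

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical heuristic: true if the REPORT LUNS data could be the SPC
 * default inventory, i.e. it is empty or contains only LUN 0 and/or the
 * REPORT LUNS W-LUN. */
static bool may_be_default_inventory(const uint8_t *buf, size_t xfer_len)
{
    assert(buf != NULL);
    if (xfer_len < 8)
        return true;                 /* no usable header: treat as suspect */

    size_t list_len = get_be32(buf);
    if (list_len > xfer_len - 8)
        list_len = xfer_len - 8;     /* only look at what was transferred */

    for (size_t off = 8; off + 8 <= 8 + list_len; off += 8) {
        const uint8_t *entry = buf + off;
        if (memcmp(entry, LUN0, 8) != 0 &&
            memcmp(entry, WLUN_REPORT_LUNS, 8) != 0)
            return false;            /* a non-default LUN was reported */
    }
    return true;
}
```

Both the header-only reply from the trace and a reply listing just LUN 0 return true here, so such a check could only trigger further probing (for example a TEST UNIT READY), never prove that the inventory is incomplete.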
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)