linux-scsi.vger.kernel.org archive mirror
From: Hannes Reinecke <hare@suse.de>
To: Steffen Maier <maier@linux.vnet.ibm.com>
Cc: Vaughan Cao <vaughan.cao@oracle.com>,
	JBottomley@parallels.com, linux-scsi@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: PROBLEM: special sense code asc,ascq=04h,0Ch abort scsi scan in the middle
Date: Mon, 14 Oct 2013 15:32:06 +0200
Message-ID: <525BF256.6060707@suse.de>
In-Reply-To: <525BEF2B.2030907@suse.de>

On 10/14/2013 03:18 PM, Hannes Reinecke wrote:
> On 10/14/2013 02:51 PM, Steffen Maier wrote:
>> Hi Hannes,
>>
>> On 10/14/2013 01:13 PM, Hannes Reinecke wrote:
>>> On 10/13/2013 07:23 PM, Vaughan Cao wrote:
>>>> Hi James,
>>>>
>>>> [1.] One line summary of the problem:
>>>> special sense code asc,ascq=04h,0Ch abort scsi scan in the middle
>>>>
>>>> [2.] Full description of the problem/report:
>>>> For instance, the storage presents 8 iSCSI LUNs, but LUN No. 7
>>>> is misconfigured or otherwise faulty.
>>>> Then the following message is logged:
>>>> kernel: scsi 5:0:0:0: Unexpected response from lun 7 while scanning, scan aborted
>>>> This makes LUN No. 8 unavailable.
>>>> It has been confirmed that Windows and Solaris systems continue the
>>>> scan and make LUNs No. 1, 2, 3, 4, 5, 6 and 8 available.
>>>>
>>>> Log snippet is as below:
>>>> Aug 24 00:32:49 vmhodtest019 kernel: scsi 5:0:0:7: scsi scan: INQUIRY pass 1 length 36
>>>> Aug 24 00:32:49 vmhodtest019 kernel: scsi 5:0:0:7: Send: 0xffff8801e9bd4280
>>>> Aug 24 00:32:49 vmhodtest019 kernel: scsi 5:0:0:7: CDB: Inquiry: 12 00 00 00 24 00
>>>> Aug 24 00:32:49 vmhodtest019 kernel: buffer = 0xffff8801f71fc180, bufflen = 36, queuecommand 0xffffffffa00b99e7
>>>> Aug 24 00:32:49 vmhodtest019 kernel: leaving scsi_dispatch_cmnd()
>>>> Aug 24 00:32:49 vmhodtest019 kernel: scsi 5:0:0:7: Done: 0xffff8801e9bd4280 SUCCESS
>>>> Aug 24 00:32:49 vmhodtest019 kernel: scsi 5:0:0:7: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
>>>> Aug 24 00:32:49 vmhodtest019 kernel: scsi 5:0:0:7: CDB: Inquiry: 12 00 00 00 24 00
>>>> Aug 24 00:32:49 vmhodtest019 kernel: scsi 5:0:0:7: Sense Key : Not Ready [current]
>>>> Aug 24 00:32:49 vmhodtest019 kernel: scsi 5:0:0:7: Add. Sense: Logical unit not accessible, target port in unavailable state
>>>> Aug 24 00:32:49 vmhodtest019 kernel: scsi 5:0:0:7: scsi host busy 1 failed 0
>>>> Aug 24 00:32:49 vmhodtest019 kernel: 0 sectors total, 36 bytes done.
>>>> Aug 24 00:32:49 vmhodtest019 kernel: scsi scan: INQUIRY failed with code 0x8000002
>>>> Aug 24 00:32:49 vmhodtest019 kernel: scsi 5:0:0:0: Unexpected response from lun 7 while scanning, scan aborted
>>>>
>>>> Looking at scsi_report_lun_scan(), I found that:
>>>> Linux uses an INQUIRY command to probe each LUN returned by the
>>>> REPORT LUNS command.
>>>> It assumes every probe command will get a valid result; otherwise it
>>>> regards the whole target as nonexistent or dead.
>>>> If the INQUIRY response passes its validity checks and indicates
>>>> 'LUN not present', the scan does not break but continues with the
>>>> next LUN.
>>>> In the log, the INQUIRY to LUN 7 returns sense asc,ascq=04h,0Ch
>>>> (Logical unit not accessible, target port in unavailable state).
>>>> This sense code is not handled, so scsi_probe_lun() returns -EIO and
>>>> the scan is aborted.
>>>>
>>>> I have two questions:
>>>> 1. Is it correct for the hardware to return sense 04h,0Ch to INQUIRY
>>>> even after presenting this LUN in its response to the REPORT LUNS
>>>> command?
>>> Yes, this is correct. 'REPORT LUNS' is supported in 'Unavailable' state.
>>>
>>>> 2. Since Windows and Solaris continue the scan, is it reasonable for
>>>> Linux to do the same, if only for fault tolerance?
>>>>
>>> Hmm. Yes, and no.
>>>
>>> _Actually_ this is an issue with the target, as it appears to
>>> return the above sense code in response to an 'INQUIRY' sent to the
>>> device.
>>> SPC explicitly states that the INQUIRY command should _not_ fail
>>> for unavailable devices.
>>> But yeah, we probably should work around this issue.
>>> Nevertheless, please raise this issue with your array vendor.
>>>
>>> Please try the attached patch.
>>>
>>> Cheers,
>>>
>>> Hannes
>>>
>>
>>> From b0e90778f012010c881f8bdc03bce63a36921b77 Mon Sep 17 00:00:00 2001
>>> From: Hannes Reinecke <hare@suse.de>
>>> Date: Mon, 14 Oct 2013 13:11:22 +0200
>>> Subject: [PATCH] scsi_scan: continue report_lun_scan after error
>>>
>>> When scsi_probe_and_add_lun() fails in scsi_report_lun_scan() this
>>> does _not_ indicate that the entire target is done for.
>>> So continue scanning for the remaining devices.
>>>
>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>>
>>> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
>>> index 307a811..973a121 100644
>>> --- a/drivers/scsi/scsi_scan.c
>>> +++ b/drivers/scsi/scsi_scan.c
>>> @@ -1484,13 +1484,12 @@ static int scsi_report_lun_scan(struct scsi_target *starget, int bflags,
>>>  				lun, NULL, NULL, rescan, NULL);
>>>  			if (res == SCSI_SCAN_NO_RESPONSE) {
>>>  				/*
>>> -				 * Got some results, but now none, abort.
>>> +				 * Got some results, but now none, ignore.
>>>  				 */
>>>  				sdev_printk(KERN_ERR, sdev,
>>>  					"Unexpected response"
>>> -				        " from lun %d while scanning, scan"
>>> -				        " aborted\n", lun);
>>> -				break;
>>> +					" from lun %d while scanning,"
>>> +					" ignoring device\n", lun);
>>>  			}
>>>  		}
>>>  	}
>>
>> In LLDDs that do their own initiator-based LUN masking (because the
>> midlayer does not have this functionality) to enable hardware
>> virtualization without NPIV, or to work around suboptimal LUN masking
>> on the target, they are likely to return -ENXIO from slave_alloc().
>> This makes scsi_alloc_sdev() return NULL, which
>> scsi_probe_and_add_lun() converts to SCSI_SCAN_NO_RESPONSE, thus
>> going through the same code path above.
>>
> Ah. Hmm. Yes, they would.
> 
> However, I personally would question this approach, as SPC states that
> 
>> The REPORT LUNS command (see table 284) requests the device
>> server to return the peripheral device logical unit inventory
>> accessible to the I_T nexus.
> 
> So by plain reading this would mean that you either should modify
> 'REPORT LUNS' to not show the masked LUNs, or set the pqual field to
> '0x10' or '0x11' for those LUNs.
> 
>> E.g. zfcp does return -ENXIO if the particular LUN was not made known
>> to the unit whitelist (via the zfcp sysfs attribute unit_add).
>> If we attach LUN 0 (via unit_add) and trigger a target scan with
>> SCAN_WILD_CARD for the SCSI LUN (e.g. on remote port recovery), we see
>> exactly the above error message for the first LUN in the REPORT LUNS
>> response that is not explicitly attached to zfcp.
>> IIRC, other LLDDs such as bfa also do similar stuff [http://marc.info/?l=linux-scsi&m=134489842105383&w=2].
>>
>> For those cases, I think it makes sense to abort scsi_report_lun_scan().
>> Otherwise the LLDD would be forced to return -ENXIO for every single
>> LUN reported by REPORT LUNS but not explicitly added to the LLDD LUN
>> whitelist, and each of those failures would be logged, likely
>> *flooding the kernel messages*.
>>
>> Maybe Vaughan's case needs to be distinguished in a patch.
>>
> Well, as mentioned initially, the real issue is that the target
> fails an INQUIRY while in the 'Unavailable' state, which, according
> to SPC-3 (or later), is a violation of the spec.
> 
> So we _could_ just tell them to go away, but admittedly that's bad
> style. Which means we'll have to implement a workaround; the above
> was just a simple way of implementing it. If that's not working of
> course we'll have to do something else.
> 
What about this patch:

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 973a121..01a7d69 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -594,6 +594,19 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
                                     (sshdr.asc == 0x29)) &&
                                    (sshdr.ascq == 0))
                                        continue;
+                               /*
+                                * Some buggy implementations return
+                                * 'target port in unavailable state'
+                                * even on INQUIRY.
+                                * Set peripheral qualifier 3
+                                * for these devices.
+                                */
+                               if ((sshdr.sense_key == NOT_READY) &&
+                                   ((sshdr.asc == 0x04) &&
+                                    (sshdr.ascq == 0x0C))) {
+                                   inq_result[0] = 3 << 5;
+                                   return 0;
+                               }
                        }
                } else {
                        /*

(watch out, line breaks mangled and all that).
This should work for this particular case without interrupting the
normal workflow, shouldn't it?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

Thread overview: 16+ messages
2013-10-13 17:23 PROBLEM: special sense code asc,ascq=04h,0Ch abort scsi scan in the middle Vaughan Cao
2013-10-14 11:13 ` Hannes Reinecke
2013-10-14 12:51   ` Steffen Maier
2013-10-14 13:18     ` Hannes Reinecke
2013-10-14 13:32       ` Hannes Reinecke [this message]
2013-10-14 15:24         ` Steffen Maier
2013-10-16  6:52           ` Hannes Reinecke
2013-10-16  7:26             ` vaughan
2013-10-21  6:07             ` vaughan
2013-10-22 17:05               ` Hannes Reinecke
2013-12-18 13:51               ` Vaughan Cao
2014-02-19  8:29             ` vaughan
2013-10-14 15:18       ` Vaughan Cao
2013-10-15  3:32   ` vaughan
2013-10-15  5:51     ` Hannes Reinecke
2013-10-15 11:46       ` Vaughan Cao
