Why does SCSI mid layer mark the LUN offline in this situation?


* Why does SCSI mid layer mark the LUN offline in this situation?
@ 2009-10-01  5:46 G S
  2009-10-01  6:31 ` Joe Eykholt
  0 siblings, 1 reply; 5+ messages in thread
From: G S @ 2009-10-01  5:46 UTC (permalink / raw)
  To: linux-scsi

Howdy,

I have a Linux (2.6) host using Emulex and QLogic FC HBAs connected to a
disk array product, with a single LUN presented, say LUN 1.

The dsf (device special file) is created for LUN 1 and I can send SCSI
commands to LUN 1.  I'm using "sg".

I then delete LUN 1 from the disk array and reboot the array.  The array
boots up with only LUN 0.

Next, I recreate LUN 1 on the target storage array.

But any attempt to send a SCSI command to LUN 1 fails because LUN 1 has
been marked offline by the SCSI mid layer.

Why?  Is it because the RSCN seen by the HBA driver is passed up to the
SCSI mid layer to trigger a re-scan?  And the re-scan no longer finds
LUN 1, so the LUN 1 kernel structures are torn down and LUN 1 is marked
offline by the SCSI mid layer?

Doing the following to add LUN 1 back makes it accessible again:

# echo "scsi add-single-device <H> <B> <T> <L>" > /proc/scsi/scsi

The above "echo" seems to cause a blind re-scan by sending a SCSI INQUIRY
to LUN 1 on the h/b/t/l hardware path.  That SCSI INQUIRY succeeds, and
that success seems to cause LUN 1 to be marked online again.

Thanks much,

G


* Re: Why does SCSI mid layer mark the LUN offline in this situation?
  2009-10-01  5:46 Why does SCSI mid layer mark the LUN offline in this situation? G S
@ 2009-10-01  6:31 ` Joe Eykholt
  2009-10-01 14:18   ` James Smart
       [not found]   ` <4AC4B642.5050000@emulex.com>
  0 siblings, 2 replies; 5+ messages in thread
From: Joe Eykholt @ 2009-10-01  6:31 UTC (permalink / raw)
  To: G S; +Cc: linux-scsi

G S wrote:
> Howdy,
> 
> I have a Linux (2.6) host using Emulex and QLogic FC HBAs connected to a
> disk array product, with a single LUN presented, say LUN 1.
> 
> The dsf (device special file) is created for LUN 1 and I can send SCSI
> commands to LUN 1.  I'm using "sg".
> 
> I then delete LUN 1 from the disk array and reboot the array.  The array
> boots up with only LUN 0.
> 
> Next, I recreate LUN 1 on the target storage array.
> 
> But any attempt to send a SCSI command to LUN 1 fails because LUN 1 has
> been marked offline by the SCSI mid layer.
> 
> Why?  Is it because the RSCN seen by the HBA driver is passed up to the
> SCSI mid layer to trigger a re-scan?  And the re-scan no longer finds
> LUN 1, so the LUN 1 kernel structures are torn down and LUN 1 is marked
> offline by the SCSI mid layer?

If I understand your sequence correctly, rebooting the disk array
would cause an RSCN to the HBA, and that would cause it to delete LUN 0
and LUN 1.  When the disk array comes up and logs into the fabric again,
another RSCN goes to the HBA and it sees the target (array) and presents
it to the transport layer and SCSI.  It scans LUN 0 (does REPORT LUNS)
and it reports no other LUNs.  No LUN 1 at this point.

Then you add LUN 1 on the array.  There's no event caused by that
as far as I know.  I'm not a complete expert on this and it
depends on your array, I think.  It may cause a check condition
on the next I/O that goes to LUN 0, but that may never happen.
So nothing happens on the server.  It doesn't cause an RSCN because
the array didn't re-login to the fabric (that would be disruptive
for other initiators).
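
To tell a LUN that the midlayer has marked offline apart from one it has
removed altogether, the per-device state attribute in sysfs can be read.
What follows is only a minimal sketch: it assumes the usual
/sys/class/scsi_device/<H>:<C>:<T>:<L>/device/state layout, and the name
lun_state and its optional sysfs-root argument are hypothetical, added
for illustration.

```shell
# lun_state HOST CHANNEL TARGET LUN [SYSFS_ROOT]
# Prints the midlayer's state for one LUN (for example "running" or
# "offline"), or "absent" when no such device is registered.
# lun_state is a hypothetical helper name; SYSFS_ROOT defaults to /sys.
lun_state() {
    root="${5:-/sys}"
    state_file="$root/class/scsi_device/$1:$2:$3:$4/device/state"
    if [ -r "$state_file" ]; then
        cat "$state_file"
    else
        echo "absent"
    fi
}
```

For example, "lun_state 2 0 0 1" would query LUN 1 on host 2, channel 0,
target 0 (numbers chosen arbitrarily here).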

> Doing the following to add LUN 1 back makes it accessible again:
> 
> # echo "scsi add-single-device <H> <B> <T> <L>" > /proc/scsi/scsi
> 
> The above "echo" seems to cause a blind re-scan by sending a SCSI INQUIRY
> to LUN 1 on the h/b/t/l hardware path.  That SCSI INQUIRY succeeds, and
> that success seems to cause LUN 1 to be marked online again.

OK.  I think you can also echo "- - -" to /sys/class/scsi_host/hostX/scan
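
For the sysfs scan attribute mentioned above, the value written is parsed
as three fields (channel, target ID, LUN), with "-" acting as a wildcard,
so a host-wide rescan is usually written as "- - -".  A rough sketch
follows; the function name rescan_host and the optional sysfs-root
argument are hypothetical, and writing to the real file needs root.

```shell
# rescan_host HOST [SYSFS_ROOT]
# Asks the midlayer to scan every channel, target and LUN on one host.
# rescan_host is a hypothetical helper name; SYSFS_ROOT defaults to /sys.
rescan_host() {
    root="${2:-/sys}"
    scan_file="$root/class/scsi_host/host$1/scan"
    if [ ! -w "$scan_file" ]; then
        echo "cannot write $scan_file" >&2
        return 1
    fi
    # Three fields: channel, target ID, LUN; "-" matches any value.
    echo "- - -" > "$scan_file"
}
```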

I hope that helps and someone will correct me if any of this is wrong.

	Joe


* Re: Why does SCSI mid layer mark the LUN offline in this situation?
  2009-10-01  6:31 ` Joe Eykholt
@ 2009-10-01 14:18   ` James Smart
       [not found]   ` <4AC4B642.5050000@emulex.com>
  1 sibling, 0 replies; 5+ messages in thread
From: James Smart @ 2009-10-01 14:18 UTC (permalink / raw)
  To: Joe Eykholt; +Cc: G S, linux-scsi@vger.kernel.org

Joe's description is correct.  Target-level change is detected by the
transport, and the transport kicks off the scans.  LUN-level change is
not detected by the transport, thus it's up to the midlayer or admin to
rescan.  Currently, the midlayer doesn't understand the "luns changed"
sense codes and does not rescan.  Thus you must use the steps indicated
to scan (but please avoid anything in /proc: although it continues to
exist, it is being deprecated).
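
As a sysfs alternative to the /proc command, the host's scan attribute
also accepts an explicit channel, target ID and LUN.  The sketch below
assumes that the <B> <T> <L> values of add-single-device map onto those
three fields in that order; scan_single_lun and its optional sysfs-root
argument are hypothetical names used only for illustration.

```shell
# scan_single_lun HOST CHANNEL TARGET LUN [SYSFS_ROOT]
# Asks the midlayer to probe exactly one LUN through sysfs.
# scan_single_lun is a hypothetical helper name; SYSFS_ROOT defaults to /sys.
scan_single_lun() {
    root="${5:-/sys}"
    scan_file="$root/class/scsi_host/host$1/scan"
    if [ ! -w "$scan_file" ]; then
        echo "cannot write $scan_file" >&2
        return 1
    fi
    # Fields are channel, target ID and LUN, separated by spaces.
    echo "$2 $3 $4" > "$scan_file"
}
```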

-- james s




* Re: Why does SCSI mid layer mark the LUN offline in this situation?
       [not found]   ` <4AC4B642.5050000@emulex.com>
@ 2009-10-01 18:08     ` G S
  2009-10-01 18:53       ` James Smart
  0 siblings, 1 reply; 5+ messages in thread
From: G S @ 2009-10-01 18:08 UTC (permalink / raw)
  To: James Smart; +Cc: Joe Eykholt, linux-scsi@vger.kernel.org

Joe, James, thanks for the replies.

A bit more follow-up.

I understand that target-level changes are detected by the transport,
that the transport kicks off the scans, and that LUN-level changes are
not detected by the transport.

So, my follow-up comments and questions are:

a) When the disk array comes back up from the restart with LUN 1 deleted,
that reboot will cause an RSCN at the transport layer.

b) The HBA driver kicks off the scans, and these are scans of FC ports
and not scans for LUN-level changes, right?

c) Is the behavior I noted in (b) at the HBA driver different between
the "standard" versus "inbox" (aka upstream) HBA drivers?

d) Does the HBA driver notify the SCSI mid layer to kick off a LUN-level
scan, to look for LUN-level changes?

e) If the HBA driver does not notify the SCSI mid layer of the
transport-level change (from the RSCN), will the SCSI mid layer continue
to think that LUN 1 exists, and will the kernel structures for LUN 1
remain intact in the SCSI mid layer?

f) If the SCSI mid layer still has LUN 1 marked online, should the
application (using the "sg" dsf) be able to access LUN 1 once LUN 1 is
recreated on the disk array, without having to trigger a manual scan
through /proc?

Thanks much,

G



* Re: Why does SCSI mid layer mark the LUN offline in this situation?
  2009-10-01 18:08     ` G S
@ 2009-10-01 18:53       ` James Smart
  0 siblings, 0 replies; 5+ messages in thread
From: James Smart @ 2009-10-01 18:53 UTC (permalink / raw)
  To: G S; +Cc: Joe Eykholt, linux-scsi@vger.kernel.org



G S wrote:
> Joe, James, thanks for the replies.
>
> A bit more follow-up.
>
> I understand that target-level changes are detected by the transport,
> that the transport kicks off the scans, and that LUN-level changes are
> not detected by the transport.
>
> So, my follow-up comments and questions are:
>
> a) When the disk array comes back up from the restart with LUN 1 deleted,
> that reboot will cause an RSCN at the transport layer.
>
> b) The HBA driver kicks off the scans, and these are scans of FC ports
> and not scans for LUN-level changes, right?
>
The HBA, after detecting and logging in to the remote port, adds the
remote port to the transport.  The transport then scans the port (aka
FCP target), and the scan looks for all LUNs subject to the responses
from LUN 0 (e.g. what SCSI level, REPORT LUNS support, etc.).
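
After such a scan, the LUNs that the midlayer has registered for a target
show up in sysfs as entries named host:channel:target:lun.  A small
sketch for listing them, assuming the /sys/class/scsi_device layout; the
name known_luns and its optional sysfs-root argument are hypothetical.

```shell
# known_luns HOST CHANNEL TARGET [SYSFS_ROOT]
# Prints one LUN number per line for every device registered under the
# given target; prints nothing when the target has no devices.
# known_luns is a hypothetical helper name; SYSFS_ROOT defaults to /sys.
known_luns() {
    root="${4:-/sys}"
    for dev in "$root/class/scsi_device/$1:$2:$3:"*; do
        [ -e "$dev" ] || continue   # the glob matched nothing
        echo "${dev##*:}"           # keep only the text after the last colon
    done
}
```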

> c) Is the behavior I noted in (b) at the HBA driver different between
> the "standard" versus "inbox" (aka upstream) HBA drivers?
>
In general, no, as most upstream drivers are also the inbox drivers.
But with older kernels/distros, you may not have the same feature
level, so it may differ.

> d) Does the HBA driver notify the SCSI mid layer to kick off a LUN-level
> scan, to look for LUN-level changes?
>
In general, no, although it could.

> e) If the HBA driver does not notify the SCSI mid layer of the
> transport-level change (from the RSCN), will the SCSI mid layer continue
> to think that LUN 1 exists, and will the kernel structures for LUN 1
> remain intact in the SCSI mid layer?
>
Until the midlayer sees something from the HBA/transport, or from errors
reported on I/Os - yes.

> f) If the SCSI mid layer still has LUN 1 marked online, should the
> application (using the "sg" dsf) be able to access LUN 1 once LUN 1 is
> recreated on the disk array, without having to trigger a manual scan
> through /proc?
>
As long as there are no RSCNs, etc. - just a change in LUN state - yes.

-- james



