[multipath] SCSI device capacity mess

* [multipath] SCSI device capacity mess
@ 2004-10-26 23:27 christophe varoqui
  2004-10-26 21:34 ` James Bottomley
  2004-10-27  8:17 ` [dm-devel] " Lars Marowsky-Bree
  0 siblings, 2 replies; 23+ messages in thread
From: christophe varoqui @ 2004-10-26 23:27 UTC (permalink / raw)
  To: device-mapper development; +Cc: linux-scsi@vger.kernel.org

Let me present an interesting problem :

Some SAN hardware presents, for the same LUN, a bunch of valid paths and a
bunch of ghost paths. In my case, the ghosts respond to standard INQUIRY
(EVPD 0x83, 0x80, ...) but the READ_CAPACITY fails:

Attached scsi generic sg29 at scsi2, channel 0, id 3, lun 0,  type 12
  Vendor: DEC       Model: HSG80             Rev: V86S
  Type:   Direct-Access                      ANSI SCSI revision: 02
sdw : READ CAPACITY failed.
sdw : status=0, message=00, host=0, driver=08 
Current sd: sense key Not Ready
Additional sense: Logical unit not ready, initializing cmd. required
sdw: asking for cache data failed
sdw: assuming drive cache: write through
 /dev/scsi/host2/bus0/target3/lun1:<6>Device sdw not ready.
end_request: I/O error, dev sdw, sector 0
Buffer I/O error on device sdw, logical block 0
Device sdw not ready.
end_request: I/O error, dev sdw, sector 0
Buffer I/O error on device sdw, logical block 0
 unable to read partition table
Attached scsi disk sdw at scsi2, channel 0, id 3, lun 1

Somehow the sysfs attribute gets set to a random value :

# cat /sys/block/sdw/size
2097152

Say /dev/sdq is a valid path to the same LUN. Its size is correctly reported
as 213338334 512-byte blocks.

Now when I want to map a multipath target over those two paths,
device-mapper dutifully rejects the table, saying /dev/sdw is too small for a
213338334-block devmap.
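
To make the rejection concrete, here is a minimal sketch of the kind of bounds check involved. It is an illustration only, not device-mapper's actual code: the function name `path_fits_target` and its parameters are made up for this example, and all sizes are assumed to be in 512-byte sectors.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a device-mapper function: a target covering
 * sectors [start, start + len) fits on a path device only if that range
 * lies within the size the device reports. */
bool path_fits_target(uint64_t dev_sectors, uint64_t start, uint64_t len)
{
        if (start > dev_sectors)
                return false;
        /* Compare against the remaining room so start + len cannot overflow. */
        return len <= dev_sectors - start;
}
```

With the numbers from this report, `path_fits_target(2097152, 0, 213338334)` is false for the ghost path sdw, while `path_fits_target(213338334, 0, 213338334)` is true for the valid path sdq.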

So what is the correct way to treat this situation ?

1) make the /sys/block/*/size attribute writable
2) resurrect a BLKSETSIZE ioctl
3) make device-mapper less strict, and hope we can fix the size by a
device rescan when it gets activated
4) sell the culprit hardware

Other ideas ? Can someone provide help and kernel coding skills ?

regards,
-- 
christophe varoqui <christophe.varoqui@free.fr>

* Re: [multipath] SCSI device capacity mess
  2004-10-26 23:27 [multipath] SCSI device capacity mess christophe varoqui
@ 2004-10-26 21:34 ` James Bottomley
  2004-10-26 21:46   ` christophe varoqui
  2004-10-27  8:17 ` [dm-devel] " Lars Marowsky-Bree
  1 sibling, 1 reply; 23+ messages in thread
From: James Bottomley @ 2004-10-26 21:34 UTC (permalink / raw)
  To: christophe varoqui; +Cc: device-mapper development, linux-scsi@vger.kernel.org

On Tue, 2004-10-26 at 19:27, christophe varoqui wrote:
> 1) make the /sys/block/*/size attribute writable
> 2) resurrect a BLKSETSIZE ioctl
> 3) make device-mapper less strict, and hope we can fix the size by a
> device rescan when it get activated
> 4) sell the culprit hardware

Only 3 & 4 look viable.  There's little point introducing an extra API
or ioctl specifically to defeat a check, since the check will now be
useless anyway.

The "random" value is 1GB by the way.  Since the dawn of time SCSI has
reported this for devices that failed read capacity.
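
The arithmetic behind the 1GB figure can be checked with a small sketch. It assumes only that /sys/block/*/size counts 512-byte sectors; the helper name `sectors_to_bytes` is made up for this example.

```c
#include <stdint.h>

/* Convert a /sys/block/<dev>/size value (512-byte sectors) to bytes. */
uint64_t sectors_to_bytes(uint64_t sectors)
{
        return sectors * 512ULL;
}
```

For the ghost path, `sectors_to_bytes(2097152)` is 1073741824 bytes, which is exactly 2^30, i.e. 1 GiB. For the valid path, `sectors_to_bytes(213338334)` is 109229227008 bytes, roughly 101.7 GiB.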

James




* Re: [multipath] SCSI device capacity mess
  2004-10-26 21:34 ` James Bottomley
@ 2004-10-26 21:46   ` christophe varoqui
  0 siblings, 0 replies; 23+ messages in thread
From: christophe varoqui @ 2004-10-26 21:46 UTC (permalink / raw)
  To: James Bottomley; +Cc: device-mapper development, linux-scsi@vger.kernel.org

On Tuesday 26 October 2004 at 17:34 -0400, James Bottomley wrote:
> On Tue, 2004-10-26 at 19:27, christophe varoqui wrote:
> > 1) make the /sys/block/*/size attribute writable
> > 2) resurrect a BLKSETSIZE ioctl
> > 3) make device-mapper less strict, and hope we can fix the size by a
> > device rescan when it get activated
> > 4) sell the culprit hardware
> 
> Only 3 & 4 look viable.  There's little point introducing an extra API
> or ioctl specifically to defeat a check, since the check will now be
> useless anyway.
> 
:) I feared you would vote for 4)
That is not that viable for me, as you can imagine.

The culprit hardware, I should have reported, is an HSG80 controller pair
configured in multibus mode.

I thought you could have considered 1), as it is just another way for root
to shoot itself in the foot, and the multipath daemon could take the valid
size detected on other paths and apply it to ghost paths. That didn't seem
so awkward to me.

I fear Alasdair's reaction to 3) will be cold ... I still hope we can come
up with a solution, even one I haven't thought about.

> The "random" value is 1GB by the way.  Since the dawn of time SCSI has
> reported this for devices that failed read capacity.
> 
Ah, good to know. Thanks

regards,
-- 
christophe varoqui <christophe.varoqui@free.fr>

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [dm-devel] [multipath] SCSI device capacity mess
  2004-10-26 23:27 [multipath] SCSI device capacity mess christophe varoqui
  2004-10-26 21:34 ` James Bottomley
@ 2004-10-27  8:17 ` Lars Marowsky-Bree
  2004-10-27  8:42   ` Lars Marowsky-Bree
  2004-10-27 19:02   ` christophe varoqui
  1 sibling, 2 replies; 23+ messages in thread
From: Lars Marowsky-Bree @ 2004-10-27  8:17 UTC (permalink / raw)
  To: device-mapper development; +Cc: linux-scsi@vger.kernel.org

On 2004-10-27T01:27:57, christophe varoqui <christophe.varoqui@free.fr> wrote:

Hi christophe, let's see when I can flush this answer out of my e-mail
cache - German trains really need WLAN hotspots ;-)

> Some SAN hardware present for a same LUN a bunch of valid paths and  a
> bunch ghost paths. In my case, the ghosts responds to standard INQUIRY
> (EVPD 0x83, 0x80, ...) but the READ_CAPACITY fails :

As a note, this is one mode the EMC CLARiiON arrays can also operate in.
Even worse, they won't present the block device at all, just the SCSI
generic mode. However, for the CLARiiONs, they can be configured to
behave sanely and reply to a READ_CAPACITY too (just all I/O will be
errored), if setting the failovermode to 1.

I wonder whether your system can also be configured as such?

> Now when I want to map a multipath target over those two paths, the
> device-mapper faithfully rejects saying /dev/sdw is too small for a
> 213338334 blocks devmap.
> 
> So what is the correct way to treat this situation ?
> 
> 1) make the /sys/block/*/size attribute writable
> 2) resurrect a BLKSETSIZE ioctl
> 3) make device-mapper less strict, and hope we can fix the size by a
> device rescan when it get activated
> 4) sell the culprit hardware

Personally I would opt for 4), but 3) is the likely path to solve this.

Using the new priority group initialization code (where we send magic
commands down to activate the newly switched-to PG) which Alasdair and I
are currently doing for the CLARiiON pampering and which provides a
plugin architecture for the dm-mpath system, you should be able to plug
in a hardware-specific handler for your system too.

However, "relaxing" this check should likely also be a property of the
hardware plugin loaded; I'd not wish to have it relaxed in all
scenarios.

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX AG - A Novell company

* Re: [dm-devel] [multipath] SCSI device capacity mess
  2004-10-27  8:17 ` [dm-devel] " Lars Marowsky-Bree
@ 2004-10-27  8:42   ` Lars Marowsky-Bree
  2004-10-27 18:51     ` Bryan Henderson
  2004-10-27 19:02   ` christophe varoqui
  1 sibling, 1 reply; 23+ messages in thread
From: Lars Marowsky-Bree @ 2004-10-27  8:42 UTC (permalink / raw)
  To: device-mapper development; +Cc: linux-scsi@vger.kernel.org

On 2004-10-27T10:17:13, Lars Marowsky-Bree <lmb@suse.de> wrote:

> Using the new priority group initialization code (where we sent magic
> commands down to activate the newly switched-to PG) which Alasdair and I
> are currently doing for the CLARiiON pampering and which provides a
> plugin-architecture to the dm-mpath system, you should be able to plug
> in a hardware-specific handler for your system too.
> 
> However, "relaxing" this check should likely also be a property of the
> hardware plugin loaded; I'd not wish to have it relaxed in all
> scenarios.

-> have the hardware-specific plugin export a path_init() function which
is called the first time a path is added to the table (or even on
reinstate?).

In addition to verifying the size and so on, it could also check that the
paths really (still) point to the same device (by storing the LUN WWN in
the hw_handler context, for example), or that the path is set up
correctly for failover to work etc (of course, the list of things to
verify is a hw-specific issue). Paranoia is a good thing.
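
A rough sketch of what such a hook could look like follows. Everything here is hypothetical: `struct hw_ctx`, the `path_init()` signature and the rule that the first path must report a valid size are assumptions made for illustration, not the dm-mpath hardware handler interface.

```c
#include <stdint.h>
#include <string.h>

#define LU_ID_MAX 64

/* Hypothetical per-map handler context: remembers what the first path
 * reported so that later paths can be compared against it. */
struct hw_ctx {
        int           have_ref;
        uint64_t      sectors;            /* capacity of the reference path */
        unsigned char lu_id[LU_ID_MAX];   /* e.g. an identifier from page 0x83 */
        size_t        lu_id_len;
};

/* Hypothetical path_init(): returns 0 if the path is acceptable, -1 if not.
 * size_known is 0 for a ghost path whose READ CAPACITY failed; in that
 * case the size comparison is skipped (the "relaxed" check). */
int path_init(struct hw_ctx *ctx, const unsigned char *lu_id, size_t lu_id_len,
              int size_known, uint64_t sectors)
{
        if (lu_id_len == 0 || lu_id_len > LU_ID_MAX)
                return -1;

        if (!ctx->have_ref) {
                if (!size_known)
                        return -1;   /* assumption: learn the size from a valid path */
                memcpy(ctx->lu_id, lu_id, lu_id_len);
                ctx->lu_id_len = lu_id_len;
                ctx->sectors = sectors;
                ctx->have_ref = 1;
                return 0;
        }

        /* Paranoia: every path must name the same logical unit. */
        if (lu_id_len != ctx->lu_id_len ||
            memcmp(lu_id, ctx->lu_id, lu_id_len) != 0)
                return -1;

        /* Only enforce the size when the path could actually report one. */
        if (size_known && sectors != ctx->sectors)
                return -1;

        return 0;
}
```

In this sketch a ghost path with a matching identifier is accepted even though its size is unknown, while a path that reports a different identifier, or a different size that it claims is valid, is refused.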


Kind regards,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX AG - A Novell company

* Re: [dm-devel] [multipath] SCSI device capacity mess
  2004-10-27  8:42   ` Lars Marowsky-Bree
@ 2004-10-27 18:51     ` Bryan Henderson
  2004-10-29 14:12       ` Lars Marowsky-Bree
  0 siblings, 1 reply; 23+ messages in thread
From: Bryan Henderson @ 2004-10-27 18:51 UTC (permalink / raw)
  To: Lars Marowsky-Bree
  Cc: device-mapper development, linux-scsi@vger.kernel.org,
	linux-scsi-owner

>In addition verifying the size and stuff it could also check that the
>paths really (still) point to the same device (by storing the LUN WWN in
>the hw_handler context, for example)

Just to eliminate any ambiguity here, using terminology from the 
standards:  A LUN is a logical unit number.  A WWN is a fibre channel port 
or node name.  There's no such thing as a LUN WWN.  I presume you mean to 
check the logical unit device identifier, which is the world wide unique 
and persistent identifier of a logical unit.  (That's what you get (among 
other things) from Page 0x83 of the SCSI VPD read from the LU).

* Re: [dm-devel] [multipath] SCSI device capacity mess
  2004-10-27 18:51     ` Bryan Henderson
@ 2004-10-29 14:12       ` Lars Marowsky-Bree
  2004-10-29 16:48         ` Bryan Henderson
  0 siblings, 1 reply; 23+ messages in thread
From: Lars Marowsky-Bree @ 2004-10-29 14:12 UTC (permalink / raw)
  To: device-mapper development; +Cc: linux-scsi-owner, linux-scsi@vger.kernel.org

On 2004-10-27T11:51:42, Bryan Henderson <hbryan@us.ibm.com> wrote:

> >In addition verifying the size and stuff it could also check that the
> >paths really (still) point to the same device (by storing the LUN WWN in
> >the hw_handler context, for example)
> 
> Just to eliminate any ambiguity here, using terminology from the 
> standards:  A LUN is a logical unit number.  A WWN is a fibre channel port 
> or node name.  There's no such thing as a LUN WWN.  I presume you mean to 
> check the logical unit device identifier, which is the world wide unique 
> and persistent identifier of a logical unit.  (That's what you get (among 
> other things) from Page 0x83 of the SCSI VPD read from the LU).

Thanks for the clarification; EMC at least partially calls said
world-wide unique identifier the "LUN's WWN" though. And if we are
nitpicking, it's EVPD. ;-)



Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX AG - A Novell company

* Re: [dm-devel] [multipath] SCSI device capacity mess
  2004-10-29 14:12       ` Lars Marowsky-Bree
@ 2004-10-29 16:48         ` Bryan Henderson
  0 siblings, 0 replies; 23+ messages in thread
From: Bryan Henderson @ 2004-10-29 16:48 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: device-mapper development, linux-scsi@vger.kernel.org

>EMC at least partially calls said
>world-wide unique identifier the "LUN's WWN" though.

I'm not surprised.  We all know how common it is to call a logical unit 
(LU) a LUN, and I've also seen the made-up "LUN ID" (and of course the 
truly ridiculous "LUN number," loved by all those people who key their PIN 
number into the ATM machine).

>And if we are nitpicking, it's EVPD. ;-)

Oh, we're definitely nitpicking, but EVPD is the abbreviation for the 
"Enable Vital Product Data" bit in an Inquiry CDB that tells the target 
you're asking for the Vital Product Data (VPD).  At least that's how the 
SCSI standard uses it.  Maybe EMC has its own terminology here too.
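
For illustration, the distinction can be sketched as the construction of a 6-byte INQUIRY CDB: EVPD is a single bit in byte 1 of the CDB, and the VPD page being requested (0x83 here) goes in byte 2. The helper name `build_inquiry_cdb` and the macro names are made up for this example, and the one-byte allocation length in byte 4 is a simplifying assumption.

```c
#include <stdint.h>
#include <string.h>

#define INQUIRY_OPCODE 0x12
#define INQUIRY_EVPD   0x01   /* bit 0 of CDB byte 1 */

/* Fill a 6-byte INQUIRY CDB. With evpd != 0 the target is asked for the
 * VPD page selected by page_code instead of the standard INQUIRY data. */
void build_inquiry_cdb(uint8_t cdb[6], int evpd, uint8_t page_code,
                       uint8_t alloc_len)
{
        memset(cdb, 0, 6);
        cdb[0] = INQUIRY_OPCODE;
        cdb[1] = evpd ? INQUIRY_EVPD : 0;
        cdb[2] = evpd ? page_code : 0;   /* page code stays 0 without EVPD */
        cdb[4] = alloc_len;
}
```

Calling `build_inquiry_cdb(cdb, 1, 0x83, 255)` gives the bytes 12 01 83 00 ff 00: EVPD is the request bit, and the device identification page is the VPD that comes back.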

--
Bryan Henderson                          IBM Almaden Research Center
San Jose CA                              Filesystems

* Re: [dm-devel] [multipath] SCSI device capacity mess
  2004-10-27  8:17 ` [dm-devel] " Lars Marowsky-Bree
  2004-10-27  8:42   ` Lars Marowsky-Bree
@ 2004-10-27 19:02   ` christophe varoqui
  2004-10-27 19:37     ` Eddie Williams
                       ` (3 more replies)
  1 sibling, 4 replies; 23+ messages in thread
From: christophe varoqui @ 2004-10-27 19:02 UTC (permalink / raw)
  To: device-mapper development; +Cc: linux-scsi@vger.kernel.org


> > Some SAN hardware present for a same LUN a bunch of valid paths and  a
> > bunch ghost paths. In my case, the ghosts responds to standard INQUIRY
> > (EVPD 0x83, 0x80, ...) but the READ_CAPACITY fails :
> 
> As a note, this is one mode the EMC CLARiiON arrays can also operate in.
> Even worse, they won't present the block device at all, just the SCSI
> generic mode. However, for the CLARiiONs, they can be configured to
> behave sanely and reply to a READ_CAPACITY too (just all I/O will be
> errored), if setting the failovermode to 1.
> 
> I wonder whether your system can also be configured as such?
> 
Yes it could, but it's a controller-wide setting.

Compatibility with other OSes sharing the same controllers might impose
this mode though. So I'd like to straighten this situation out.

> > 
> > 1) make the /sys/block/*/size attribute writable
> > 2) resurrect a BLKSETSIZE ioctl
> > 3) make device-mapper less strict, and hope we can fix the size by a
> > device rescan when it get activated
> > 4) sell the culprit hardware
> 
> Personally I would opt for 4), but 3) is the likely path to solve this.
> 
> Using the new priority group initialization code (where we sent magic
> commands down to activate the newly switched-to PG) which Alasdair and I
> are currently doing for the CLARiiON pampering and which provides a
> plugin-architecture to the dm-mpath system, you should be able to plug
> in a hardware-specific handler for your system too.
> 
> However, "relaxing" this check should likely also be a property of the
> hardware plugin loaded; I'd not wish to have it relaxed in all
> scenarios.
> 
I wonder if it's not simpler just to remove the NOSTARTONADD flag on
these devices in scsi_devinfo.c. I tested that and all the READ CAPACITY
commands succeed as expected (DEC HSG80 / COMPAQ HSV*).

Wasn't this flag in part motivated by the lack of multipath support
anyway ?

Even in a cluster context, I don't really buy the annoyance of
occasional LUN ping-pong.

regards,
-- 
christophe varoqui <christophe.varoqui@free.fr>


* Re: [dm-devel] [multipath] SCSI device capacity mess
  2004-10-27 19:02   ` christophe varoqui
@ 2004-10-27 19:37     ` Eddie Williams
  2004-10-27 20:19       ` christophe varoqui
  2004-10-27 20:28     ` Philip R Auld
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: Eddie Williams @ 2004-10-27 19:37 UTC (permalink / raw)
  To: christophe varoqui; +Cc: device-mapper development, linux-scsi@vger.kernel.org

On Wed, 2004-10-27 at 15:02, christophe varoqui wrote:

> > 
> I wonder if it's not simpler just to remove the NOSTARTONADD flag on
> this devices in scsi_devinfo.c. I tested that and all the READ CAPACITY
> succeed as expected (DEC HSG80 / COMPAQ HSV*).
> 
> Wasn't this flag in part motivated by the lack of multipath support
> anyway ?
> 
> Even in a cluster context, I don't really buy the annoyance of
> occasional LUN ping-pong.

Yes this change was instigated due to lack of multipath support.  Where
the Qlogic Failover driver intercepts start unit commands this change
would not be necessary.  But with the other multipath endeavors I am not
sure they will affect the need for this feature?

So to the reason for the change, with a single server involved there is
no concern issuing the start unit.  With 2 servers involved it is an
annoyance that can be overlooked for the most part.  However when > 2
servers are involved and especially if the number of LUNs involved is
significant, say > 16, then the annoyance increases such that it is not
easily overlooked.

Switching LUNs from one path to another is not a fast process, and if
this happens during periods of heavy I/O, significant thrashing
can result.

If we want to play at the Enterprise level, where many servers are
connected to large arrays, I don't think it is a good idea to
knowingly cause disruptions to others on the SAN.  Causing LUNs to
switch from one path to another is such a behavior.

Eddie Williams

* Re: [dm-devel] [multipath] SCSI device capacity mess
  2004-10-27 19:37     ` Eddie Williams
@ 2004-10-27 20:19       ` christophe varoqui
  2004-10-27 20:34         ` Greg Freemyer
  0 siblings, 1 reply; 23+ messages in thread
From: christophe varoqui @ 2004-10-27 20:19 UTC (permalink / raw)
  To: Eddie Williams; +Cc: device-mapper development, linux-scsi@vger.kernel.org


> Switching LUNs from one path to another is not a fast process and if
> this happens during periods of heaving IO periods, significant thrashing
> can result.
> 
> I think if we want to play at the Enterprise level where there are many
> servers connected up to large arrays I don't think it is a good idea to
> knowingly cause disruptions to others on the SAN.  Causing LUNs to
> switch from one path to another is such a behavior.
> 
I'm certainly not arguing for removing the NOSTARTONADD flag for all
devices. I just consider it annoying for the HSG80 / HSV* family
controllers:

- Either they are in failover mode and no bouncing is possible, so
start_stop is harmless.

- Or they are in multibus mode, and then not sending start_stop before
READ CAPACITY leads to a command failure and a wrong size being stored.
Hence ghost paths are unusable even when they come up. [Here I shamelessly
gloss over the fact that the device-mapper hooks on path activation could
work.]

- Either way, these controller families are *not* high-end devices,
whatever the criteria (capacity, throughput, cache, alarming, ...). The
whole ghost-path notion is the best evidence.

regards,
-- 
christophe varoqui <christophe.varoqui@free.fr>


* Re: [dm-devel] [multipath] SCSI device capacity mess
  2004-10-27 20:19       ` christophe varoqui
@ 2004-10-27 20:34         ` Greg Freemyer
  0 siblings, 0 replies; 23+ messages in thread
From: Greg Freemyer @ 2004-10-27 20:34 UTC (permalink / raw)
  To: device-mapper development; +Cc: linux-scsi@vger.kernel.org

> I just consider it annoying for the HSG80 / HSV* family
> controlers :
> 

<snip>

> - Either way, these controler families are *not* high-end devices,
> whatever the criteria (capacity, throughput, cache, alarming, ...). The
> whole ghost-path notion being the best evidence.
> 

HP will not agree with "not high-end devices".  The HSV line in
particular is a very good performing system and is designed to compete
with the EMC Symmetrix line.

The EVA (HSV based) is definitely targeted at Data Centers with lots
of servers using it as back-end storage simultaneously.

Greg
-- 
Greg Freemyer

* Re: [dm-devel] [multipath] SCSI device capacity mess
  2004-10-27 19:02   ` christophe varoqui
  2004-10-27 19:37     ` Eddie Williams
@ 2004-10-27 20:28     ` Philip R Auld
  2004-10-27 21:57     ` James Bottomley
  2004-10-28 11:35     ` Lars Marowsky-Bree
  3 siblings, 0 replies; 23+ messages in thread
From: Philip R Auld @ 2004-10-27 20:28 UTC (permalink / raw)
  To: christophe varoqui; +Cc: device-mapper development, linux-scsi@vger.kernel.org

Rumor has it that on Wed, Oct 27, 2004 at 09:02:39PM +0200 christophe varoqui said:
> 
> > > Some SAN hardware present for a same LUN a bunch of valid paths and  a
> > > bunch ghost paths. In my case, the ghosts responds to standard INQUIRY
> > > (EVPD 0x83, 0x80, ...) but the READ_CAPACITY fails :
> > 
> > As a note, this is one mode the EMC CLARiiON arrays can also operate in.
> > Even worse, they won't present the block device at all, just the SCSI
> > generic mode. However, for the CLARiiONs, they can be configured to
> > behave sanely and reply to a READ_CAPACITY too (just all I/O will be
> > errored), if setting the failovermode to 1.
> > 
> > I wonder whether your system can also be configured as such?
> > 
> Yes it could, but it's a controler wide setting.
> 

So is the CLARiiON setting. 

> Compatibility with other OS sharing the same controlers might impose
> this mode though. So I'd like to straight this situation up.
> 

I suspect anything else doing failover on this device would already require 
that setting. And a host that is not doing failover probably won't see the
passive controller at all, due to the cabling/zoning setup.

I don't see anything wrong with asking for specific array settings if they are
needed for multipathing. Some arrays don't return meaningful LU IDs if not 
configured for it. 


> > > 
> > > 1) make the /sys/block/*/size attribute writable
> > > 2) resurrect a BLKSETSIZE ioctl
> > > 3) make device-mapper less strict, and hope we can fix the size by a
> > > device rescan when it get activated
> > > 4) sell the culprit hardware
> > 
> > Personally I would opt for 4), but 3) is the likely path to solve this.
> > 
> > Using the new priority group initialization code (where we sent magic
> > commands down to activate the newly switched-to PG) which Alasdair and I
> > are currently doing for the CLARiiON pampering and which provides a
> > plugin-architecture to the dm-mpath system, you should be able to plug
> > in a hardware-specific handler for your system too.
> > 
> > However, "relaxing" this check should likely also be a property of the
> > hardware plugin loaded; I'd not wish to have it relaxed in all
> > scenarios.
> > 
> I wonder if it's not simpler just to remove the NOSTARTONADD flag on
> this devices in scsi_devinfo.c. I tested that and all the READ CAPACITY
> succeed as expected (DEC HSG80 / COMPAQ HSV*).
> 

Doesn't that just fail over the LUN with the START? I think you'd end up 
with all LUNs active on whichever controller you scanned last. This may 
not be ideal.

And that wouldn't work for a (misconfigured) CLARiiON anyway, I think.


> Wasn't this flag in part motivated by the lack of multipath support
> anyway ?
> 
> Even in a cluster context, I don't really buy the annoyance of
> occasional LUN ping-pong.
> 

This is very possible. You can bring many active/passive arrays to their 
knees if you flip them back and forth under load. Although, I don't see that 
this is specific to the READ_CAPACITY issue.  


Cheers,

Phil

> regards,
> -- 
> christophe varoqui <christophe.varoqui@free.fr>
> 

-- 
Philip R. Auld, Ph.D.  	        	       Egenera, Inc.    
Software Architect                            165 Forest St.
(508) 858-2628                            Marlboro, MA 01752

* Re: [dm-devel] [multipath] SCSI device capacity mess
  2004-10-27 19:02   ` christophe varoqui
  2004-10-27 19:37     ` Eddie Williams
  2004-10-27 20:28     ` Philip R Auld
@ 2004-10-27 21:57     ` James Bottomley
  2004-10-28 11:37       ` Lars Marowsky-Bree
  2004-10-28 11:35     ` Lars Marowsky-Bree
  3 siblings, 1 reply; 23+ messages in thread
From: James Bottomley @ 2004-10-27 21:57 UTC (permalink / raw)
  To: christophe varoqui; +Cc: device-mapper development, linux-scsi@vger.kernel.org

On Wed, 2004-10-27 at 15:02, christophe varoqui wrote:
> I wonder if it's not simpler just to remove the NOSTARTONADD flag on
> this devices in scsi_devinfo.c. I tested that and all the READ CAPACITY
> succeed as expected (DEC HSG80 / COMPAQ HSV*).
> 
> Wasn't this flag in part motivated by the lack of multipath support
> anyway ?
> 
> Even in a cluster context, I don't really buy the annoyance of
> occasional LUN ping-pong.

It's not occasional, it happens every time a single machine in the
cluster reboots and the HSG80 can take a while to transfer luns. 
Configure one with several hundred luns in a 32 node cluster and you'll
see why this flag exists...

James



* Re: [dm-devel] [multipath] SCSI device capacity mess
  2004-10-27 21:57     ` James Bottomley
@ 2004-10-28 11:37       ` Lars Marowsky-Bree
  2004-10-28 18:14         ` Patrick Mansfield
  0 siblings, 1 reply; 23+ messages in thread
From: Lars Marowsky-Bree @ 2004-10-28 11:37 UTC (permalink / raw)
  To: device-mapper development, christophe varoqui; +Cc: linux-scsi@vger.kernel.org

On 2004-10-27T17:57:21, James Bottomley <James.Bottomley@SteelEye.com> wrote:

> It's not occasional, it happens every time a single machine in the
> cluster reboots and the HSG80 can take a while to transfer luns. 
> Configure one with several hundred luns in a 32 node cluster and you'll
> see why this flag exists...

While we are at that, ghost (or non-active) paths lead to _tons_ of
errors while the kernel stubbornly tries to read the partition table,
totally flooding any useful information out of the kernel buffers.

Can't we do a test-unit-ready before trying to read and then just not
read the partition table w/o so much noise?


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX AG - A Novell company

* Re: [dm-devel] [multipath] SCSI device capacity mess
  2004-10-28 11:37       ` Lars Marowsky-Bree
@ 2004-10-28 18:14         ` Patrick Mansfield
  2004-10-28 18:21           ` Lars Marowsky-Bree
  0 siblings, 1 reply; 23+ messages in thread
From: Patrick Mansfield @ 2004-10-28 18:14 UTC (permalink / raw)
  To: Lars Marowsky-Bree
  Cc: device-mapper development, christophe varoqui,
	linux-scsi@vger.kernel.org

On Thu, Oct 28, 2004 at 01:37:28PM +0200, Lars Marowsky-Bree wrote:

> 
> Can't we do a test-unit-ready before trying to read and then just not
> read the partition table w/o so much noise?

We already issue a TUR in sd_spinup_disk() prior to the READ CAPACITY and
partition reads.

We might not even be able to blacklist them and skip specific code
if these devices can be configured differently - they might need special
code (yuck) to query the device and figure out the configuration.

-- Patrick Mansfield

* Re: [dm-devel] [multipath] SCSI device capacity mess
  2004-10-28 18:14         ` Patrick Mansfield
@ 2004-10-28 18:21           ` Lars Marowsky-Bree
  2004-10-30  0:41             ` Patrick Mansfield
  0 siblings, 1 reply; 23+ messages in thread
From: Lars Marowsky-Bree @ 2004-10-28 18:21 UTC (permalink / raw)
  To: Patrick Mansfield
  Cc: device-mapper development, christophe varoqui,
	linux-scsi@vger.kernel.org

On 2004-10-28T11:14:36, Patrick Mansfield <patmans@us.ibm.com> wrote:

> > Can't we do a test-unit-ready before trying to read and then just not
> > read the partition table w/o so much noise?
> We already issue a TUR in sd_spinup_disk() prior to the READ CAPACITY and
> partition reads.

I'm just saying we should stop on the first failed command (TUR) and not
retry to read all 64 sectors regardless. I can live with one error per
passive path, but 64 really kills the logs.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX AG - A Novell company

* Re: [dm-devel] [multipath] SCSI device capacity mess
  2004-10-28 18:21           ` Lars Marowsky-Bree
@ 2004-10-30  0:41             ` Patrick Mansfield
  2004-10-30  1:01               ` Patrick Mansfield
  2004-10-30  7:21               ` christophe varoqui
  0 siblings, 2 replies; 23+ messages in thread
From: Patrick Mansfield @ 2004-10-30  0:41 UTC (permalink / raw)
  To: Lars Marowsky-Bree
  Cc: device-mapper development, christophe varoqui,
	linux-scsi@vger.kernel.org

On Thu, Oct 28, 2004 at 08:21:54PM +0200, Lars Marowsky-Bree wrote:
> On 2004-10-28T11:14:36, Patrick Mansfield <patmans@us.ibm.com> wrote:
> 
> > > Can't we do a test-unit-ready before trying to read and then just not
> > > read the partition table w/o so much noise?
> > We already issue a TUR in sd_spinup_disk() prior to the READ CAPACITY and
> > partition reads.
> 
> I'm just saying we should stop on the first failed command (TUR) and not
> retry to read all 64 sectors regardless. I can live with one error per
> passive path, but 64 really kills the logs.

That makes sense.

What is the result of the TUR? I did not see any information about it in other posts in this thread.

We should only send the READ CAPACITY if the TUR says the device is
ready. Maybe the code in media_not_present() is wrong and causing a
problem.

This code in sd.c:

static int media_not_present(struct scsi_disk *sdkp, struct scsi_request *srp)
{
        if (!srp->sr_result)
                return 0;
        if (!(driver_byte(srp->sr_result) & DRIVER_SENSE))
                return 0;
        if (srp->sr_sense_buffer[2] != NOT_READY &&
            srp->sr_sense_buffer[2] != UNIT_ATTENTION)
                return 0;
        if (srp->sr_sense_buffer[12] != 0x3A) /* medium not present */
                return 0;

        set_media_not_present(sdkp);
        return 1;
}

For example if we got sr_result of a DID_NO_CONNECT or other error that
does not end up setting DRIVER_SENSE, driver_byte(sr_result) could be
0, and then we would return 0. 

Agree?
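
The case analysis above can be tried out with a small user-space model of the function. This is a sketch under stated assumptions: `struct fake_request` and `media_not_present_model()` are made-up stand-ins, the constants are copied in so that the block compiles on its own, and the second case described below assumes the TUR on a ghost path fails with the same sense data as the READ CAPACITY in the original report, which nothing in this thread confirms.

```c
#include <stdint.h>

#define DRIVER_SENSE    0x08
#define NOT_READY       0x02
#define UNIT_ATTENTION  0x06
#define DID_NO_CONNECT  0x01
#define driver_byte(result) (((result) >> 24) & 0xff)

/* Stand-in for the two scsi_request fields the function looks at. */
struct fake_request {
        uint32_t      sr_result;
        unsigned char sr_sense_buffer[16];
};

/* Same tests as media_not_present(), minus the side effect on the disk. */
int media_not_present_model(const struct fake_request *srp)
{
        if (!srp->sr_result)
                return 0;
        if (!(driver_byte(srp->sr_result) & DRIVER_SENSE))
                return 0;
        if (srp->sr_sense_buffer[2] != NOT_READY &&
            srp->sr_sense_buffer[2] != UNIT_ATTENTION)
                return 0;
        if (srp->sr_sense_buffer[12] != 0x3A)   /* medium not present */
                return 0;
        return 1;
}
```

Three inputs are worth feeding to it. A result of `DID_NO_CONNECT << 16` has a zero driver byte and yields 0. A result with `DRIVER_SENSE`, sense key Not Ready and additional sense code 0x04/0x02 (logical unit not ready, initializing command required) also yields 0, because only ASC 0x3A is treated as "no medium". Only Not Ready with ASC 0x3A yields 1. In the first two cases the caller would go on to treat the medium as present.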

And then we allow a READ CAPACITY later on if media is present.

Maybe we have a similar problem with the TUR to this device.

We should change state somehow if the TUR failed or was not ready, maybe
call set_media_not_present(), or maybe change the sdev state to
SDEV_OFFLINE - though that might cause extra logging.

We might not normally ever hit this since the scan (INQUIRY failure)
would likely prevent the device from showing up at all.

Plus, we only call sd_spinup_disk() during discovery and not on open,
unlike the calls to sd_media_changed().

-- Patrick Mansfield

* Re: [dm-devel] [multipath] SCSI device capacity mess
  2004-10-30  0:41             ` Patrick Mansfield
@ 2004-10-30  1:01               ` Patrick Mansfield
  2004-10-30  7:21               ` christophe varoqui
  1 sibling, 0 replies; 23+ messages in thread
From: Patrick Mansfield @ 2004-10-30  1:01 UTC (permalink / raw)
  To: Lars Marowsky-Bree
  Cc: device-mapper development, christophe varoqui,
	linux-scsi@vger.kernel.org

On Fri, Oct 29, 2004 at 05:41:31PM -0700, Patrick Mansfield wrote:
> 
> What is the result of the TUR? I did not see any information about it in other posts in this thread.
> 

Also, if the device is already showing up you can just send a TUR via sg,
you don't have to hack sd.c.

I think the sg_turs command in sg_utils might tell us enough information
on error, though I don't have any non-ready or borken devices readily
available.

elm3a211:~ # sg_turs  /dev/sdb
Completed 1 Test Unit Ready commands with 0 errors
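
For reference, the same probe can be issued directly through the SG_IO
ioctl. The following is a minimal sketch and makes no claim about how
sg_turs is implemented internally; send_tur() and enum tur_outcome are
names made up for this example, and the 5 second timeout is an arbitrary
choice.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

enum tur_outcome { TUR_READY, TUR_NOT_READY, TUR_IOCTL_FAILED };

/*
 * Send one TEST UNIT READY (opcode 0x00, 6-byte CDB, no data phase) to an
 * already opened device node. On TUR_NOT_READY, sense[] holds whatever
 * sense data the device returned.
 */
static enum tur_outcome send_tur(int fd, unsigned char *sense,
                                 unsigned char sense_len)
{
        unsigned char cdb[6] = { 0x00, 0, 0, 0, 0, 0 };
        sg_io_hdr_t hdr;

        memset(&hdr, 0, sizeof(hdr));
        hdr.interface_id = 'S';
        hdr.dxfer_direction = SG_DXFER_NONE;
        hdr.cmd_len = sizeof(cdb);
        hdr.cmdp = cdb;
        hdr.mx_sb_len = sense_len;
        hdr.sbp = sense;
        hdr.timeout = 5000;     /* milliseconds */

        if (ioctl(fd, SG_IO, &hdr) < 0)
                return TUR_IOCTL_FAILED;

        /* Any SCSI status, host error or driver error counts as not ready. */
        if ((hdr.info & SG_INFO_OK_MASK) != SG_INFO_OK)
                return TUR_NOT_READY;
        return TUR_READY;
}
```

On TUR_NOT_READY the caller can inspect the sense buffer; for fixed-format
sense data the sense key is in sense[2] & 0x0f and the additional sense
code in sense[12].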

-- Patrick Mansfield

* Re: [dm-devel] [multipath] SCSI device capacity mess
  2004-10-30  0:41             ` Patrick Mansfield
  2004-10-30  1:01               ` Patrick Mansfield
@ 2004-10-30  7:21               ` christophe varoqui
  2004-10-30  8:22                 ` christophe varoqui
  2004-11-02 15:23                 ` Lars Marowsky-Bree
  1 sibling, 2 replies; 23+ messages in thread
From: christophe varoqui @ 2004-10-30  7:21 UTC (permalink / raw)
  To: device-mapper development; +Cc: Lars Marowsky-Bree, linux-scsi@vger.kernel.org


> > I'm just saying we should stop on the first failed command (TUR) and not
> > retry to read all 64 sectors regardless. I can live with one error per
> > passive path, but 64 really kills the logs.
> 
> That makes sense.
> 
> What is the result of the TUR? I did not see any information about it in other posts in this thread.
> 

The TUR checker in multipath-tools reports ghosts as failing.

> We should change state somehow if the TUR failed or was not ready, maybe
> call set_media_not_present(), or maybe change the sdev state to
> SDEV_OFFLINE - though that might cause extra logging.
> 
> We might not normally ever hit this since the scan (INQUIRY failure)
> would likely prevent the device from showing up at all. 

scsi_id and INQUIRY succeed on ghost paths, which allows multipath to
group them with valid paths to the same LU.

> Plus, we only call sd_spinup_disk() during discovery and not on open,
> unlike the calls to sd_media_changed().
> 
Would it be workable to add a scsi_devinfo flag for devices with ghosts?
If this flag is set and the state is down, open calls sd_media_changed().
The multipath-tools and device-mapper will try as hard as possible to
avoid opening those devices if not necessary, by grouping them in a low
priority path group.

regards,
-- 
christophe varoqui <christophe.varoqui@free.fr>


* Re: [dm-devel] [multipath] SCSI device capacity mess
  2004-10-30  7:21               ` christophe varoqui
@ 2004-10-30  8:22                 ` christophe varoqui
  2004-11-02 15:23                 ` Lars Marowsky-Bree
  1 sibling, 0 replies; 23+ messages in thread
From: christophe varoqui @ 2004-10-30  8:22 UTC (permalink / raw)
  To: device-mapper development; +Cc: Lars Marowsky-Bree, linux-scsi@vger.kernel.org


> > Plus, we only call sd_spinup_disk() during discovery and not on open,
> > unlike the calls to sd_media_changed().
> > 
> Would it be workable to add a scsi_devinfo flag for devices with ghosts?
> If this flag is set and the state is down, open calls sd_media_changed().
> The multipath-tools and device-mapper will try as hard as possible to
> avoid opening those devices if not necessary, by grouping them in a low
> priority path group.
> 
In fact, the NOSTARTONADD flag seems to be used only for devices with
ghost paths. So we could extend its logic: no start on add, but start on
I/O if the state is down.

SG_IO should not activate the path, though, so that the path checker
does not activate it.
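
Spelled out as a decision function, the idea might look like the sketch
below. All names are hypothetical: FLAG_NOSTARTONADD merely stands in for
the real devinfo flag, and the event enum does not exist in the kernel.
It only models the proposed policy, not any existing code path.

```c
#include <stdbool.h>

#define FLAG_NOSTARTONADD 0x1   /* hypothetical stand-in for the devinfo flag */

enum path_event {
        EV_DEVICE_ADD,  /* device discovery */
        EV_BLOCK_IO,    /* regular read/write through the block layer */
        EV_SG_IO        /* passthrough command, e.g. from a path checker */
};

/* Return true if a START UNIT should be sent for this event. */
static bool should_start_unit(unsigned int flags, enum path_event ev,
                              bool path_down)
{
        if (!(flags & FLAG_NOSTARTONADD))
                return ev == EV_DEVICE_ADD;     /* modeled default: start on add */

        switch (ev) {
        case EV_BLOCK_IO:
                return path_down;       /* proposed extension: start on I/O if down */
        case EV_DEVICE_ADD:
        case EV_SG_IO:
        default:
                return false;           /* never on add, never for SG_IO */
        }
}
```

The point of the EV_SG_IO case is that a path checker polling a ghost
path with TUR would not wake it up as a side effect.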

regards,
-- 
christophe varoqui <christophe.varoqui@free.fr>


* Re: [dm-devel] [multipath] SCSI device capacity mess
  2004-10-30  7:21               ` christophe varoqui
  2004-10-30  8:22                 ` christophe varoqui
@ 2004-11-02 15:23                 ` Lars Marowsky-Bree
  1 sibling, 0 replies; 23+ messages in thread
From: Lars Marowsky-Bree @ 2004-11-02 15:23 UTC (permalink / raw)
  To: device-mapper development; +Cc: linux-scsi@vger.kernel.org

On 2004-10-30T09:21:10, christophe varoqui <christophe.varoqui@free.fr> wrote:

> > What is the result of the TUR? I did not see any information about
> > it in other posts in this thread.
> > 
> The TUR checker in multipath-tools reports ghosts as failing.

Note: This is about the kernel-internal "I'll try to read the partition
table of every block device I see, no matter what errors I get, and
NOTHING WILL STOP ME BWAHAHAHA", not about the multipath-tools. ;-)

(A behaviour which tends to mess up the logfiles quite badly.)

> > We might not normally ever hit this since the scan (INQUIRY failure)
> > would likely prevent the device from showing up at all. 
> scsi_id and INQUIRY succeed on ghost paths, which allows multipath to
> group them with valid paths to the same LU.

Yeah, multipath-tools et al are behaving quite correctly for these
scenarios.

> > Plus, we only call sd_spinup_disk() during discovery and not on open,
> > unlike the calls to sd_media_changed().
> Would it be workable to add a scsi_devinfo flag for devices with ghosts?

No. I think the kernel should simply notice that the TUR failed, or
abort reading the partition table on the first error. This would at
least automatically detect these cases w/o requiring explicit
blacklisting.
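
The difference in log volume between the two strategies can be
illustrated with a small user-space sketch. Nothing here is kernel code:
read_sector_fn, scan_partition_area() and the stop_on_first_error switch
are invented for the example, and 64 is the sector count mentioned
earlier in this thread.

```c
#include <stdbool.h>

#define SCAN_SECTORS 64

/* Callback that reads one sector; returns 0 on success, non-zero on error. */
typedef int (*read_sector_fn)(unsigned int sector);

/*
 * Try to read the first SCAN_SECTORS sectors and return how many errors
 * would have been logged.
 */
static unsigned int scan_partition_area(read_sector_fn read_sector,
                                        bool stop_on_first_error)
{
        unsigned int errors = 0;

        for (unsigned int s = 0; s < SCAN_SECTORS; s++) {
                if (read_sector(s) != 0) {
                        errors++;
                        if (stop_on_first_error)
                                break;
                }
        }
        return errors;
}

/* A passive ("ghost") path: every read fails. */
static int ghost_read(unsigned int sector)
{
        (void)sector;
        return -1;
}
```

With ghost_read, scan_partition_area(ghost_read, false) counts 64 errors,
while scan_partition_area(ghost_read, true) counts a single one per
passive path.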


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX AG - A Novell company

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [dm-devel] [multipath] SCSI device capacity mess
  2004-10-27 19:02   ` christophe varoqui
                       ` (2 preceding siblings ...)
  2004-10-27 21:57     ` James Bottomley
@ 2004-10-28 11:35     ` Lars Marowsky-Bree
  3 siblings, 0 replies; 23+ messages in thread
From: Lars Marowsky-Bree @ 2004-10-28 11:35 UTC (permalink / raw)
  To: device-mapper development; +Cc: linux-scsi@vger.kernel.org

On 2004-10-27T21:02:39, christophe varoqui <christophe.varoqui@free.fr> wrote:

> > As a note, this is one mode the EMC CLARiiON arrays can also operate in.
> > Even worse, they won't present the block device at all, just the SCSI
> > generic mode. However, for the CLARiiONs, they can be configured to
> > behave sanely and reply to a READ_CAPACITY too (just all I/O will be
> > errored), if setting the failovermode to 1.
> > 
> > I wonder whether your system can also be configured as such?
> Yes it could, but it's a controller-wide setting.

Not per LUN? Too bad.

> Compatibility with other OSes sharing the same controllers might impose
> this mode though. So I'd like to straighten this situation out.

Then I think hardware-specific hooks in the dm-mpath are the way to go.
We already have them in place for the priority group initialization code
anyway, we can add them for anything else which needs them, like this
one.

> I wonder if it's not simpler just to remove the NOSTARTONADD flag on
> these devices in scsi_devinfo.c. I tested that and all the READ CAPACITY
> commands succeed as expected (DEC HSG80 / COMPAQ HSV*).

As James points out, this won't be quite the thing to do.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX AG - A Novell company

end of thread, other threads:[~2004-11-02 15:25 UTC | newest]

Thread overview: 23+ messages
2004-10-26 23:27 [multipath] SCSI device capacity mess christophe varoqui
2004-10-26 21:34 ` James Bottomley
2004-10-26 21:46   ` christophe varoqui
2004-10-27  8:17 ` [dm-devel] " Lars Marowsky-Bree
2004-10-27  8:42   ` Lars Marowsky-Bree
2004-10-27 18:51     ` Bryan Henderson
2004-10-29 14:12       ` Lars Marowsky-Bree
2004-10-29 16:48         ` Bryan Henderson
2004-10-27 19:02   ` christophe varoqui
2004-10-27 19:37     ` Eddie Williams
2004-10-27 20:19       ` christophe varoqui
2004-10-27 20:34         ` Greg Freemyer
2004-10-27 20:28     ` Philip R Auld
2004-10-27 21:57     ` James Bottomley
2004-10-28 11:37       ` Lars Marowsky-Bree
2004-10-28 18:14         ` Patrick Mansfield
2004-10-28 18:21           ` Lars Marowsky-Bree
2004-10-30  0:41             ` Patrick Mansfield
2004-10-30  1:01               ` Patrick Mansfield
2004-10-30  7:21               ` christophe varoqui
2004-10-30  8:22                 ` christophe varoqui
2004-11-02 15:23                 ` Lars Marowsky-Bree
2004-10-28 11:35     ` Lars Marowsky-Bree
