* Handling multiple paths to enclosure devices?
@ 2011-07-28 17:05 Roland Dreier
2011-07-28 20:33 ` James Bottomley
0 siblings, 1 reply; 7+ messages in thread
From: Roland Dreier @ 2011-07-28 17:05 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-scsi, eric
Hi,
I'm seeing an issue with the current design of our enclosure
handling. In a system with a bunch of drives in an enclosure, it's
definitely helpful to have a way to go from sdXX to which slot in the
enclosure that drive is in, and that's what the symlink
/sys/block/sdXX/device/enclosure_device:NN provides.
However, in a system with multiple paths to the enclosure, eg an HBA
with two external SAS ports, both connected to a SAS expander in a
JBOD, ie in lame ASCII graphics, something like:
+-----+                 /-- drv1
|     |       +-----+  /--- .
|     |==SAS==|     |-/---- .
| HBA |       | exp |------ .
|     |==SAS==|     |-\---- .
|     |       +-----+  \--- .
+-----+                 \-- drvN
we have two paths to each drive, so each gets two names, sdXX and
sdYY. However, in drivers/misc/enclosure.c, the code only allows one
device in each component and so what happens is that sdXX gets
discovered, then gets an enclosure_device:NN link, then sdYY is
discovered, so sdXX's enclosure_device:NN link is removed and one is
added for sdYY. And so if I want to figure out which enclosure slot
sdXX is in, I'm in for a hard time.
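A minimal sketch of how the forward lookup can be checked from user space, assuming only the /sys/block/sdXX/device/enclosure_device:NN layout described above. The function names and the sysfs_root parameter are illustrative and not part of any existing tool. On a multipath system affected by the behaviour described here, one would expect only one of sdXX and sdYY to report a link.

```python
import os

PREFIX = "enclosure_device:"


def enclosure_links(disk, sysfs_root="/sys"):
    """Return the enclosure_device:NN entry names found for one disk."""
    device_dir = os.path.join(sysfs_root, "block", disk, "device")
    try:
        names = os.listdir(device_dir)
    except OSError:
        # No such disk, or no sysfs tree under sysfs_root.
        return []
    return sorted(n for n in names if n.startswith(PREFIX))


def report(disks, sysfs_root="/sys"):
    """Map each disk name to its enclosure link names (empty list if none)."""
    return {d: enclosure_links(d, sysfs_root) for d in disks}
```

Calling report() with the two names of one drive shows whether both paths still carry a link or whether the second discovery displaced the first.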
It would be a simple matter of writing code to allow all the block
devices in a slot to link back to that slot -- we would have to be a
bit more careful of keeping track of what links exist, but it should
be doable.
The wrinkle is that there are also /sys/class/enclosure/ZZZ/NN/device
symlinks that allow going the other way. And it's harder to see how
to express multiple block devices in one enclosure slot.
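For the reverse direction, a rough sketch that walks /sys/class/enclosure/ZZZ/NN/device, assuming only the layout named above. The function name and the sysfs_root parameter are made up for illustration. Because each component directory has a single "device" link, it can name at most one target, which is exactly the wrinkle.

```python
import os


def populated_slots(sysfs_root="/sys"):
    """Yield (enclosure, component, target) for each component 'device' link."""
    base = os.path.join(sysfs_root, "class", "enclosure")
    if not os.path.isdir(base):
        return
    for encl in sorted(os.listdir(base)):
        encl_dir = os.path.join(base, encl)
        if not os.path.isdir(encl_dir):
            continue
        for comp in sorted(os.listdir(encl_dir)):
            link = os.path.join(encl_dir, comp, "device")
            # Entries without a 'device' symlink (empty slots, attribute
            # files) are skipped.
            if os.path.islink(link):
                yield encl, comp, os.path.realpath(link)
```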
Thoughts on how to improve our enclosure handling?
Thanks!
Roland
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Handling multiple paths to enclosure devices?
2011-07-28 17:05 Handling multiple paths to enclosure devices? Roland Dreier
@ 2011-07-28 20:33 ` James Bottomley
2011-07-28 20:57 ` Roland Dreier
2011-07-28 22:01 ` Douglas Gilbert
0 siblings, 2 replies; 7+ messages in thread
From: James Bottomley @ 2011-07-28 20:33 UTC (permalink / raw)
To: Roland Dreier; +Cc: linux-scsi, eric
On Thu, 2011-07-28 at 10:05 -0700, Roland Dreier wrote:
> Hi,
>
> I'm seeing an issue with the current design of our enclosure
> handling. In a system with a bunch of drives in an enclosure, it's
> definitely helpful to have a way to go from sdXX to which slot in the
> enclosure that drive is in, and that's what the symlink
> /sys/block/sdXX/device/enclosure_device:NN provides.
>
> However, in a system with multiple paths to the enclosure, eg an HBA
> with two external SAS ports, both connected to a SAS expander in a
> JBOD, ie in lame ASCII graphics, something like:
>
> +-----+                 /-- drv1
> |     |       +-----+  /--- .
> |     |==SAS==|     |-/---- .
> | HBA |       | exp |------ .
> |     |==SAS==|     |-\---- .
> |     |       +-----+  \--- .
> +-----+                 \-- drvN
So this configuration should form a single wide port and thus not
actually be multiple paths. However, if you have two HBAs instead of
one (or a non-SAS HBA), I grant it becomes multipath.
> we have two paths to each drive, so each gets two names, sdXX and
> sdYY. However, in drivers/misc/enclosure.c, the code only allows one
> device in each component and so what happens is that sdXX gets
> discovered, then gets an enclosure_device:NN link, then sdYY is
> discovered, so sdXX's enclosure_device:NN link is removed and one is
> added for sdYY. And so if I want to figure out which enclosure slot
> sdXX is in, I'm in for a hard time.
>
> It would be a simple matter of writing code to allow all the block
> devices in a slot to link back to that slot -- we would have to be a
> bit more careful of keeping track of what links exist, but it should
> be doable.
>
> The wrinkle is that there are also /sys/class/enclosure/ZZZ/NN/device
> symlinks that allow going the other way. And it's harder to see how
> to express multiple block devices in one enclosure slot.
>
> Thoughts on how to improve our enclosure handling?
My initial thought is that in a multi-path situation, as above, we get
two enclosures appearing as well (one down each path). If we
incorporated the idea of topological subtrees into the identity matching
code, we'd end up filling each of the enclosures with the path connected
devices. That seems to be an easy situation for multi-path drivers to
sort out and one requiring no alteration of the existing enclosure code
(except to do the topological subtree search).
How does that sound?
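A toy model of the subtree idea, not the kernel's identity matching code: each device is represented by a slash-separated topology path, and a disk is attached to the enclosure that shares the longest leading run of path components with it. The path strings in the usage note are hypothetical and only loosely modelled on sysfs names.

```python
def shared_depth(a, b):
    """Count the leading path components that two device paths have in common."""
    depth = 0
    for x, y in zip(a.split("/"), b.split("/")):
        if x != y:
            break
        depth += 1
    return depth


def match_enclosure(disk_path, enclosure_paths):
    """Pick the enclosure that sits in the same topological subtree as the disk."""
    if not enclosure_paths:
        return None
    return max(enclosure_paths, key=lambda e: shared_depth(disk_path, e))
```

With enclosures at "host6/port-6:0/expander-6:0/port-6:0:24" and "host6/port-6:1/expander-6:1/port-6:1:24", a disk at "host6/port-6:0/expander-6:0/port-6:0:1" matches the first and its second path matches the second, so each copy of the enclosure is filled with the devices reached down its own path.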
James
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Handling multiple paths to enclosure devices?
2011-07-28 20:33 ` James Bottomley
@ 2011-07-28 20:57 ` Roland Dreier
2011-07-29 7:09 ` James Bottomley
2011-07-28 22:01 ` Douglas Gilbert
1 sibling, 1 reply; 7+ messages in thread
From: Roland Dreier @ 2011-07-28 20:57 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-scsi, eric
> So this configuration should form a single wide port and thus not
> actually be multiple paths. However, if you have two HBAs instead of
> one (or a non-SAS HBA), I grant it becomes multipath.
Yeah ... actually my case is an HBA with two wide (4-lane) ports,
where either the expander or the HBA is not able to fuse them into a
single 8-lane port. But clearly this can happen at least with two
HBAs pretty easily.
> My initial thought is that in a multi-path situation, as above, we get
> two enclosures appearing as well (one down each path). If we
> incorporated the idea of topological subtrees into the identity matching
> code, we'd end up filling each of the enclosures with the path connected
> devices. That seems to be an easy situation for multi-path drivers to
> sort out and one requiring no alteration of the existing enclosure code
> (except to do the topological subtree search).
Ah, good insight: we do have two copies of the expander device, and so
it would be natural to attach each disk to the expander device on the
same path to that disk.
I'm a bit confused by what you mean about multi-path drivers though --
it would seem we would need the enclosure stuff to handle this
somehow? It seems that if I have this situation, my HBA driver (eg
mpt2sas) will discover the SCSI bus, hit an enclosure, trigger loading
the ses driver (which pulls in the enclosure driver), and then
continue discovering disks. And it seems this code needs to get the
topology sorted by itself -- how could a multi-path driver inject
itself into the symlink creation in enclosure.c?
- R.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Handling multiple paths to enclosure devices?
2011-07-28 20:57 ` Roland Dreier
@ 2011-07-29 7:09 ` James Bottomley
2011-07-29 7:21 ` Hannes Reinecke
0 siblings, 1 reply; 7+ messages in thread
From: James Bottomley @ 2011-07-29 7:09 UTC (permalink / raw)
To: Roland Dreier; +Cc: linux-scsi, eric
On Thu, 2011-07-28 at 13:57 -0700, Roland Dreier wrote:
> > My initial thought is that in a multi-path situation, as above, we get
> > two enclosures appearing as well (one down each path). If we
> > incorporated the idea of topological subtrees into the identity matching
> > code, we'd end up filling each of the enclosures with the path connected
> > devices. That seems to be an easy situation for multi-path drivers to
> > sort out and one requiring no alteration of the existing enclosure code
> > (except to do the topological subtree search).
>
> Ah, good insight: we do have two copies of the expander device, and so
> it would be natural to attach each disk to the expander device on the
> same path to that disk.
>
> I'm a bit confused by what you mean about multi-path drivers though --
> it would seem we would need the enclosure stuff to handle this
> somehow? It seems that if I have this situation, my HBA driver (eg
> mpt2sas) will discover the SCSI bus, hit an enclosure, trigger loading
> the ses driver (which pulls in the enclosure driver), and then
> continue discovering disks. And it seems this code needs to get the
> topology sorted by itself -- how could a multi-path driver inject
> itself into the symlink creation in enclosure.c?
I merely meant that our current philosophy is to layer multi-path
awareness in a separate driver: dm. There's certainly no expander
awareness in dm-mp at the moment, but there could be. It should be
quite simple: match the expanders to something like a dm-exp and then
basically use the slot information in the expander to construct dm
devices for each of the physicals (the rule for dm device creation would
be dm-exp slot matching).
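To make the slot-matching rule concrete, here is a hedged sketch in plain Python. No dm-exp target exists; the record layout (device name, enclosure identifier, slot) and the function name are assumptions for illustration. It also assumes some enclosure identifier that is the same down both paths.

```python
from collections import defaultdict


def group_paths_by_slot(paths):
    """Group path devices that sit in the same slot of the same enclosure.

    paths is an iterable of (device, enclosure_id, slot) tuples; every
    resulting group is a candidate for one multipath device.
    """
    groups = defaultdict(list)
    for device, enclosure_id, slot in paths:
        groups[(enclosure_id, slot)].append(device)
    return {key: sorted(devs) for key, devs in groups.items()}
```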
James
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Handling multiple paths to enclosure devices?
2011-07-29 7:09 ` James Bottomley
@ 2011-07-29 7:21 ` Hannes Reinecke
2011-07-29 7:25 ` James Bottomley
0 siblings, 1 reply; 7+ messages in thread
From: Hannes Reinecke @ 2011-07-29 7:21 UTC (permalink / raw)
To: James Bottomley; +Cc: Roland Dreier, linux-scsi, eric
On 07/29/2011 09:09 AM, James Bottomley wrote:
> On Thu, 2011-07-28 at 13:57 -0700, Roland Dreier wrote:
>> > My initial thought is that in a multi-path situation, as above, we get
>> > two enclosures appearing as well (one down each path). If we
>> > incorporated the idea of topological subtrees into the identity matching
>> > code, we'd end up filling each of the enclosures with the path connected
>> > devices. That seems to be an easy situation for multi-path drivers to
>> > sort out and one requiring no alteration of the existing enclosure code
>> > (except to do the topological subtree search).
>>
>> Ah, good insight: we do have two copies of the expander device, and so
>> it would be natural to attach each disk to the expander device on the
>> same path to that disk.
>>
>> I'm a bit confused by what you mean about multi-path drivers though --
>> it would seem we would need the enclosure stuff to handle this
>> somehow? It seems that if I have this situation, my HBA driver (eg
>> mpt2sas) will discover the SCSI bus, hit an enclosure, trigger loading
>> the ses driver (which pulls in the enclosure driver), and then
>> continue discovering disks. And it seems this code needs to get the
>> topology sorted by itself -- how could a multi-path driver inject
>> itself into the symlink creation in enclosure.c?
>
> I merely meant that our current philosophy is to layer multi-path
> awareness in a separate driver: dm. There's certainly no expander
> awareness in dm-mp at the moment, but there could be. It should be
> quite simple: match the expanders to something like a dm-exp and then
> basically use the slot information in the expander to construct dm
> devices for each of the physicals (the rule for dm device creation would
> be dm-exp slot matching).
>
Doesn't help as device-mapper is only concerned with block devices,
what with dm devices being essentially abstract block devices.
But yes, we should stick to our current philosophy of treating
multipath devices as separate trees underneath the accessing HBA.
We should not (and, in fact, cannot) try to map the external
topology in sysfs, rather should stick to a logical view as seen
from each HBA. So the enclosure class needs to be updated
to include the HBA number in there to avoid duplicate links.
The correct mapping and accessing these 'multipathed enclosures'
will then be left as an exercise to the reader :-)
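A small model of why adding the HBA number avoids the clash, assuming a component table that holds exactly one device per key. The register() helper and the host numbers are hypothetical; this is not the enclosure class code.

```python
def register(components, key, disk):
    """Bind a disk to a component key; return the disk it displaced, if any."""
    displaced = components.get(key)
    components[key] = disk
    return displaced


# Keyed by slot only: the second path displaces the first.
by_slot = {}
first = register(by_slot, 4, "sdXX")
second = register(by_slot, 4, "sdYY")

# Keyed by (host number, slot): both paths keep their binding.
by_host_and_slot = {}
third = register(by_host_and_slot, (6, 4), "sdXX")
fourth = register(by_host_and_slot, (7, 4), "sdYY")
```

In the first table sdXX is returned as displaced when sdYY arrives; in the second table nothing is displaced and both entries survive.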
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Handling multiple paths to enclosure devices?
2011-07-29 7:21 ` Hannes Reinecke
@ 2011-07-29 7:25 ` James Bottomley
0 siblings, 0 replies; 7+ messages in thread
From: James Bottomley @ 2011-07-29 7:25 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: Roland Dreier, linux-scsi, eric
On Fri, 2011-07-29 at 09:21 +0200, Hannes Reinecke wrote:
> On 07/29/2011 09:09 AM, James Bottomley wrote:
> > On Thu, 2011-07-28 at 13:57 -0700, Roland Dreier wrote:
> >> > My initial thought is that in a multi-path situation, as above, we get
> >> > two enclosures appearing as well (one down each path). If we
> >> > incorporated the idea of topological subtrees into the identity matching
> >> > code, we'd end up filling each of the enclosures with the path connected
> >> > devices. That seems to be an easy situation for multi-path drivers to
> >> > sort out and one requiring no alteration of the existing enclosure code
> >> > (except to do the topological subtree search).
> >>
> >> Ah, good insight: we do have two copies of the expander device, and so
> >> it would be natural to attach each disk to the expander device on the
> >> same path to that disk.
> >>
> >> I'm a bit confused by what you mean about multi-path drivers though --
> >> it would seem we would need the enclosure stuff to handle this
> >> somehow? It seems that if I have this situation, my HBA driver (eg
> >> mpt2sas) will discover the SCSI bus, hit an enclosure, trigger loading
> >> the ses driver (which pulls in the enclosure driver), and then
> >> continue discovering disks. And it seems this code needs to get the
> >> topology sorted by itself -- how could a multi-path driver inject
> >> itself into the symlink creation in enclosure.c?
> >
> > I merely meant that our current philosophy is to layer multi-path
> > awareness in a separate driver: dm. There's certainly no expander
> > awareness in dm-mp at the moment, but there could be. It should be
> > quite simple: match the expanders to something like a dm-exp and then
> > basically use the slot information in the expander to construct dm
> > devices for each of the physicals (the rule for dm device creation would
> > be dm-exp slot matching).
> >
> Doesn't help as device-mapper is only concerned with block devices,
> what with dm devices being essentially abstract block devices.
Well, all SCSI devices are effectively block devices as far as DM sees
it because they have request queues. It's exploiting a sick trick, I
know, but I don't think it will be too difficult to express them in the
current code.
> But yes, we should stick to our current philosophy of treating
> multipath devices as separate trees underneath the accessing HBA.
> We should not (and, in fact, cannot) try to map the external
> topology in sysfs, rather should stick to a logical view as seen
> from each HBA. So the enclosure class needs to be updated
> to include the HBA number in there to avoid duplicate links.
>
> The correct mapping and accessing these 'multipathed enclosures'
> will then be left as an exercise to the reader :-)
Agreed. I'm agnostic as to whether we trick dm or invent something new.
James
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Handling multiple paths to enclosure devices?
2011-07-28 20:33 ` James Bottomley
2011-07-28 20:57 ` Roland Dreier
@ 2011-07-28 22:01 ` Douglas Gilbert
1 sibling, 0 replies; 7+ messages in thread
From: Douglas Gilbert @ 2011-07-28 22:01 UTC (permalink / raw)
To: James Bottomley; +Cc: Roland Dreier, linux-scsi, eric
On 11-07-28 04:33 PM, James Bottomley wrote:
> On Thu, 2011-07-28 at 10:05 -0700, Roland Dreier wrote:
>> Hi,
>>
>> I'm seeing an issue with the current design of our enclosure
>> handling. In a system with a bunch of drives in an enclosure, it's
>> definitely helpful to have a way to go from sdXX to which slot in the
>> enclosure that drive is in, and that's what the symlink
>> /sys/block/sdXX/device/enclosure_device:NN provides.
>>
>> However, in a system with multiple paths to the enclosure, eg an HBA
>> with two external SAS ports, both connected to a SAS expander in a
>> JBOD, ie in lame ASCII graphics, something like:
>>
>> +-----+                 /-- drv1
>> |     |       +-----+  /--- .
>> |     |==SAS==|     |-/---- .
>> | HBA |       | exp |------ .
>> |     |==SAS==|     |-\---- .
>> |     |       +-----+  \--- .
>> +-----+                 \-- drvN
>
> So this configuration should form a single wide port and thus not
> actually be multiple paths. However, if you have two HBAs instead of
> one (or a non-SAS HBA), I grant it becomes multipath.
Here are some conflicting results for the single HBA case.
I tested two LSI HBAs separately, each with 8 phys and wired 5
of those phys to a LSI SAS-2 expander.
Here are the results for a SAS-1.1 (3 Gbps) 3444E HBA seen
from a SMP DISCOVER on the expander:
phy 5:T:attached:[5001517e85c3efe5:00 t(SATA)] 3 Gbps
phy 7:T:attached:[5000c500215725bd:00 t(SSP)] 6 Gbps
phy 12:T:attached:[500605b00006f263:03 i(SSP+STP+SMP)] 3 Gbps
phy 20:T:attached:[500605b00006f264:04 i(SSP+STP+SMP)] 3 Gbps
phy 21:T:attached:[500605b00006f264:05 i(SSP+STP+SMP)] 3 Gbps
phy 22:T:attached:[500605b00006f264:06 i(SSP+STP+SMP)] 3 Gbps
phy 23:T:attached:[500605b00006f264:07 i(SSP+STP+SMP)] 3 Gbps
phy 24:D:attached:[5001517e85c3effd:00 V i(SSP+SMP) t(SSP)] 6 Gbps
That is two ports:
- a narrow port [HBA phy 3 to expander phy 12]
note the different SAS address of HBA phy 3: 500605b00006f263
- a wide port [HBA phys 4-7 to expander phys 20-23]
Should the HBA report as two hosts?? [It only reports as one host
in my test.]
So if a SAS HBA can't handle a wide port with more than
4 phys, it can just change the SAS addresses on one or more
phys at the HBA end.
Here are the results for a SAS-2 (6 Gbps) 9212 HBA seen
from a SMP DISCOVER on the same expander:
phy 5:T:attached:[5001517e85c3efe5:00 t(SATA)] 3 Gbps
phy 7:T:attached:[5000c500215725bd:00 t(SSP)] 6 Gbps
phy 12:T:attached:[500605b001d0d3e0:04 i(SSP+STP+SMP)] 6 Gbps
phy 20:T:attached:[500605b001d0d3e0:03 i(SSP+STP+SMP)] 6 Gbps
phy 21:T:attached:[500605b001d0d3e0:02 i(SSP+STP+SMP)] 6 Gbps
phy 22:T:attached:[500605b001d0d3e0:01 i(SSP+STP+SMP)] 6 Gbps
phy 23:T:attached:[500605b001d0d3e0:00 i(SSP+STP+SMP)] 6 Gbps
phy 24:D:attached:[5001517e85c3effd:00 V i(SSP+SMP) t(SSP)] 6 Gbps
Now that is a single wide port (5 phys wide) since all
5 expander phys have the same SAS address (not shown)
and all 5 HBA phys have the same address.
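The port grouping can be reproduced mechanically. The sketch below parses summary lines in the format shown in the two listings and groups expander phys by the attached SAS address. Treating lines that advertise initiator roles and no target roles as HBA phys is an assumption made here, and the function name is illustrative.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   phy 12:T:attached:[500605b00006f263:03 i(SSP+STP+SMP)] 3 Gbps
LINE = re.compile(
    r"phy\s+(?P<phy>\d+):(?P<routing>[A-Z]):attached:"
    r"\[(?P<addr>[0-9a-f]{16}):(?P<aphy>\d+)\s+(?P<roles>[^\]]*)\]"
)


def initiator_ports(discover_text):
    """Group expander phys by attached initiator SAS address.

    Phys that share one attached address form a single (possibly wide) port.
    """
    ports = defaultdict(list)
    for m in LINE.finditer(discover_text):
        roles = m.group("roles")
        # Assumption: initiator-only entries are the HBA phys.
        if "i(" in roles and "t(" not in roles:
            ports[m.group("addr")].append(int(m.group("phy")))
    return dict(ports)
```

Fed the 3444E listing this gives two ports (phy 12 alone, and phys 20-23); fed the 9212 listing it gives one port covering phys 12 and 20-23.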
>> we have two paths to each drive, so each gets two names, sdXX and
>> sdYY. However, in drivers/misc/enclosure.c, the code only allows one
>> device in each component and so what happens is that sdXX gets
>> discovered, then gets an enclosure_device:NN link, then sdYY is
>> discovered, so sdXX's enclosure_device:NN link is removed and one is
>> added for sdYY. And so if I want to figure out which enclosure slot
>> sdXX is in, I'm in for a hard time.
>>
>> It would be a simple matter of writing code to allow all the block
>> devices in a slot to link back to that slot -- we would have to be a
>> bit more careful of keeping track of what links exist, but it should
>> be doable.
>>
>> The wrinkle is that there are also /sys/class/enclosure/ZZZ/NN/device
>> symlinks that allow going the other way. And it's harder to see how
>> to express multiple block devices in one enclosure slot.
>>
>> Thoughts on how to improve our enclosure handling?
In SAS-2 the SMP DISCOVER response should contain slot information.
Looking at sysfs for my test system:
# lsscsi
[1:0:0:0] disk ATA ST3320620AS 3.AA /dev/sda
[6:0:0:0] disk ATA ST31000528AS CC38 /dev/sdb
[6:0:1:0] disk SEAGATE ST32000444SS 0006 /dev/sdc
[6:0:2:0] enclosu Intel RES2SV240 0600 -
then fetching the corresponding SAS port addresses:
# lsscsi -t
[1:0:0:0] disk sata: /dev/sda
[6:0:0:0] disk sas:0x5001517e85c3efe5 /dev/sdb
[6:0:1:0] disk sas:0x5000c500215725bd /dev/sdc
[6:0:2:0] enclosu sas:0x5001517e85c3effd -
Referring to the SMP DISCOVER response above, it can be seen
that /dev/sdc is connected to expander phy 7. So getting the
long form output for a SMP DISCOVER on phy 7 :
# smp_discover -p 7 /dev/bsg/expander-6\:0
Discover response:
...
attached SAS address: 0x5000c500215725bd
attached phy identifier: 0
...
device slot number: 255
device slot group number: 255
device slot group output connector:
Sadly the value of 255 means not available. YMMV
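The manual steps above (take the SAS address from lsscsi -t, find the expander phy attached to it, then read the slot number) could be scripted roughly as follows. This sketch only parses text in the formats shown in this message; the function names are illustrative, and 255 is treated as "not available" as noted above.

```python
import re

NOT_AVAILABLE = 255

LSSCSI = re.compile(r"sas:0x(?P<addr>[0-9a-f]{16})\s+(?P<dev>\S+)")
PHY = re.compile(r"phy\s+(?P<phy>\d+):\w:attached:\[(?P<addr>[0-9a-f]{16}):")


def phy_for_device(lsscsi_text, discover_text, dev):
    """Return the expander phy attached to dev's SAS address, or None."""
    addrs = {m.group("dev"): m.group("addr") for m in LSSCSI.finditer(lsscsi_text)}
    wanted = addrs.get(dev)
    if wanted is None:
        return None
    for m in PHY.finditer(discover_text):
        if m.group("addr") == wanted:
            return int(m.group("phy"))
    return None


def slot_number(long_discover_text):
    """Return the device slot number, or None if absent or reported as 255."""
    m = re.search(r"device slot number:\s*(\d+)", long_discover_text)
    if m is None:
        return None
    value = int(m.group(1))
    return None if value == NOT_AVAILABLE else value
```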
> My initial thought is that in a multi-path situation, as above, we get
> two enclosures appearing as well (one down each path). If we
> incorporated the idea of topological subtrees into the identity matching
> code, we'd end up filling each of the enclosures with the path connected
> devices. That seems to be an easy situation for multi-path drivers to
> sort out and one requiring no alteration of the existing enclosure code
> (except to do the topological subtree search).
>
> How does that sound?
Does this also solve the problem reported a few weeks back
in which a SES logical unit reported duplicate element
descriptor names?
Doug Gilbert
^ permalink raw reply [flat|nested] 7+ messages in thread