* Failed path will not be recovered when disabling/enabling remote port
From: Christian May @ 2009-07-02 11:27 UTC
To: dm-devel
Hi,
I've set up an IBM z10 LPAR (mainframe server) with a 2.6.30 kernel.
An IBM DS8000 storage server was attached to the System z10. Ten SCSI
LUNs were assigned to the LPAR via two paths:
Example:
36005076303ffc1040000000000001269 dm-9 IBM,2107900
size=1.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=-2 status=active
|- 0:0:0:1080639506 sdw 65:96 active undef running
`- 1:0:1:1080639506 sdt 65:48 active undef running
Special parameter setting: dev_loss_tmo=90sec; fast_io_fail_tmo=5sec
multipath tools: multipath-tools v0.4.9 (04/04, 2009)
device-mapper: device-mapper-1.02.27-7.fc10.s390x,
device-mapper-libs-1.02.27-7.fc10.s390x
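The two timeouts above are per-remote-port settings of the FC transport class, so it is worth checking what is actually in effect. The following is a minimal sketch under two assumptions. First, the attributes are exposed as /sys/class/fc_remote_ports/<rport>/dev_loss_tmo and .../fast_io_fail_tmo. Second, the rport name rport-0:0-0 is taken from the uevent paths in the logs. The helper names read_int_attr and show_rport_timeouts are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Read one integer from a sysfs-style attribute file.
 * Returns 0 on success, -1 if the file is missing or does not start with
 * a number (fast_io_fail_tmo reads "off" when disabled, reported as -1). */
static int read_int_attr(const char *path, long *out)
{
    assert(path != NULL && out != NULL);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int rc = (fscanf(f, "%ld", out) == 1) ? 0 : -1;
    fclose(f);
    return rc;
}

/* Print dev_loss_tmo and fast_io_fail_tmo for one remote port.
 * The sysfs directory layout is an assumption; adjust it for your system. */
static void show_rport_timeouts(const char *rport)
{
    static const char *const attrs[] = { "dev_loss_tmo", "fast_io_fail_tmo" };
    char path[256];

    for (size_t i = 0; i < sizeof(attrs) / sizeof(attrs[0]); i++) {
        long val;
        snprintf(path, sizeof(path), "/sys/class/fc_remote_ports/%s/%s",
                 rport, attrs[i]);
        if (read_int_attr(path, &val) == 0)
            printf("%s = %ld\n", attrs[i], val);
        else
            printf("%s: not readable as a number (%s)\n", attrs[i], path);
    }
}
```

With the settings described here, show_rport_timeouts("rport-0:0-0") would be expected to report 90 and 5.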
When I removed a remote port (by disabling a port on the Brocade FC switch),
one path failed.
[root@h42lp26/ESAME:~]
> multipath -l
36005076303ffc1040000000000001268 dm-8 ,
size=1.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=-2 status=active
|- #:#:#:# - #:# failed undef running
`- 1:0:1:1080573970 sdr 65:16 active undef running
After a while (>90 sec), the SCSI LUNs were removed from the system:
UEVENT[1246531815.619428] add /kernel/uids/74 (uids)
UDEV [1246531815.621708] add /kernel/uids/74 (uids)
UEVENT[1246531816.725299] remove /kernel/uids/74 (uids)
UDEV [1246531816.726151] remove /kernel/uids/74 (uids)
UEVENT[1246531929.959709] change /devices/virtual/block/dm-0 (block)
UEVENT[1246531929.959749] change /devices/virtual/block/dm-3 (block)
UEVENT[1246531929.959759] change /devices/virtual/block/dm-4 (block)
UEVENT[1246531929.959769] change /devices/virtual/block/dm-5 (block)
UEVENT[1246531929.966647] change /devices/virtual/block/dm-7 (block)
UDEV [1246531930.045444] change /devices/virtual/block/dm-4 (block)
UDEV [1246531930.048923] change /devices/virtual/block/dm-7 (block)
UDEV [1246531930.054614] change /devices/virtual/block/dm-0 (block)
UDEV [1246531930.060091] change /devices/virtual/block/dm-3 (block)
UDEV [1246531930.071744] change /devices/virtual/block/dm-5 (block)
UEVENT[1246531949.278541] change /devices/virtual/block/dm-9 (block)
UDEV [1246531949.369690] change /devices/virtual/block/dm-9 (block)
UEVENT[1246531950.295756] change /devices/virtual/block/dm-8 (block)
UEVENT[1246531950.297597] change /devices/virtual/block/dm-6 (block)
UEVENT[1246531950.297610] change /devices/virtual/block/dm-2 (block)
UEVENT[1246531950.297620] change /devices/virtual/block/dm-1 (block)
UDEV [1246531950.430097] change /devices/virtual/block/dm-8 (block)
UDEV [1246531950.588626] change /devices/virtual/block/dm-2 (block)
UDEV [1246531950.632482] change /devices/virtual/block/dm-1 (block)
UDEV [1246531950.634515] change /devices/virtual/block/dm-6 (block)
UEVENT[1246532034.277177] remove /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080377362/scsi_generic/sg0 (scsi_generic)
UEVENT[1246532034.277214] remove /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080377362/scsi_device/0:0:0:1080377362 (scsi_device)
UEVENT[1246532034.277226] remove /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080377362/scsi_disk/0:0:0:1080377362 (scsi_disk)
UEVENT[1246532034.277236] remove /devices/virtual/bdi/8:0 (bdi)
UEVENT[1246532034.277247] remove /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080377362/block/sda (block)
UEVENT[1246532034.277258] remove /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080377362 (scsi)
UEVENT[1246532034.277384] remove /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080836114/scsi_generic/sg2 (scsi_generic)
UEVENT[1246532034.277594] remove /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080836114/scsi_device/0:0:0:1080836114 (scsi_device)
UEVENT[1246532034.277864] remove /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080836114/scsi_disk/0:0:0:1080836114 (scsi_disk)
UEVENT[1246532034.278035] remove /devices/virtual/bdi/8:32 (bdi)...
....
When I re-enabled the path, the SCSI LUNs were reassigned to the system, but the path
didn't recover:
UEVENT[1246532107.387169] add /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080836114 (scsi)
UEVENT[1246532107.387209] add /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080836114/scsi_device/0:0:0:1080836114 (scsi_device)
UEVENT[1246532107.387220] add /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080836114/scsi_generic/sg0 (scsi_generic)
UEVENT[1246532107.387230] add /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080836114/scsi_disk/0:0:0:1080836114 (scsi_disk)
UEVENT[1246532107.388941] add /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080377362 (scsi)
UEVENT[1246532107.388952] add /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080377362/scsi_device/0:0:0:1080377362 (scsi_device)
UEVENT[1246532107.388963] add /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080377362/scsi_generic/sg2 (scsi_generic)
UEVENT[1246532107.397111] add /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080836114/block/sdu (block)
UEVENT[1246532107.399249] add /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080639506 (scsi)
UEVENT[1246532107.399261] add /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080639506/scsi_device/0:0:0:1080639506 (scsi_device)
UEVENT[1246532107.399272] add /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080639506/scsi_generic/sg4 (scsi_generic)
UEVENT[1246532107.399711] add /devices/virtual/bdi/65:64 (bdi)
UEVENT[1246532107.399722] add /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080377362/scsi_disk/0:0:0:1080377362 (scsi_disk)
UEVENT[1246532107.401605] add /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080573970 (scsi)
UEVENT[1246532107.401617] add /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080573970/scsi_device/0:0:0:1080573970 (scsi_device)
UEVENT[1246532107.401628] add /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080573970/scsi_generic/sg6 (scsi_generic)
UEVENT[1246532107.403731] add /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080967186 (scsi)
UEVENT[1246532107.403742] add /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080967186/scsi_device/0:0:0:1080967186 (scsi_device)
UEVENT[1246532107.403753] add /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080967186/scsi_generic/sg8 (scsi_generic)
UEVENT[1246532107.405963] add /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080377362/block/sdv (block)
UEVENT[1246532107.406168] add /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080901650 (scsi)
UEVENT[1246532107.407608] add /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080901650/scsi_device/0:0:0:1080901650 (scsi_device)
UEVENT[1246532107.407624] add /devices/css0/0.0.0330/0.0.1780/host0/rport-0:0-0/target0:0:0/0:0:0:1080901650/scsi_generic/sg10 (scsi_generic)
UEVENT[1246532107.407880] add /devices/virtual/bdi/65:80 (bdi)
[root@h42lp26/ESAME:~]
> multipath -l
36005076303ffc1040000000000001268 dm-8 ,
size=1.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=-2 status=active
|- #:#:#:# - #:# failed undef running
`- 1:0:1:1080573970 sdr 65:16 active undef running
Running the "multipath" command recovers the failed path, but that's not
the way it should be... Can somebody help fix this? Why is the path not
recovered automatically?
Regards,
Christian May
* Re: Failed path will not be recovered when disabling/enabling remote port
From: Hannes Reinecke @ 2009-07-02 11:44 UTC
To: device-mapper development
Christian May wrote:
[ .. ]
> [root@h42lp26/ESAME:~]
>> multipath -l
> 36005076303ffc1040000000000001268 dm-8 ,
> size=1.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
> `-+- policy='round-robin 0' prio=-2 status=active
> |- #:#:#:# - #:# failed undef running
> `- 1:0:1:1080573970 sdr 65:16 active undef running
>
>
> Running "multipath" command will recover the failed path but that's not
> way it should be...can somebody help to fix this? Why is the path not
> recovered automatically?
>
It should, really.
The problem is that the paths have _not_ been reconnected;
the hash marks (#:#:#:#) indicate that the in-kernel multipath code
references a device for which no information is available.
And the new device has _not_ been reconnected, as otherwise
you'd end up with _three_ paths here.
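This reading of the output can be checked mechanically. The following is a rough sketch that assumes the `multipath -l` path-line layout quoted above: a tree prefix followed by an H:C:T:L field, where `#:#:#:#` marks a path the kernel map references but userspace has no information about. The function names are hypothetical and not part of multipath-tools.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if a "multipath -l" path line refers to a device with no
 * userspace information, i.e. its H:C:T:L field is printed as "#:#:#:#". */
static bool path_line_is_orphan(const char *line)
{
    assert(line != NULL);
    /* Skip the tree-drawing prefix ("|- ", "`- ", spaces) to the first field. */
    line += strspn(line, " |`-");
    return strncmp(line, "#:#:#:#", 7) == 0;
}

/* Count orphaned path entries among the lines printed for one map. */
static int count_orphans(const char *const *lines, size_t n)
{
    int orphans = 0;
    for (size_t i = 0; i < n; i++)
        if (path_line_is_orphan(lines[i]))
            orphans++;
    return orphans;
}
```

Fed the three lines shown for dm-8, the sketch counts one orphan both before and after the port was re-enabled. This matches the observation that the new sdX device never joined the map.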
Probably missing udev integration.
I really have to push my patches upstream ... sigh.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
* Re: Failed path will not be recovered when disabling/enabling remote port
From: Konrad Rzeszutek @ 2009-07-02 13:06 UTC
To: device-mapper development
On Thu, Jul 02, 2009 at 01:44:18PM +0200, Hannes Reinecke wrote:
> Christian May wrote:
> [ .. ]
> >
> >
> > [root@h42lp26/ESAME:~]
> >> multipath -l
> > 36005076303ffc1040000000000001268 dm-8 ,
> > size=1.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
> > `-+- policy='round-robin 0' prio=-2 status=active
> > |- #:#:#:# - #:# failed undef running
> > `- 1:0:1:1080573970 sdr 65:16 active undef running
> >
> >
> > Running "multipath" command will recover the failed path but that's not
> > way it should be...can somebody help to fix this? Why is the path not
> > recovered automatically?
> >
> It should, really.
>
> The problem is that the paths have _not_ been reconnected;
> the hashes indicates that the in-kernel multipath code references
> a device for which no information is available.
> And the new device has _not_ been reconnected, as otherwise
> you'd end up with _three_ paths here.
>
> Probably missing udev integration.
It could also be a race condition that is present in SLES10 and RHEL5
kernels, where the SysFS directories are created (and the udev event is
sent out) before the kernel has populated them. So when multipathd tries
to read them, it finds no pertinent information and shoves the device
off to the 'orphan' state.
I did post a patch for this a while back. Granted, this isn't a problem
with the more recent kernels.
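The ordering described here can be reproduced in miniature without any SCSI hardware. The sketch below simulates it with an ordinary file. First the attribute appears empty. Then a reader classifies the path at that moment. Only after that is the attribute populated. The rule "empty vendor means orphan" is a simplification assumed for illustration. It is not multipathd's actual code, and all names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum path_state { PATH_OK, PATH_ORPHAN };

/* Read the first line of an attribute file into buf, stripping the newline.
 * A missing or empty file yields an empty string. */
static void read_attr(const char *path, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    FILE *f = fopen(path, "r");
    if (!f)
        return;
    if (!fgets(buf, (int)len, f))
        buf[0] = '\0';
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
}

/* Simplified stand-in for path discovery: no vendor string, no path. */
static enum path_state classify_path(const char *vendor_attr)
{
    char vendor[64];
    read_attr(vendor_attr, vendor, sizeof(vendor));
    return vendor[0] ? PATH_OK : PATH_ORPHAN;
}

/* Replay the race: the reader runs while the attribute is still empty,
 * and the attribute is filled in only afterwards. */
static void replay_race(const char *vendor_attr,
                        enum path_state *early, enum path_state *late)
{
    *early = *late = PATH_ORPHAN;

    FILE *f = fopen(vendor_attr, "w");    /* attribute exists, no data yet */
    if (!f)
        return;
    fclose(f);
    *early = classify_path(vendor_attr);  /* reader reacts to the uevent */

    f = fopen(vendor_attr, "w");          /* kernel finishes populating it */
    if (!f)
        return;
    fputs("IBM\n", f);
    fclose(f);
    *late = classify_path(vendor_attr);   /* same check, now succeeds */
}
```

The same check gives opposite answers depending only on when it runs. That is why a retry or wait, rather than a different check, is the natural remedy.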
>
> I really have to push my patches upstream ... sigh.
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
* Re: Failed path will not be recovered when disabling/enabling remote port
From: Hannes Reinecke @ 2009-07-02 13:16 UTC
To: device-mapper development
Hi all,
Konrad Rzeszutek wrote:
> On Thu, Jul 02, 2009 at 01:44:18PM +0200, Hannes Reinecke wrote:
>> Christian May wrote:
>> [ .. ]
>>
>>>
>>> [root@h42lp26/ESAME:~]
>>>> multipath -l
>>> 36005076303ffc1040000000000001268 dm-8 ,
>>> size=1.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
>>> `-+- policy='round-robin 0' prio=-2 status=active
>>> |- #:#:#:# - #:# failed undef running
>>> `- 1:0:1:1080573970 sdr 65:16 active undef running
>>>
>>>
>>> Running "multipath" command will recover the failed path but that's not
>>> way it should be...can somebody help to fix this? Why is the path not
>>> recovered automatically?
>>>
>> It should, really.
>>
>> The problem is that the paths have _not_ been reconnected;
>> the hashes indicates that the in-kernel multipath code references
>> a device for which no information is available.
>> And the new device has _not_ been reconnected, as otherwise
>> you'd end up with _three_ paths here.
>>
>> Probably missing udev integration.
>
> Could also be a race condition that is present in SLES10 + RHEL5
> kernels. Where the SysFS directories are created (and the udev event it
> sent out), but the kernel hasn't populated the SysFS directories. So
> when multipathd tries to read them it finds no pertient information and
> shoves it off to the 'orphan' state.
>
Really? With SLES10? Have you actually observed this?
We're running multipath _after_ udev has processed the event.
And udev already waited for sysfs, so we should be safe there.
It might be applicable to mainline multipath-tools, but
the SLES10 one ... I'd be surprised.
Well, reasonably surprised. multipath keeps on throwing
an amazing number of issues still.
Do you have more information here?
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
* Re: Failed path will not be recovered when disabling/enabling remote port
From: Chandra Seetharaman @ 2009-07-02 17:51 UTC
To: device-mapper development
One simple question: did you check whether multipathd was (still)
running when the port was re-enabled?
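One way to answer this question from a script or test harness is to check the daemon's pid file. The following is a minimal sketch that assumes multipathd records its PID in a pid file. This is commonly /var/run/multipathd.pid, or /run/multipathd.pid on newer systems, so verify the path on your distribution. kill() with signal 0 only performs the existence and permission check and delivers nothing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>

/* Return true if the PID stored in pidfile belongs to a live process.
 * Caveat: this cannot tell whether the PID was reused by another program. */
static bool pidfile_process_alive(const char *pidfile)
{
    assert(pidfile != NULL);
    FILE *f = fopen(pidfile, "r");
    if (!f)
        return false;                  /* no pid file: daemon not started */
    long pid = 0;
    int n = fscanf(f, "%ld", &pid);
    fclose(f);
    if (n != 1 || pid <= 0)
        return false;
    /* Signal 0: only check that the process exists and may be signalled. */
    if (kill((pid_t)pid, 0) == 0)
        return true;
    return errno == EPERM;             /* exists, but owned by another user */
}
```

A false result for the configured pid file at the moment the port comes back would explain why no path was re-added, since nothing would be listening for the new uevents.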
On Thu, 2009-07-02 at 13:27 +0200, Christian May wrote:
> [ .. ]
* Re: Failed path will not be recovered when disabling/enabling remote port
From: Konrad Rzeszutek @ 2009-07-20 16:46 UTC
To: device-mapper development
> > Could also be a race condition that is present in SLES10 + RHEL5
> > kernels. Where the SysFS directories are created (and the udev event it
> > sent out), but the kernel hasn't populated the SysFS directories. So
> > when multipathd tries to read them it finds no pertient information and
> > shoves it off to the 'orphan' state.
> >
> Really? With SLES10? Have you actually observed this?
With SLES10 SP2, to be exact. It wasn't an issue with the original SLES10,
since the initial patch was there. The equipment I used to test this was an
AX150FC with failed batteries (so no write caching) and a failed
controller, so it would run extra slow.
> We're running multipath _after_ udev has processed the event.
Right, the one where the SysFS directory is created. Then multipathd
reads the data. I remember posting this here and mentioning that the
problem exists on SLES10 SP2 and RHEL5, but not on the upstream kernels.
> And udev already waited for sysfs, so we should be safe there.
Not so. udev receives the SCSI device-creation uevent, creates /dev/sdX,
and so on. But the kernel hasn't yet fully populated the SysFS entries (so
/sys/block/sdX/device/vendor does exist, but has no data in it).
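If the failure mode is an attribute that exists but is still empty, a wait that only checks whether the file exists may not be enough. The following is a minimal sketch of a variant that keeps polling until the attribute actually returns data, with a bounded timeout. This is not the patch's wait_for_file. The function name is hypothetical, and the 5 s / 200 ms budget is an assumption that mirrors the WAIT_MAX_SECONDS and WAIT_LOOP_PER_SECOND values used in the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define WAIT_MAX_SECONDS     5
#define WAIT_LOOP_PER_SECOND 5

/* Poll an attribute file until it yields a non-empty first line.
 * On success, copy it (without the newline) into buf and return 0.
 * Return 1 if it is still missing or empty when the time budget runs out. */
static int wait_for_attr_data(const char *path, char *buf, size_t len)
{
    const struct timespec interval = {
        .tv_sec = 0,
        .tv_nsec = 1000000000L / WAIT_LOOP_PER_SECOND,
    };

    assert(path != NULL && buf != NULL && len > 1);
    for (int loop = 0; loop < WAIT_MAX_SECONDS * WAIT_LOOP_PER_SECOND; loop++) {
        FILE *f = fopen(path, "r");
        if (f) {
            char *line = fgets(buf, (int)len, f);
            fclose(f);
            if (line) {
                buf[strcspn(buf, "\n")] = '\0';
                if (buf[0] != '\0')
                    return 0;          /* attribute is populated */
            }
        }
        nanosleep(&interval, NULL);    /* missing or empty: try again */
    }
    buf[0] = '\0';
    return 1;
}
```

Treating "empty" the same as "missing" covers both the window in which the attribute file has not been created yet and the window described here, at the cost of a few extra reads.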
>
> It might be applicable to mainline multipath-tools, but
It really depends on how the SysFS directories are populated and how
slow the SCSI target is.
> the SLES10 one ... I'd be surprised.
>
> Well, reasonably surprised. multipath keeps on throwing
> an amazing number of issues still.
>
> Do you have more information here?
Here is the patch along with a detailed description.
The "multipath-tools-add-wait" patch is a backport/rewrite of the
wait_for_file routine used in the sysfs_get_[vendor|model|rev]
macros. SLES10 SP2 back-ported a lot of the upstream multipath
features, and one of those changes got rid of this function.
I haven't yet found out why it was deleted; it looks like a mistake,
as the upstream kernel _should_ cause the same set of problems with
multipath.
[update: Upstream kernel has this fixed]
A wait is necessary because of the way the kernel sends the events.
When a SCSI device is added, the SCSI subsystem follows this path:
_sysfs_add_sdev:
  calls device_add ...
    [ '/devices/platform/host16/session6/target16:0:0/16:0:0:17'] uevent
    bus_attach_device
      bus_for_each_drv
        driver_probe_device
          sd_probe
            ['/class/scsi_disk/16:0:0:17' ] uevent
            add_disk
              ['/block/sdai'] [ Here multipath starts its job ]
  calls class_device_add ...
    [ '/class/scsi_device/16:0:0:17' ] uevent
    sg_add:
      [ '/class/scsi_generic/sg35' ] uevent
  done with device_add, and now we add the attributes:
    --> scsi_sysfs_sdev_attrs[i].vendor, model, rev <-- THIS is the problem.
[At the 'block/sdai' event, multipathd starts analyzing the device and
reads SysFS, but 'vendor' and 'model' have no data yet, so multipathd
discards them and orphans the device. That data is only there once
'device_add' has finished.]
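The call trace above can be modelled as an ordered list of steps to show where the window opens. The following toy sketch replays the sequence in the order of the trace and reports whether the vendor/model/rev attributes exist when a given uevent is delivered. The step labels are illustrative only, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Steps of the device-creation sequence, in the order of the trace above. */
enum step_kind { UEVENT, ATTRS_ADDED };

struct step {
    enum step_kind kind;
    const char *what;
};

static const struct step sequence[] = {
    { UEVENT,      "scsi device"      },  /* device_add */
    { UEVENT,      "scsi_disk"        },  /* sd_probe */
    { UEVENT,      "block"            },  /* add_disk: multipathd starts here */
    { UEVENT,      "scsi_device"      },  /* class_device_add */
    { UEVENT,      "scsi_generic"     },  /* sg_add */
    { ATTRS_ADDED, "vendor/model/rev" },  /* scsi_sysfs_sdev_attrs */
};

/* Return whether vendor/model/rev already exist when the named uevent is
 * delivered; unknown event names also yield false. */
static bool attrs_ready_at(const char *event)
{
    assert(event != NULL);
    bool ready = false;
    for (size_t i = 0; i < sizeof(sequence) / sizeof(sequence[0]); i++) {
        if (sequence[i].kind == ATTRS_ADDED)
            ready = true;
        else if (strcmp(sequence[i].what, event) == 0)
            return ready;
    }
    return false;
}
```

Because the attribute step comes after every uevent in the list, any consumer that reacts to the 'block' uevent without waiting can observe the missing data.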
The kernel sends four uevents while creating the SysFS representation of
the device, and the SysFS entries for vendor, model, and rev are populated
only after the last one. This can lead to a race condition when multipath
investigates the new block device and finds it can't read the vendor. This
patch adds the wait_for_file routine, which (in multipathd only; the
multipath(8) build returns immediately) polls for the attribute file for
up to about WAIT_MAX_SECONDS (5 seconds) before giving up.
diff -uNrp multipath-tools-0.4.7.orig/libmultipath/discovery.c multipath-tools-0.4.7/libmultipath/discovery.c
--- multipath-tools-0.4.7.orig/libmultipath/discovery.c 2008-09-25 14:02:28.000000000 -0400
+++ multipath-tools-0.4.7/libmultipath/discovery.c 2008-09-25 19:07:50.000000000 -0400
@@ -125,11 +125,54 @@ path_discovery (vector pathvec, struct c
return r;
}
+
+/*
+ * the daemon can race udev upon path add,
+ * not multipath(8), ran by udev
+ */
+#if DAEMON
+#define WAIT_MAX_SECONDS 5
+#define WAIT_LOOP_PER_SECOND 5
+
+static int
+wait_for_file (char * filename)
+{
+ int loop;
+ struct stat stats;
+
+ loop = WAIT_MAX_SECONDS * WAIT_LOOP_PER_SECOND;
+
+ while (--loop) {
+ if (stat(filename, &stats) == 0)
+ return 0;
+
+ if (errno != ENOENT)
+ return 1;
+
+ usleep(1000 * 1000 / WAIT_LOOP_PER_SECOND);
+ }
+ return 1;
+}
+#else
+static int
+wait_for_file (char * filename)
+{
+ return 0;
+}
+#endif
+
#define declare_sysfs_get_str(fname) \
extern int \
sysfs_get_##fname (struct sysfs_device * dev, char * buff, size_t len) \
{ \
char *attr; \
+ char attr_path[SYSFS_PATH_SIZE]; \
+\
+ if (safe_sprintf(attr_path, "%s/%s/%s", sysfs_path, dev->devpath, #fname)) \
+ return 1; \
+\
+ if (wait_for_file(attr_path)) \
+ return 1; \
\
attr = sysfs_attr_get_value(dev->devpath, #fname); \
if (!attr) \
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Failed path will not be recovered when disabling/enabling remote port
2009-07-20 16:46 ` Konrad Rzeszutek
@ 2009-07-21 6:19 ` Hannes Reinecke
2009-07-21 21:42 ` Konrad Rzeszutek
0 siblings, 1 reply; 8+ messages in thread
From: Hannes Reinecke @ 2009-07-21 6:19 UTC
To: device-mapper development
Hi Konrad,
Konrad Rzeszutek wrote:
>>> Could also be a race condition that is present in SLES10 + RHEL5
>>> kernels, where the SysFS directories are created (and the udev event is
>>> sent out) but the kernel hasn't yet populated them. So when
>>> multipathd tries to read them, it finds no pertinent information and
>>> shoves the device off to the 'orphan' state.
>>>
>> Really? With SLES10? Have you actually observed this?
>
> With SLES10 SP2 to be exact. It wasn't an issue with SLES10 since the
> initial patch was there. The equipment I used to test this was an
> AX150FC with failed batteries (so no cache writes) and with a failed
> controller so it would run extra slow.
>
>> We're running multipath _after_ udev has processed the event.
>
> Right, the one where the SysFS directory is created. Then multipathd
> reads the data. I remember posting this here and mentioning that the
> problem exists on SLES10 SP2 and RHEL5 but not on the upstream kernels.
>
>> And udev already waited for sysfs, so we should be safe there.
>
> Not so. udev receives the SCSI creation uevent, creates /dev/sdX, and
> so on. But the kernel hasn't yet fully populated the SysFS entries (so
> /sys/block/sdX/device/vendor exists, but has no data in it).
>> It might be applicable to mainline multipath-tools, but
>
> It really depends on how the SysFS directories are populated and how
> slow the SCSI target is.
>
>> the SLES10 one ... I'd be surprised.
>>
>> Well, reasonably surprised. multipath keeps on throwing
>> an amazing number of issues still.
>>
>> Do you have more information here?
>
> Here is the patch along with a detailed description.
>
> The "multipath-tools-add-wait" patch is a backport/rewrite of the
> wait_for_file routine used in the sysfs_get_[vendor|model|rev]
> macros. SLES10 SP2 backported a lot of the upstream features
> of multipath, and one of those was the removal of this function.
> I haven't yet found out why it was deleted; it looks like
> a mistake, as the upstream kernel _should_ cause the same
> set of problems with multipath.
> [update: Upstream kernel has this fixed]
>
> The wait is necessary because of the way the kernel
> sends the event. When a SCSI device is added, the SCSI subsystem
> follows this path:
>
> _sysfs_add_sdev:
>   calls device_add ...
>     [ '/devices/platform/host16/session6/target16:0:0/16:0:0:17'] uevent
>     bus_attach_device
>       bus_for_each_drv
>         driver_probe_device
>           sd_probe
>             ['/class/scsi_disk/16:0:0:17' ] uevent
>             add_disk
>               ['/block/sdai'] [ Here multipath starts its job ]
>
>   calls class_device_add ...
>     [ '/class/scsi_device/16:0:0:17' ] uevent
>     sg_add:
>       [ '/class/scsi_generic/sg35' ] uevent
>
>
>   done with device_add, and now we add the attributes:
>   --> scsi_sysfs_sdev_attrs[i].vendor, model, rev <-- THIS is the
>   problem.
>
> [Multipathd, acting on the 'block/sdai' event, has started analyzing the
> data and reads SysFS, but 'vendor' and 'model' have no data yet, so
> multipathd discards them and orphans the devices. That data is only
> there once 'device_add' has finished.]
>
Ah. Hmm. Seems you are correct.
I'll have to apply the patch, then.
Fancy opening a bugzilla for it?
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
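Konrad's explanation also notes that `/sys/block/sdX/device/vendor` may already exist while still returning no data. A check on existence alone, as `wait_for_file` does, would not by itself detect that state. Below is a minimal sketch of a variant that also treats an empty read as "not ready yet". It is not part of the patch or of multipath-tools. The helper name `read_attr_when_ready`, the retry count, and the interval are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: read a single-line, sysfs-style attribute into buf,
 * retrying while the file is missing or empty. Returns 0 with the value in
 * buf (cut at the first newline), or 1 with buf empty if no data appeared
 * within `attempts` tries spaced `interval_ms` apart. */
static int read_attr_when_ready(const char *path, char *buf, size_t len,
                                int attempts, long interval_ms)
{
    assert(buf != NULL && len > 1 && attempts > 0 && interval_ms >= 0);

    struct timespec delay = {
        .tv_sec = interval_ms / 1000,
        .tv_nsec = (interval_ms % 1000) * 1000000L,
    };

    for (int i = 0; i < attempts; i++) {
        FILE *f = fopen(path, "r");
        if (f != NULL) {
            size_t n = fread(buf, 1, len - 1, f);
            fclose(f);
            buf[n] = '\0';
            buf[strcspn(buf, "\n")] = '\0';  /* keep only the first line */
            if (buf[0] != '\0')
                return 0;                    /* attribute now has data */
        }
        if (i + 1 < attempts)
            nanosleep(&delay, NULL);         /* missing or empty: retry */
    }
    buf[0] = '\0';
    return 1;
}
```

Called as `read_attr_when_ready("/sys/block/sdai/device/vendor", buf, sizeof buf, 25, 200)`, it would keep roughly the same 5-second budget as the patch. It would also return only once the attribute actually holds a value.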
* Re: Failed path will not be recovered when disabling/enabling remote port
2009-07-21 6:19 ` Hannes Reinecke
@ 2009-07-21 21:42 ` Konrad Rzeszutek
0 siblings, 0 replies; 8+ messages in thread
From: Konrad Rzeszutek @ 2009-07-21 21:42 UTC
To: device-mapper development
> Ah. Hmm. Seems you are correct.
>
> I'll have to apply the patch, then.
>
> Fancy opening a bugzilla for it?
Done. BZ #524018.
Thread overview: 8+ messages
2009-07-02 11:27 Failed path will not be recovered when disabling/enabling remote port Christian May
2009-07-02 11:44 ` Hannes Reinecke
2009-07-02 13:06 ` Konrad Rzeszutek
2009-07-02 13:16 ` Hannes Reinecke
2009-07-20 16:46 ` Konrad Rzeszutek
2009-07-21 6:19 ` Hannes Reinecke
2009-07-21 21:42 ` Konrad Rzeszutek
2009-07-02 17:51 ` Chandra Seetharaman