public inbox for linux-scsi@vger.kernel.org
* Disabling dev_loss_tmo?
@ 2007-11-13  9:11 Tore Anderson
  2007-11-13 10:24 ` Michael Loehr
  2007-11-13 16:18 ` James Smart
  0 siblings, 2 replies; 9+ messages in thread
From: Tore Anderson @ 2007-11-13  9:11 UTC
  To: Linux SCSI Mailing List

Hi.  Recent kernels will remove the block devices if a FC rport is lost,
which causes a number of problems when dm-multipath is used:

1) Multipathd will receive an event notifying it of the removed rport,
and will respond by removing the path.  This causes a suspend which
flushes outstanding I/O, and in an all-paths-down scenario this will
cause I/O errors to propagate up to the file system layer - even if
queue_if_no_path is in use.  This is fixed in newer versions of
multipath-tools, but old versions are still shipped by the various
server distros.

http://thread.gmane.org/gmane.linux.kernel.device-mapper.devel/4005

2) Multipathd will often keep open the device as it's being removed,
resulting in an error message when attempting to re-register the
recently revived rport:

«object_add failed for H:B:T:L with -EEXIST, don't try to register
things with the same name in the same directory»

The newly added path will therefore not make it back into the
dm-multipath map (and won't be available as a block device either).

http://thread.gmane.org/gmane.linux.kernel.device-mapper.devel/4240/focus=4255

3) Even when the -EEXIST error doesn't show up, udev/multipath/something
seems to get it wrong sometimes. Either the revived path is added to the
wrong (a new) priority group, or it's not added at all.  Most of the
time it works fine, but it can't be relied upon in my experience.
Haven't been able to track this one down, unfortunately.
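
The -EEXIST failure in 2) can be pictured with a toy model: a name-keyed
registry in which a removed entry lingers while something still holds it
open, so re-adding the same H:B:T:L name collides.  This is only a sketch
under that assumption, written in Python for brevity; `ToyRegistry` and its
methods are hypothetical names and are not the kernel's kobject or sysfs code.

```python
import errno


class ToyRegistry:
    """Toy stand-in for a name-keyed object directory (not kernel code)."""

    def __init__(self):
        self._refs = {}      # name -> number of open references still held
        self._dying = set()  # names whose removal has been requested

    def add(self, name):
        # A name that is still present, even if dying, cannot be registered.
        if name in self._refs:
            raise FileExistsError(errno.EEXIST, "name already registered", name)
        self._refs[name] = 0

    def open(self, name):
        self._refs[name] += 1

    def close(self, name):
        self._refs[name] -= 1
        self._reap(name)

    def remove(self, name):
        # Removal is only requested here; it completes once nobody holds the entry.
        self._dying.add(name)
        self._reap(name)

    def _reap(self, name):
        if name in self._dying and self._refs[name] == 0:
            del self._refs[name]
            self._dying.discard(name)
```

In this model, holding `1:0:0:0` open across `remove()` makes the next
`add("1:0:0:0")` raise `FileExistsError`; once the holder closes it, the
same add succeeds.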

Anyway.  I believe all of these problems would be possible to avoid if I
could simply make it so that block devices would never be removed due to
rports becoming unavailable.  dm-multipath would fail the path anyway,
and multipathd would just keep on testing its availability and would
re-instate when/if it came back online.  If it didn't, it would of
course hang around as harmless junk - but fibre channel SANs are usually
quite stable anyway, and the admin will always have the possibility of
removing the block device manually if it bugs him.  In any case it would
be better than the loss of reliability I experience now.

So what I suggest is a way of disabling dev_loss_tmo (or setting it to
unlimited).  Think that's doable for a kernel newbie like me, or are
there any takers?
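
For reference, a minimal sketch of inspecting and raising the timeout from
user space.  It assumes the FC transport class exposes a per-rport
`dev_loss_tmo` attribute under `/sys/class/fc_remote_ports/rport-*/`, a path
this message does not itself mention; the helper names are hypothetical.
Note that a larger value only postpones the removal, it does not disable it,
and the kernel may reject values above its allowed maximum.

```python
from pathlib import Path

# Assumed sysfs location of the FC remote port class; adjust if it differs.
DEFAULT_ROOT = "/sys/class/fc_remote_ports"


def read_dev_loss_tmo(sysfs_root=DEFAULT_ROOT):
    """Return {rport name: dev_loss_tmo in seconds} for every rport found."""
    timeouts = {}
    for attr in sorted(Path(sysfs_root).glob("rport-*/dev_loss_tmo")):
        timeouts[attr.parent.name] = int(attr.read_text().strip())
    return timeouts


def set_dev_loss_tmo(seconds, sysfs_root=DEFAULT_ROOT):
    """Write the same timeout to every rport (needs root on a real system)."""
    for attr in sorted(Path(sysfs_root).glob("rport-*/dev_loss_tmo")):
        attr.write_text(f"{seconds}\n")
```

Taking the sysfs root as a parameter keeps the helpers usable against a
scratch directory, so they can be tried without touching a live system.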

Regards
-- 
Tore Anderson
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Disabling dev_loss_tmo?
  2007-11-13  9:11 Disabling dev_loss_tmo? Tore Anderson
@ 2007-11-13 10:24 ` Michael Loehr
  2007-11-13 11:07   ` Tore Anderson
  2007-11-13 16:18 ` James Smart
  1 sibling, 1 reply; 9+ messages in thread
From: Michael Loehr @ 2007-11-13 10:24 UTC
  To: Tore Anderson; +Cc: Linux SCSI Mailing List, linux-scsi-owner

Hello Tore,

in recent SLES10 kernels the additional setting "remove_on_dev_loss" lets
you control whether the devices are removed or not. Unfortunately this
change has not found its way upstream yet ...

best regards

  Michael

---
Michael Loehr
Linux for zSeries Development
IBM, Mainz, Germany



* Re: Disabling dev_loss_tmo?
  2007-11-13 10:24 ` Michael Loehr
@ 2007-11-13 11:07   ` Tore Anderson
  0 siblings, 0 replies; 9+ messages in thread
From: Tore Anderson @ 2007-11-13 11:07 UTC
  To: Michael Loehr; +Cc: Linux SCSI Mailing List, mdr, hch, James.Smart

* Michael Loehr

> in recent SLES10 kernels the additional setting "remove_on_dev_loss" lets
> you control whether the devices are removed or not. Unfortunately this
> change has not found its way upstream yet ...

Thanks for pointing this out.  I found that this patch was submitted
here over a year ago, but unfortunately was rejected:

http://www.spinics.net/lists/linux-scsi/msg10289.html (I'm adding Cc's
to the folks that posted to that thread)

Christoph, would you be inclined to reconsider your position on this
patch if it was changed so that the default behaviour would still be to
remove devices after dev_loss_tmo expired?

The current behaviour causes trouble: intermittent rport unavailability
and the subsequent remove/re-add actions happen far more often than I
permanently remove targets from my SAN, so being able to disable the
automatic device removal would be a huge improvement for me (and I
doubt I'm the only one who feels this way).

Regards
-- 
Tore Anderson


* Re: Disabling dev_loss_tmo?
  2007-11-13  9:11 Disabling dev_loss_tmo? Tore Anderson
  2007-11-13 10:24 ` Michael Loehr
@ 2007-11-13 16:18 ` James Smart
  2007-11-14  8:10   ` Tore Anderson
  1 sibling, 1 reply; 9+ messages in thread
From: James Smart @ 2007-11-13 16:18 UTC
  To: Tore Anderson
  Cc: Linux SCSI Mailing List, Michael Reed, Christoph Hellwig, MLOEHR

We had a lot of conversations on what to have the transport do after
connectivity was lost to a device. Suffice to say - the answer was to
remove the device.  The dev_loss_tmo value was the compromise between
the kernel architectural position, and what FC drivers had always
managed and hidden from the kernel in the past.
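
As a rough illustration of that compromise, the sketch below models an rport
outage as simplified timer logic: if the port returns before dev_loss_tmo
seconds have passed, the devices are kept; otherwise they are removed.  The
function name, the return strings, and the treatment of the exact boundary
are assumptions made for illustration, not the transport's actual code.

```python
def classify_outages(outages, dev_loss_tmo):
    """Classify each (down_at, up_at) outage, with times in seconds.

    up_at is None when the port never comes back.
    """
    results = []
    for down_at, up_at in outages:
        if up_at is not None and up_at - down_at < dev_loss_tmo:
            # Port returned while the timer was still running.
            results.append("unblocked, devices kept")
        else:
            # Timer ran out first (or the port never returned).
            results.append("timer expired, devices removed")
    return results
```

With a 30 second timeout, a 5 second blip keeps its devices, while a 60
second outage and a permanent loss both end in removal.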

Unfortunately, even though DM has known of this behavior for a long time
(it's existed since 2.6.<early teens>), no-one has bothered to update DM
to support it. One train of thought is : fixing it for FC doesn't
address the issue, as other transports may still encounter it. It's a DM
thing, and should stay this way to ensure that DM fixes it.

You noted that the FC transport, in SLES10 and RHEL5, added a patch
that allowed for the scsi targets not to be torn down when dev_loss_tmo
timed out. This had little to do with DM, and everything to do with
reuse-after-free issues on mid-layer data structures that were released
as part of the teardown, as well as the timing of the upstream reuse
patches vs what the distro kernels could accept. But DM certainly
benefited from its behavior.

I'd rather that DM got fixed so that it supports the necessary
architectural behavior. But, we've lived with the distro-specific
behavior as well, so it's not a strong sentiment.

-- james s



* Re: Disabling dev_loss_tmo?
  2007-11-13 16:18 ` James Smart
@ 2007-11-14  8:10   ` Tore Anderson
  2007-11-14 14:29     ` James Smart
  0 siblings, 1 reply; 9+ messages in thread
From: Tore Anderson @ 2007-11-14  8:10 UTC
  To: James.Smart
  Cc: Linux SCSI Mailing List, Michael Reed, Christoph Hellwig, MLOEHR

* James Smart

> We had a lot of conversations on what to have the transport do after 
> connectivity was lost to a device. Suffice to say - the answer was to
> remove the device.  The dev_loss_tmo value was the compromise
> between the kernel architectural position, and what FC drivers had
> always managed and hidden from the kernel in the past.
> 
> Unfortunately, even though DM has known of this behavior for a long
> time (it's existed since 2.6.<early teens>), no-one has bothered to
> update DM to support it. One train of thought is : fixing it for FC
> doesn't address the issue, as other transports may still encounter
> it. It's a DM thing, and should stay this way to ensure that DM fixes
> it.

So basically you're forcing breakage on users to coerce the DM folks to
fix their end?  Maybe it's time to re-think that strategy, seeing that
it hasn't been fixed in such a long time it seems unlikely that they
would start bothering now.  SuSE and RH work around it anyway, so
everyone who's obedient enough to run the enterprise distros their
storage vendor tells them to won't have any problems.  Many of the DM
folks are employed by SuSE, RH, and various storage vendors - go figure.

> You noted that the FC transport, in SLES10 and RHEL5, added a patch 
> that allowed for the scsi targets not to be torn down when
> dev_loss_tmo timed out. This had little to do with DM, and everything
> to do with reuse-after-free issues on mid-layer data structures that
> were released as part of the teardown, as well as the timing of the
> upstream reuse patches vs what the distro kernels could accept. But
> DM certainly benefited from its behavior.
> 
> I'd rather that DM got fixed so that it supports the necessary 
> architectural behavior. But, we've lived with the distro-specific
> behavior as well, so it's not a strong sentiment.

I'm just a simple user.  To me the most important thing is that it - and
by «it» I'm referring to the whole bundle of DM, SCSI, HBA driver, and
the rest of the system - actually works.

With no way of disabling dev_loss_tmo, it doesn't.  It will break after
intermittent failures, exactly the time where you need it to work the
most.  Knowing that the SCSI FC transport does the Right Thing isn't
really any consolation.

The patch seems like a rather simple fix.  Quick and dirty, sure, but it
would actually help out the likes of me who are putting this stuff in
production.  And if it defaulted to remove_on_dev_loss=0, it wouldn't
really be intrusive either.  It seems that it will take a while to get
this properly fixed (both in DM and the -EEXIST issue), so what I'm
asking is just a way to make it work in the interim.

Regards
-- 
Tore Anderson

* Re: Disabling dev_loss_tmo?
  2007-11-14  8:10   ` Tore Anderson
@ 2007-11-14 14:29     ` James Smart
  2007-11-14 15:38       ` Tore Anderson
  0 siblings, 1 reply; 9+ messages in thread
From: James Smart @ 2007-11-14 14:29 UTC
  To: Tore Anderson
  Cc: Linux SCSI Mailing List, Michael Reed, Christoph Hellwig, MLOEHR

Tore Anderson wrote:
> So basically you're forcing breakage on users to coerce the DM folks to
> fix their end?  Maybe it's time to re-think that strategy, seeing that
> it hasn't been fixed in such a long time it seems unlikely that they
> would start bothering now.  SuSE and RH work around it anyway, so
> everyone who's obedient enough to run the enterprise distros their
> storage vendor tells them to won't have any problems.  Many of the DM
> folks are employed by SuSE, RH, and various storage vendors - go figure.

Please don't shoot the messenger. I was trying to summarize the evolution
of FC into the kernel, highlight that the DM team knew of the issue,
had higher-priority issues, and that it applies to more than FC.  Having
experienced first-hand this method of getting work done, I certainly
don't promote it as a productive way of doing things.

> I'm just a simple user.  To me the most important thing is that it - and
> by «it» I'm referring to the whole bundle of DM, SCSI, HBA driver, and
> the rest of the system - actually works.
> 
> With no way of disabling dev_loss_tmo, it doesn't.  It will break after
> intermittent failures, exactly the time where you need it to work the
> most.  Knowing that the SCSI FC transport does the Right Thing isn't
> really any consolation.

I'm highlighting that - ok today, we get FC working, but then the user
puts DM on something else, like iSCSI or SAS/whatever, then it too
breaks - and that's ok ?

> The patch seems like a rather simple fix.  Quick and dirty, sure, but it
> would actually help out the likes of me who are putting this stuff in
> production.  And if it defaulted to remove_on_dev_loss=0, it wouldn't
> really be intrusive either.  It seems that it will take a while to get
> this properly fixed (both in DM and the -EEXIST issue), so what I'm
> asking is just a way to make it work in the interim.

Using this analogy, to resolve the reuse-after-free issues, we simply
would have used the dont-tear-down patches to avoid teardown bugs,
like what the distros did.  This too is a bad approach. Things need to
get fixed where things need to get fixed.  We can add the FC patch, but
DM still needs to get fixed.

I'll defer to James on what he'd like to see happen in his subsystem,
as this does set a precedent.

-- james s


* Re: Disabling dev_loss_tmo?
  2007-11-14 14:29     ` James Smart
@ 2007-11-14 15:38       ` Tore Anderson
  2007-11-14 19:00         ` James Smart
  0 siblings, 1 reply; 9+ messages in thread
From: Tore Anderson @ 2007-11-14 15:38 UTC
  To: James.Smart
  Cc: Linux SCSI Mailing List, Michael Reed, Christoph Hellwig, MLOEHR

* James Smart

> Please don't shoot the messenger. I was trying to summarize the
> evolution of FC into the kernel, highlight that the DM team knew of
> the issue, had higher-priority issues, and that it applies to more
> than FC.  Having experienced first-hand this method of getting work
> done, I certainly don't promote it as a productive way of doing
> things.

Sorry, didn't mean to shoot you!  ;-)  I mean "you" in the plural sense;
«you SCSI people» - got the impression there is some kind of disdain
towards the DM team (which would indeed be a counter-productive way of
getting things done).  Not my intention to offend anyone, and if I did
anyway I apologise.

It's just frustrating when things don't work...

> I'm highlighting that - ok today, we get FC working, but then the
> user puts DM on something else, like iSCSI or SAS/whatever, then it
> too breaks - and that's ok ?

Of course the best would be if it worked perfectly with all transports,
and leaving things broken is of course not OK.  In my strictly pragmatic
opinion, though, having a functional DM+FC combo and half-functional
DM+<anything else> combos is better than having _only_ half-functional
combos.

> Using this analogy, to resolve the reuse-after-free issues, we simply
> would have used the dont-tear-down patches to avoid teardown bugs, 
> like what the distros did.  This too is a bad approach. Things need
> to get fixed where things need to get fixed.  We can add the FC
> patch, but DM still needs to get fixed.

Best would be to have DM-multipath handle the disconnects gracefully, of
course.  But since it doesn't appear to be happening anytime soon:  A
workaround provided by the transport layer would be very welcome!  I
don't use iSCSI or SAS with DM so I don't know if such a workaround is
wanted there too, but with FC it is necessary.  Even if it is a bad
approach it is much better than nothing, the way I see it.

Regards
-- 
Tore Anderson

* Re: Disabling dev_loss_tmo?
  2007-11-14 15:38       ` Tore Anderson
@ 2007-11-14 19:00         ` James Smart
  2007-11-15  8:02           ` [dm-devel] " Mike Anderson
  0 siblings, 1 reply; 9+ messages in thread
From: James Smart @ 2007-11-14 19:00 UTC
  Cc: Linux SCSI Mailing List, Christoph Hellwig,
	device-mapper development, MLOEHR, Michael Reed

I've added the dm reflector to this email...

> Best would be to have DM-multipath handle the disconnects gracefully, of
> course.  But since it doesn't appear to be happening anytime soon:  A
> workaround provided by the transport layer would be very welcome!  I
> don't use iSCSI or SAS with DM so I don't know if such a workaround is
> wanted there too, but with FC it is necessary.  Even if it is a bad
> approach it is much better than nothing, the way I see it.

Background of where this thread started is:
http://marc.info/?l=linux-scsi&m=119494675103771&w=2

Can someone from the DM community comment on where things are, or are
going, for handling disconnects w/ device teardown ??

-- james s


* Re: [dm-devel] Re: Disabling dev_loss_tmo?
  2007-11-14 19:00         ` James Smart
@ 2007-11-15  8:02           ` Mike Anderson
  0 siblings, 0 replies; 9+ messages in thread
From: Mike Anderson @ 2007-11-15  8:02 UTC
  To: James.Smart, device-mapper development
  Cc: Linux SCSI Mailing List, Christoph Hellwig, MLOEHR, Michael Reed,
	Christophe Varoqui, Tore Anderson

James Smart <James.Smart@Emulex.Com> wrote:
> I've added the dm reflector to this email...
> 
> >Best would be to have DM-multipath handle the disconnects gracefully, of
> >course.  But since it doesn't appear to be happening anytime soon:  A
> >workaround provided by the transport layer would be very welcome!  I
> >don't use iSCSI or SAS with DM so I don't know if such a workaround is
> >wanted there too, but with FC it is necessary.  Even if it is a bad
> >approach it is much better than nothing, the way I see it.
> 
> Background of where this thread started is:
> http://marc.info/?l=linux-scsi&m=119494675103771&w=2
> 
> Can someone from the DM community comment on where things are, or are
> going, for handling disconnects w/ device teardown ??
> 

Since the intent is for multipathd to handle these events, it would seem
good to try to fix the issues in mainline code instead of adding
workarounds.

I was not able to replicate all the errors previously described at the
URL referenced above, but maybe some were on different revs of
multipath-tools than the ones I used.

On queue_if_no_path with the all-paths-down case I assume we would need
multipath to allow a table with 0 priority groups (or some other method of
holding the dm in place), but someone from the list most likely has a
better answer.
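
To make the all-paths-down case concrete, here is a toy model of the
behaviour described at the start of this thread: failing a path leaves it in
the table so queue_if_no_path can hold I/O, whereas removing the last path
reloads the table and the held I/O errors out.  It is a sketch only;
`ToyMultipathMap` and its methods are hypothetical names and do not mirror
the dm-multipath implementation.

```python
class ToyMultipathMap:
    """Toy model of one multipath map (not dm-multipath code)."""

    def __init__(self, paths, queue_if_no_path=False):
        self.paths = {path: "active" for path in paths}
        self.queue_if_no_path = queue_if_no_path
        self.queued = []  # I/O held while no path is usable
        self.errors = []  # I/O that was failed upwards

    def _has_active_path(self):
        return any(state == "active" for state in self.paths.values())

    def submit(self, io):
        if self._has_active_path():
            return "dispatched"
        if self.queue_if_no_path:
            self.queued.append(io)
            return "queued"
        self.errors.append(io)
        return "error"

    def fail_path(self, path):
        # The path stays in the table, so the map itself is untouched.
        self.paths[path] = "failed"

    def reinstate_path(self, path):
        # Queued I/O can be dispatched again; return it to the caller.
        self.paths[path] = "active"
        released, self.queued = self.queued, []
        return released

    def remove_path(self, path):
        # Modelled as a table reload that flushes held I/O; with no usable
        # path left, that I/O fails despite queue_if_no_path.
        del self.paths[path]
        if not self._has_active_path():
            self.errors.extend(self.queued)
            self.queued = []
```

In this model a table that keeps failed paths (or, as suggested above, one
that is allowed to stay in place with nothing usable in it) is what lets the
queue survive until a path comes back.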

Note that in Ex 2, when I used the multipath-tools from the git tree, I
did not receive events until, in Ex 3, I added a udev rule to get
multipathd to receive events. The multipath-tools change to use an
abstract namespace socket for communication with udevd will be used unless
the socket operation call fails, in which case it falls back to the direct
kobject uevent netlink socket. Christophe V can add better context here.
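
The try-then-fall-back behaviour described above can be sketched generically
as follows.  This is not the multipath-tools code (which is written in C);
`open_event_source` and the opener callables are hypothetical, and the
sketch only shows the ordering: use the first event source that opens, and
move on to the next when the open call fails.

```python
def open_event_source(openers):
    """Try each (name, opener) pair in order; return the first that works.

    Each opener is a zero-argument callable that returns a handle or raises
    OSError when that event source is unavailable.
    """
    failures = []
    for name, opener in openers:
        try:
            return name, opener()
        except OSError as exc:
            failures.append(f"{name}: {exc}")
    raise OSError("no event source available: " + "; ".join(failures))
```

A caller would list the preferred source first and the fallback second,
which makes it easy to log which source was actually chosen.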

Some results from the experiments I performed:
Ex 1.
	- Using linux-2.6 git head and device-mapper-multipath-0.4.7-12.el5
	on a RHEL5.1 distro.
	- On the FC switch I port disabled the port going to the target.
	- multipathd received event and loaded new table minus the removed
	  devices.
	- On re-enabling the port the devices were added back into the
	  table.
	- On disabling both target ports multipathd received both events
	  and removed both multipath devices.
	- On re-enabling both ports multipathd only added one path.
Ex 2.
	- Using linux-2.6 git head and multipath-tools git head on a
	  RHEL5.1 distro (default distro udev rules).
	- On the FC switch I port disabled the port going to the target.
	- multipathd did not receive the remove event.
	- udevmonitor showed the remove events "rport-2:0-2: blocked FC
	  remote port time out: removing target and saving binding"
	- On re-enabling the port a number of warnings were generated
	  due to the previous sysfs names still existing which resulted in
	  new devices not having everything setup correctly.
Ex 3.
	- Same setup as Ex 1, but I added to the multipath udev rule:
	  RUN+="socket:/org/kernel/dm/multipath_event"
	- multipathd received event and loaded new table minus the removed
	  devices.
	- On re-enabling the port the devices were added back into the
	  table.
	- On disabling both target ports multipathd received both events
	  and removed both multipath devices. In progress dd received
	  errors (expected).

Ex 4.
	- Same setup as Ex 3, but I added queue_if_no_path
	- On disabling both target ports multipathd received both events
	  and removed both multipath devices. In progress dd received
	  errors (expected).


I provided config info at the bottom of this email for reference.

-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com

Test config info

# uname -a
Linux elm3b87 2.6.24-rc2am1 #1 SMP Wed Nov 14 10:08:48 PST 2007 x86_64
x86_64 x86_64 GNU/Linux

# dmidecode |grep "Product Name"
        Product Name: BladeCenter LS21 -[7971AC1]-
        Product Name: Server Blade
# lspci
...
03:05.0 Fibre Channel: Emulex Corporation Helios-X LightPulse Fibre Channel Host Adapter (rev 01)
03:05.1 Fibre Channel: Emulex Corporation Helios-X LightPulse Fibre Channel Host Adapter (rev 01)

# ./lsscsi
[0:0:0:0]    disk    IBM-ESXS ST936701SS       B51D  /dev/sda
[1:0:0:0]    disk    IBM      1815      FAStT  0914  /dev/sdb
[1:0:0:1]    disk    IBM      1815      FAStT  0914  /dev/sdc
[2:0:0:0]    disk    IBM      1815      FAStT  0914  /dev/sdd
[2:0:0:1]    disk    IBM      1815      FAStT  0914  /dev/sde

# ./lsscsi --host
[0]    mptsas        
[1]    lpfc          
[2]    lpfc 

# multipath-tools version
commit fa75d374cad8fa966dcf17dc18eee4ef5e70ff33

# multipath -l
mpath29 (3600a0b800011a1ee00001e5a46eab101) dm-2 IBM,1815      FAStT
[size=512M][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:0:0 sdb 8:16  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 2:0:0:0 sdd 8:48  [active][undef]
mpath5 (3600a0b800011a1ee00001e5c46eab185) dm-4 IBM,1815      FAStT
[size=512M][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:0:1 sdc 8:32  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 2:0:0:1 sde 8:64  [active][undef]


