From: James Smart <James.Smart@Emulex.Com>
To: Tore Anderson <tore@linpro.no>
Cc: Linux SCSI Mailing List <linux-scsi@vger.kernel.org>,
Michael Reed <mdr@sgi.com>, Christoph Hellwig <hch@infradead.org>,
MLOEHR@de.ibm.com
Subject: Re: Disabling dev_loss_tmo?
Date: Tue, 13 Nov 2007 11:18:49 -0500
Message-ID: <4739CE69.3040602@emulex.com>
In-Reply-To: <47396A5B.5040001@linpro.no>
We had a lot of conversations on what to have the transport do after
connectivity was lost to a device. Suffice to say - the answer was to
remove the device. The dev_loss_tmo value was the compromise between
the kernel architectural position, and what FC drivers had always
managed and hidden from the kernel in the past.
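
For anyone following along who wants to see what their rports are
currently set to: the FC transport exposes the timeout per remote port
as a sysfs attribute, dev_loss_tmo, in seconds, under
/sys/class/fc_remote_ports/rport-*/. What follows is only a rough Python
sketch for inspecting or changing it, not something taken from this
thread. The function names and the `root` parameter (there so it can be
pointed at a scratch directory) are made up for illustration, and the
range of values accepted is up to the kernel and driver in use.

```python
from pathlib import Path

# Location of the FC transport's remote-port attributes in sysfs.
FC_RPORT_ROOT = Path("/sys/class/fc_remote_ports")


def read_dev_loss_tmo(root=FC_RPORT_ROOT):
    """Return {rport name: dev_loss_tmo in seconds} for each rport under root."""
    values = {}
    if not root.is_dir():
        return values
    for rport in sorted(root.glob("rport-*")):
        try:
            values[rport.name] = int((rport / "dev_loss_tmo").read_text().strip())
        except (OSError, ValueError):
            # Attribute missing or unreadable: skip this rport.
            continue
    return values


def set_dev_loss_tmo(seconds, root=FC_RPORT_ROOT):
    """Write the same timeout to every rport; needs root on a real system."""
    updated = []
    if not root.is_dir():
        return updated
    for rport in sorted(root.glob("rport-*")):
        (rport / "dev_loss_tmo").write_text(f"{seconds}\n")
        updated.append(rport.name)
    return updated


if __name__ == "__main__":
    for name, tmo in read_dev_loss_tmo().items():
        print(f"{name}: {tmo}s")
```

Raising the value only postpones the device removal described above; it
does not change what happens once the timer expires.
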
Unfortunately, even though DM has known of this behavior for a long time
(it's existed since 2.6.<early teens>), no one has bothered to update DM
to support it. One train of thought is: fixing it for FC doesn't
address the issue, as other transports may still encounter it. It's a DM
thing, and should stay this way to ensure that DM fixes it.
You noted that the FC transport, in SLES10 and RHEL5, added a patch
that allowed for the scsi targets not to be torn down when dev_loss_tmo
timed out. This had little to do with DM, and everything to do with
reuse-after-free issues on mid-layer data structures that were released
as part of the teardown, as well as the timing of the upstream reuse
patches vs what the distro kernels could accept. But DM certainly
benefited from its behavior.
I'd rather that DM got fixed so that it supports the necessary
architectural behavior. But we've lived with the distro-specific
behavior as well, so it's not a strong sentiment.
-- james s
Tore Anderson wrote:
> Hi. Recent kernels will remove the block devices if a FC rport is lost,
> which causes a number of problems when dm-multipath is used:
>
> 1) Multipathd will receive an event notifying it of the removed rport,
> and will respond by removing the path. This causes a suspend which
> flushes outstanding I/O, and in an all-paths-down scenario this will
> cause I/O errors to propagate up to the file system layer - even if
> queue_if_no_path is in use. This is fixed in newer versions of
> multipath-tools, but old versions are still shipped by the various
> server distros.
>
> http://thread.gmane.org/gmane.linux.kernel.device-mapper.devel/4005
>
> 2) Multipathd will often keep open the device as it's being removed,
> resulting in an error message when attempting to re-register the
> recently revived rport:
>
> «object_add failed for H:B:T:L with -EEXIST, don't try to register
> things with the same name in the same directory»
>
> The newly added path will therefore not make it back into the
> dm-multipath map (and won't be available as a block device either).
>
> http://thread.gmane.org/gmane.linux.kernel.device-mapper.devel/4240/focus=4255
>
> 3) Even when the -EEXIST error doesn't show up, udev/multipath/something
> seems to get it wrong sometimes. Either the revived path is added to the
> wrong (a new) priority group, or it's not added at all. Most of the
> time it works fine, but it can't be relied upon in my experience.
> Haven't been able to track this one down, unfortunately.
>
> Anyway. I believe all of these problems would be possible to avoid if I
> could simply make it so that block devices would never be removed due to
> rports becoming unavailable. dm-multipath would fail the path anyway,
> and multipathd would just keep on testing its availability and would
> re-instate when/if it came back online. If it didn't, it would of
> course hang around as harmless junk - but fibre channel SANs are usually
> quite stable anyway, and the admin will always have the possibility of
> removing the block device manually if it bugs him. In any case it would
> be better than the loss of reliability I experience now.
>
> So what I suggest is a way of disabling dev_loss_tmo (or setting it to
> unlimited). Think that's doable for a kernel newbie like me, or are
> there any takers?
>
> Regards