All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hannes Reinecke <hare@suse.de>
To: Mike Snitzer <snitzer@redhat.com>,
	"Stewart, Sean" <Sean.Stewart@netapp.com>
Cc: device-mapper development <dm-devel@redhat.com>,
	Bryn Reeves <breeves@redhat.com>,
	Alasdair Kergon <agk@redhat.com>
Subject: Re: dm-multipath: Accept failed paths for multipath maps
Date: Fri, 18 Jul 2014 08:12:16 +0200	[thread overview]
Message-ID: <53C8BAC0.2030106@suse.de> (raw)
In-Reply-To: <20140718000411.GB337@redhat.com>

On 07/18/2014 02:04 AM, Mike Snitzer wrote:
> On Wed, Dec 18 2013 at 10:28am -0500,
> Stewart, Sean <Sean.Stewart@netapp.com> wrote:
>
>> On Wed, 2013-12-18 at 15:25 +0100, Hannes Reinecke wrote:
>>> On 12/18/2013 03:08 PM, Mike Snitzer wrote:
>>>> On Wed, Dec 18 2013 at  2:52am -0500,
>>>> Hannes Reinecke <hare@suse.de> wrote:
>>>>
>>>>> The multipath kernel module is rejecting any map with an invalid
>>>>> device. However, as the multipathd is processing the events serially
>>>>> it will try to push a map with invalid devices if more than one
>>>>> device failed at the same time.
>>>>> So we can as well accept those maps and make sure to mark the
>>>>> paths as down.
>>>>
>>>> Why is it so desirable to do this?  Reduced latency to restore at least
>>>> one valid path when a bunch of paths go down?
>>>>
>>> Without this patch multipathd cannot update the map as long is
>>> hasn't catched up with udev.
>>> During that time any scheduling decisions by the kernel part are
>>> necessarily wrong, as it has to rely on the old map.
>>>
>>>> Why can't we just rely on userspace eventually figuring out which paths
>>>> are failed and pushing a valid map down?
>>>>
>>> Oh, you can. This is what we're doing now :-)
>>>
>>> But it will lead to spurious error during failover when multipathd
>>> is trying to push down maps with invalid devices.
>>>
>>> You are also running into a race window between checking the path in
>>> multipathd and pushing down the map; if the device disappears during
>>> that time you won't be able to push down the map.
>>> If that happens during boot multipathd won't be able to create the
>>> map at all, so you might not be able to boot here.
>>> With that patch you at least have the device-mapper device, allowing
>>> booting to continue.
>>>
>>>> Are there favorable reports that this new behavior actually helps?
>>>> Please quantify how.
>>>>
>>> NetApp will have; they've been pushing me to forward this patch.
>>> Sean?
>>>
>> Agree.  Internally, we have run into numerous cases with Red Hat where
>> the "failed in domap" error will occur, due to user space being behind,
>> or device detaching taking too long.  The most severe case is with
>> InfiniBand, where the LLD may place a device offline, then every single
>> reload that is trying to add a good path in will fail.  I will qualify
>> this by saying that I realize it is a problem that the device gets
>> placed offline in the first place, but this patch would allow it a
>> chance to continue on. The user still has to take manual steps to fix
>> the problem in this case, but it seems less disruptive to applications.
>>
>> The device detaching case could be kind of disruptive to a user in the
>> scenario they are upgrading the firmware on a NetApp E-Series box, and
>> with this patch, at least a good path is able to be added in ASAP.
>>
>>> BTW, SUSE / SLES is running happily with this patch for years now.
>>> So it can't be at all bad ...
>>>
>>> Cheers,
>>>
>>> Hannes
>>
>> Also agreed.  We have seen this functionality in SLES for years, and
>> have not run into a problem with it.
>
> Revisiting this can of worms...
>
> As part of full due-diligence on the approach that SUSE and NetApp have
> seemingly enjoyed "for years" I reviewed Hannes' v3 patch, fixed one
> issue and did some cleanup.  I then converted over to using a slightly
> different approach where-in the DM core becomes a more willing
> co-conspirator in this hack by introducing the ability to have
> place-holder devices (dm_dev without an opened bdev) referenced in a DM
> table.  The work is here:
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=throwaway-dm-mpath-placeholder-devs
>
> Here is the diffstat of all 3 patches rolled up:
>
>   git diff d4bdac727f1e09412c762f177790a96432738264^..7681ae5ddb5d567800023477be7ddc68f9812a95 | diffstat
>   dm-mpath.c |   51 +++++++++++++++++++++++++++++++++++----------------
>   dm-table.c |   53 ++++++++++++++++++++++++++++++++++++++++-------------
>   dm.c       |    5 ++---
>   dm.h       |   12 ++++++++++++
>   4 files changed, 89 insertions(+), 32 deletions(-)
>
> But it was only compile tested, because doing more validation of this
> work would mean it has a snowballs chance in hell of seeing the light of
> upstream.  Sadly it doesn't have a good chance; it would require some
> compelling proof:
> 1) that mpath is bullet-proof no matter how crazy a user got with fake
>     place-holder devices in their DM tables (coupled with reinstate_path
>     messages, etc)
At worst the user ends up with tables packed with non-existing 
devices. Which would be equivalent to a table with all paths failed.
So from that I don't see a problem.
We only might run into issues where userspace fails to do proper 
garbage collection on those tables. But then again, this can happen 
nowadays, too, when all paths from a large configuration drop 
suddenly and multipathd crashes.

> 2) that the storage configs that experienced problems with the current
>     DM mpath dm_get_device() failures weren't broken to start with (for
>     instance ib srp is apparently fixed now.. but those fixes are still
>     working their way into RHEL) -- or put differently: I need _details_
>     on the NetApp or other legit storage configs that are still
>     experiencing problems without a solution to this problem.
>
This has nothing to do with 'legit' storage configs, this is a 
direct result of the udev event handling and the internal multipathd 
path checker logic.

udev event are issue sequentially, so multipathd has to process them 
one at a time (it even has a udev receiving thread which basically 
ensures sequential processing).

multipathd then has to process the event, and update the device 
table _based on that event_. At that time it simply _cannot_ know if 
there are other events relating to the same table queued.
If there are, multipathd has no other choice than to push an invalid 
table down into the kernel.

> ... and even with that proof I'm pretty sure Alasdair will hate this
> place-holder approach and will push for some other solution.
>
> I'm going away on paternity leave until Sept 8... my _hope_ is that
> someone fixes multipath-tools to suck less or that a more clever
> solution to this problem is developed locally in DM mpath.
>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

  parent reply	other threads:[~2014-07-18  6:12 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-18  7:52 [PATCHv2] dm-multipath: Accept failed paths for multipath maps Hannes Reinecke
2013-12-18 14:08 ` Mike Snitzer
2013-12-18 14:25   ` Hannes Reinecke
2013-12-18 15:28     ` Stewart, Sean
2013-12-19  8:21       ` Bart Van Assche
2014-02-19  1:14         ` Mike Snitzer
2014-02-19  8:11           ` Bart Van Assche
2014-07-18  0:04       ` Mike Snitzer
2014-07-18  0:23         ` Mike Snitzer
2014-07-18  6:00           ` Hannes Reinecke
2014-07-18 16:04             ` Mike Snitzer
2014-07-18 16:15               ` Mike Snitzer
2014-07-21  6:05                 ` Hannes Reinecke
2014-07-18  0:29         ` John Utz
2014-07-18  2:18           ` Mike Snitzer
2014-07-18  3:31             ` John Utz
2014-07-18  5:57               ` Hannes Reinecke
2014-07-18 14:38                 ` Mike Snitzer
2014-07-18 17:04                   ` John Utz
2014-07-21 14:23                     ` ZAC target (Was: Re: dm-multipath: Accept failed paths for multipath maps) Hannes Reinecke
2014-07-21 19:28                       ` Kent Overstreet
2014-07-22  5:46                         ` Hannes Reinecke
2014-07-22  8:08                           ` Matias Bjorling
2014-07-18 16:51                 ` dm-multipath: Accept failed paths for multipath maps John Utz
2014-07-18  6:12         ` Hannes Reinecke [this message]
2013-12-18 15:28     ` Merla, ShivaKrishna

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53C8BAC0.2030106@suse.de \
    --to=hare@suse.de \
    --cc=Sean.Stewart@netapp.com \
    --cc=agk@redhat.com \
    --cc=breeves@redhat.com \
    --cc=dm-devel@redhat.com \
    --cc=snitzer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.