From: Hannes Reinecke <hare@suse.de>
To: dm-devel@redhat.com
Subject: Re: IO Ordering on Path Failover
Date: Wed, 05 Mar 2014 08:54:45 +0100
Message-ID: <5316D845.7030402@suse.de>
In-Reply-To: <CAPB7CupKZqW_tLZHcYsMS2eyqkFnmb0gDE2JrDoTqRpn43fsgg@mail.gmail.com>
On 02/28/2014 11:42 PM, Bob Bawn wrote:
> I am trying to understand how IO ordering safety is enforced on
> path failover. This is new territory for me so forgive me if this is
> obvious. Consider the sequence:
>
> 1. client writes(lba=0,val=x) on path A.
> 2. multipath declares path A dead and retries write on path B
> 3. retried write on path B completes successfully and client gets ack'd
> 4. client writes(lba=0,val=y) on path B. It also completes
> successfully and is ack'd to client
> 5. write from (1) completes and corrupts data
>
> It seems like multipath needs a guarantee at step 2 that the original
> write won't complete after path A has been declared down. I thought it
> would issue something like a LUN RESET on path B and that the response
> to that reset would indicate that it is safe to proceed. This page
> sort of supports that speculation:
> http://scst.sourceforge.net/mc_s.html
>
No, that assumption is wrong.
Strict ordering is only guaranteed for commands as the HBA submits
them to the wire. Once a command is in flight there are _no_ guarantees
about ordering. E.g. in an FC fabric there might be several paths to the
same target, each of which might have a different latency.
So I/O on one path might actually complete faster than on the other one.
And with CNAs it's virtually impossible to guarantee any I/O
ordering, because several hardware queues are involved.
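To illustrate the latency point, here is a toy model (not anything taken
from the FC or SCSI stack). The function name arrival_order, the path
labels and the latency numbers are all made up for the example; it only
shows that commands submitted in one order can reach the target in another
when the paths differ in latency.

```python
def arrival_order(submissions, latency):
    """Return command labels in the order they reach the target.

    submissions: list of (submit_time, path, label), in submission order.
    latency:     dict mapping a path name to its (hypothetical) delay.
    """
    timed = [
        (submit_time + latency[path], seq, label)
        for seq, (submit_time, path, label) in enumerate(submissions)
    ]
    # Sort by arrival time; seq breaks ties in favour of the earlier submit.
    return [label for _, _, label in sorted(timed)]


# "write x" is submitted first, but path A is slower than path B.
order = arrival_order(
    [(0, "A", "write x"), (1, "B", "write y")],
    {"A": 10, "B": 2},
)
print(order)  # ['write y', 'write x']
```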
The same goes for the Linux block layer; the only ordering that is
actually _enforced_ comes from I/O being sent from the page cache, as
each page can submit only one I/O at a time.
But as soon as you're using O_DIRECT you don't have any ordering
guarantees either, and it's up to the application to enforce whatever
ordering it requires.
Which is also what all filesystems do; for any critical section they
wait for the I/O result before continuing.
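A minimal sketch of "wait for the result before continuing", from the
application side. It is only an illustration: write_in_order is a
hypothetical helper, and it uses ordinary buffered writes plus fsync()
rather than O_DIRECT (which needs aligned buffers). The point is just that
the second write is not issued until the first one has been reported as
completed.

```python
import os


def write_in_order(path, offset, first, second):
    """Write `first`, wait for it to be flushed, then write `second`.

    Both writes go to the same offset, so the final content is `second`
    provided the two writes are not reordered.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for data in (first, second):
            written = os.pwrite(fd, data, offset)
            if written != len(data):
                raise OSError("short write")
            # Block until the kernel reports this write as flushed;
            # only then does the loop issue the next, dependent write.
            os.fsync(fd)
    finally:
        os.close(fd)
```

Any error from pwrite() or fsync() is raised before the second write is
attempted, so the caller sees the failure instead of an unordered result.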
So for failover any retries will be covered by the multipath layer,
and only the final I/O result will be returned to the upper layers,
rendering any multipath failover invisible.
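Roughly, and only as a toy model (this is not dm-multipath code;
PathFailed and submit_with_failover are names invented for the example),
the retry behaviour looks like this from the caller's point of view. The
sketch says nothing about a command that is still in flight on the failed
path.

```python
class PathFailed(Exception):
    """Raised by a (hypothetical) path when it cannot deliver the I/O."""


def submit_with_failover(io, paths):
    """Try each path in turn and return only the final result.

    paths: list of callables, each taking the I/O and either returning
    a result or raising PathFailed.
    """
    last_error = None
    for send in paths:
        try:
            return send(io)  # success: the caller never sees earlier failures
        except PathFailed as err:
            last_error = err  # remember the failure and retry on the next path
    # Only when every path has failed does the error reach the caller.
    raise OSError("all paths failed") from last_error
```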
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)