* IO Ordering on Path Failover
@ 2014-02-28 22:42 Bob Bawn
  2014-03-05  7:54 ` Hannes Reinecke
From: Bob Bawn @ 2014-02-28 22:42 UTC
  To: dm-devel

I am trying to understand how IO ordering safety is enforced on
path failover. This is new territory for me, so forgive me if this is
obvious. Consider the sequence:

1. Client writes(lba=0,val=x) on path A.
2. Multipath declares path A dead and retries the write on path B.
3. The retried write on path B completes successfully and the client
   gets ack'd.
4. Client writes(lba=0,val=y) on path B. It also completes
   successfully and is ack'd to the client.
5. The write from (1) completes and corrupts the data.

It seems like multipath needs a guarantee at step 2 that the original
write won't complete after path A has been declared down. I thought it
would issue something like a LUN RESET on path B and that the response
to that reset would indicate that it is safe to proceed. This page
sort of supports that speculation:
http://scst.sourceforge.net/mc_s.html
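What such a guarantee would have to achieve can be sketched with a toy target that drops outstanding commands from a dead path when it is "reset". This only models the idea in the paragraph above; the `ToyTarget` class and its `reset` method are hypothetical and make no claim about how a real LUN RESET, LIO, or dm-multipath behaves.

```python
# Hypothetical model of "fence before retry": before multipath retries on
# path B, something must guarantee that commands still queued from path A
# will never be applied. Here the toy target simply discards them on reset.

class ToyTarget:
    def __init__(self):
        self.blocks = {}
        self.pending = {}   # path name -> list of (lba, value) not yet applied

    def submit(self, path, lba, value):
        self.pending.setdefault(path, []).append((lba, value))

    def deliver(self, path):
        # apply whatever is still pending for this path, in submission order
        for lba, value in self.pending.pop(path, []):
            self.blocks[lba] = value

    def reset(self, dead_path):
        # model of a reset: outstanding commands from the dead path are aborted
        self.pending.pop(dead_path, None)

def failover_with_fence():
    t = ToyTarget()
    t.submit("A", 0, "x")     # 1. original write on A, stuck in flight
    t.reset("A")              # fence: A's outstanding write is aborted
    t.submit("B", 0, "x")     # 2./3. retry on B
    t.deliver("B")
    t.submit("B", 0, "y")     # 4. new write on B
    t.deliver("B")
    t.deliver("A")            # 5. nothing is left to deliver from A
    return t.blocks[0]

print(failover_with_fence())  # "y"
```

With the fence in place, the late delivery from path A has nothing left to apply, so the newer value survives.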

In testing with dm-multipath accessing an LIO/IBLOCK device over FC, I
don't see a LUN RESET. I also don't see any indication of this in the
code but I might be missing something.

It occurred to me that my target might not be setting the correct bits
for multipathing.

Any advice?

-- 
Bob Bawn


* Re: IO Ordering on Path Failover
  2014-02-28 22:42 IO Ordering on Path Failover Bob Bawn
@ 2014-03-05  7:54 ` Hannes Reinecke
From: Hannes Reinecke @ 2014-03-05  7:54 UTC
  To: dm-devel

On 02/28/2014 11:42 PM, Bob Bawn wrote:
> I am trying to understand how IO ordering safety is enforced with on
> path failover. This is new territory for me so forgive me if this is
> obvious. Consider the sequence:
> 
> 1. client writes(lba=0,val=x) on path A.
> 2. multipath declares path A dead and retries write on path B
> 3. retried write on path B completes successfully and client get ack'd
> 4. client writes(lba=0,val=y) on path B. It also completes
> successfully and is ack'd to client
> 5. write from (1) completes and corrupts data
> 
> It seems like multipath needs a guarantee at step 2 that the original
> write won't complete after path A has been declared down. I thought it
> would issue something like a LUN RESET on path B and that the response
> to that reset would indicate that it is safe to proceed. This page
> sort of supports that speculation:
> http://scst.sourceforge.net/mc_s.html
> 
No, that assumption is wrong.

Strict ordering is only guaranteed for commands submitted from the
HBA to the wire. Once a command is in flight there are _no_ guarantees
about ordering. E.g. in an FC fabric there might be several paths to
the same target, each of which might have a different latency, so
I/O on one path might actually complete faster than on another.
And with CNAs it's virtually impossible to guarantee any I/O
ordering because of the multiple hardware queues involved.
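The point about per-path latency can be shown with a tiny event simulation. The fixed latencies below are assumptions chosen for illustration; real fabrics vary, but the result is the same whenever the slower path carries the older command.

```python
import heapq

# Hypothetical, fixed per-path latencies (arbitrary time units).
PATH_LATENCY = {"A": 5.0, "B": 1.0}

def arrival_order(commands):
    """commands: list of (issue_time, path, name), given in issue order.

    Returns the command names in the order they arrive at the target.
    """
    events = []
    for issue_time, path, name in commands:
        heapq.heappush(events, (issue_time + PATH_LATENCY[path], name))
    return [heapq.heappop(events)[1] for _ in range(len(events))]

# write1 is issued first on the slow path, write2 slightly later on the fast one
print(arrival_order([(0.0, "A", "write1"), (0.5, "B", "write2")]))
# ['write2', 'write1']
```

Issue order on the wire says nothing about arrival order at the target once more than one path is involved.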

The same goes for the Linux block layer; the only _enforced_ ordering
of sorts comes from I/O sent from the page cache, as each page can
have only one I/O outstanding at a time.
But as soon as you're using O_DIRECT you don't have any ordering
guarantees either, and it's up to the application to enforce any
ordering requirements.
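Application-enforced ordering boils down to "do not issue the dependent write until the earlier one has completed". The sketch below models this with asynchronous writes to a hypothetical `ToyDevice` with assumed, fixed service times; it does not use real O_DIRECT I/O, only the same ordering pattern.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a device accessed without the page cache: each
# write is submitted asynchronously and takes an assumed service time.
class ToyDevice:
    def __init__(self):
        self.blocks = {}
        self.lock = threading.Lock()

    def write(self, lba, value, delay):
        time.sleep(delay)          # simulated service time
        with self.lock:
            self.blocks[lba] = value

def unordered(dev, pool):
    # Issue both writes back to back: nothing orders them.
    f1 = pool.submit(dev.write, 0, "x", 0.2)
    f2 = pool.submit(dev.write, 0, "y", 0.0)
    f1.result()
    f2.result()
    return dev.blocks[0]

def ordered(dev, pool):
    # Application-enforced ordering: wait for write 1 to complete
    # before issuing the dependent write 2.
    pool.submit(dev.write, 0, "x", 0.2).result()
    pool.submit(dev.write, 0, "y", 0.0).result()
    return dev.blocks[0]

with ThreadPoolExecutor(max_workers=2) as pool:
    print(unordered(ToyDevice(), pool))  # typically "x": the older, slower write lands last
    print(ordered(ToyDevice(), pool))    # always "y"
```

Only the waiting version gives a deterministic result; the unordered one depends entirely on service times.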

This is also what all filesystems do: for any critical section they
wait for the I/O result before continuing.
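From userspace, the closest analogue of "wait for the I/O result before continuing" is calling fsync() before writing anything that depends on earlier data. The sketch below mimics a journal commit; the record layout (a data line followed by a commit marker) is a hypothetical example, not any real filesystem's on-disk format.

```python
import os
import tempfile

# Minimal sketch of a journal-style commit: the commit record must never
# become durable before the data it covers. The record format is hypothetical.
def journaled_update(path, payload):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, b"DATA " + payload + b"\n")
        os.fsync(fd)                 # wait: data must reach stable storage first...
        os.write(fd, b"COMMIT\n")    # ...before the commit record is written
        os.fsync(fd)
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as d:
    journal = os.path.join(d, "journal")
    journaled_update(journal, b"lba=0 val=y")
    with open(journal, "rb") as f:
        print(f.read())  # b'DATA lba=0 val=y\nCOMMIT\n'
```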

So on failover, any retries are handled by the multipath layer, and
only the final I/O result is returned to the upper layers, rendering
the failover invisible to them.
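The "only the final result goes up" behaviour can be sketched as a wrapper that tries paths in turn and hides per-path failures from its caller. The path callables and `PathError` below are illustrative inventions, not dm-multipath interfaces, and real multipath also has path selectors, queueing policies and path checkers that this ignores.

```python
# Hypothetical sketch: the caller issues one I/O and sees one final result,
# no matter how many paths were tried underneath.

class PathError(Exception):
    pass

def multipath_submit(paths, io):
    """Try each path in turn; return the first success or raise the last error."""
    last_error = None
    for path in paths:
        try:
            return path(io)      # a "path" is any callable that performs the I/O
        except PathError as err:
            last_error = err     # path failure: fail over to the next path
    if last_error is None:
        raise PathError("no paths available")
    raise last_error

def dead_path(io):
    raise PathError("path A: transport failure")

def good_path(io):
    return f"ok: {io}"

print(multipath_submit([dead_path, good_path], "write lba=0"))  # ok: write lba=0
```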

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
