linux-raid.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Bill Davidsen <davidsen@tmr.com>
Cc: Peter Grandi <pg_xf2@xf2.for.sabi.co.UK>,
	Linux RAID <linux-raid@vger.kernel.org>,
	Linux XFS <xfs@oss.sgi.com>
Subject: Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
Date: Fri, 19 Dec 2008 09:26:21 +1100
Message-ID: <20081218222621.GA17177@disturbed>
In-Reply-To: <494971B2.1000103@tmr.com>

On Wed, Dec 17, 2008 at 04:40:02PM -0500, Bill Davidsen wrote:
> What really bothers me is that there's no obvious need for
> barriers at the device level if the file system is just a bit
> smarter and does its own async io (like aio_*), because you can
> track writes outstanding on a per-fd basis, so instead of stopping
> the flow of data to the drive, you can just block a file
> descriptor and wait for the count of outstanding i/o to drop to
> zero. That provides the order semantics of barriers as far as I
> can see, having tirelessly thought about it for ten minutes or so.

Well, you've pretty much described the algorithm XFS uses in its
transaction system - it's entirely asynchronous - and it's been
clear for many, many years that this model is broken when you have
devices with volatile write caches and internal re-ordering.  I/O
completion on such devices does not guarantee data is safe on stable
storage.

If the device does not commit writes to stable storage in the same
order they are signalled as complete (i.e. internal device
re-ordering occurred after completion), then the device violates
fundamental assumptions about I/O completion that the above model
relies on.
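
To make the failure concrete, here is a toy model in Python. It is
only a sketch under stated assumptions, not kernel or XFS code: the
function names and the "log"/"metadata" block labels are hypothetical,
and the device is assumed to be free to commit cached writes in any
order. The host issues "log", waits for it to complete, and only then
issues "metadata", which is exactly the wait-for-outstanding-I/O model
quoted above.

```python
from itertools import permutations


def crash_states(completed_writes):
    """Return every set of blocks that could be on stable storage at
    power loss, assuming the device signalled completion for all of
    them from its volatile cache and may destage them in any order."""
    states = set()
    for order in permutations(completed_writes):
        # Power can fail after any number of blocks have been destaged.
        for n in range(len(order) + 1):
            states.add(frozenset(order[:n]))
    return states


def ordering_violated(state, first, second):
    """'second' was issued only after 'first' completed, so finding
    'second' on stable storage without 'first' breaks the ordering."""
    return second in state and first not in state


# Host view: both writes completed, in this order.
completed = ["log", "metadata"]
bad = [s for s in crash_states(completed)
       if ordering_violated(s, "log", "metadata")]
```

In this model `bad` contains the state where only "metadata" survived
the crash, even though the host never had both writes outstanding at
the same time. Counting completions on the host side cannot rule that
state out.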

XFS uses barriers to guarantee that the devices don't lie about the
completion order of critical I/O, not that the I/Os are on stable
storage. The fact that this causes cache flushes to stable storage
is a result of the implementation of that guarantee of ordering. I'm
sure the Linux barrier implementation could be smarter and faster
(for some hardware), but for an operation that is used to guarantee
integrity I'll take conservative and safe over smart and fast any
day of the week....
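
As a rough illustration of what the barrier buys, the toy Python model
below treats a barrier as a full cache flush between two groups of
writes. That is an assumption made for the sketch, matching the
conservative behaviour described above; it is not a description of
the actual Linux block layer code. The function name and the
"log"/"metadata" labels are hypothetical.

```python
from itertools import permutations


def crash_states_with_barrier(before, after):
    """Return every set of blocks that could be on stable storage at
    power loss when a barrier (modelled here as a full cache flush)
    separates the 'before' writes from the 'after' writes."""
    states = set()
    # Until the flush finishes, only 'before' writes exist on the
    # device, and it may have destaged any prefix of any ordering.
    for order in permutations(before):
        for n in range(len(order) + 1):
            states.add(frozenset(order[:n]))
    # Once the flush is done, every 'before' write is stable; the
    # 'after' writes may be partially destaged in any order.
    for order in permutations(after):
        for n in range(len(order) + 1):
            states.add(frozenset(before) | frozenset(order[:n]))
    return states


states = crash_states_with_barrier(["log"], ["metadata"])
```

With the flush in place, no reachable state has "metadata" on stable
storage without "log". Ordering is what the barrier is there to
guarantee; the data reaching stable storage is a consequence of using
a flush to enforce it.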

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
