From: Leon Woestenberg <leonw@mailcan.com>
To: Bill Davidsen <davidsen@tmr.com>
Cc: Linux RAID <linux-raid@vger.kernel.org>,
Peter Grandi <pg_xf2@xf2.for.sabi.co.UK>,
Linux XFS <xfs@oss.sgi.com>
Subject: Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
Date: Thu, 18 Dec 2008 09:20:10 +0100
Message-ID: <494A07BA.1080008@mailcan.com>
In-Reply-To: <494971B2.1000103@tmr.com>
Hello all,
Bill Davidsen wrote:
> Peter Grandi wrote:
>
>> Unfortunately that seems the case.
>>
>> The purpose of barriers is to guarantee that relevant data is
>> known to be on persistent storage (kind of hardware 'fsync').
>>
>> In effect write barrier means "tell me when relevant data is on
>> persistent storage", or less precisely "flush/sync writes now
>> and tell me when it is done". Properties as to ordering are just
>> a side effect.
>>
>>
>
> I don't get that sense from the barriers stuff in Documentation, in fact
> I think it's essentially a pure ordering thing, I don't even see that it
> has an effect of forcing the data to be written to the device, other
> than by preventing other writes until the drive writes everything. So we
> read the intended use differently.
>
> What really bothers me is that there's no obvious need for barriers at
> the device level if the file system is just a bit smarter and does it's
> own async io (like aio_*), because you can track writes outstanding on a
> per-fd basis, so instead of stopping the flow of data to the drive, you
> can just block a file descriptor and wait for the count of outstanding
> i/o to drop to zero. That provides the order semantics of barriers as
> far as I can see, having tirelessly thought about it for ten minutes or
> so. Oh, and did something very similar decades ago in a long-gone
> mainframe OS.
>
Did that mainframe OS have re-ordering devices? If it did, you'd still
need barriers all the way down:
The drive itself may still re-order writes, and that can cause corruption
if the power goes down halfway through.
From my understanding, disabling write-caches simply forces the drive
to operate in-order.
Barriers need to travel all the way down to the point after which
everything remains in-order.
Devices with write-cache enabled will still re-order, but not across
barriers (which are implemented as either a single cache flush followed
by a barrier write with forced unit access, or a double cache flush
around the barrier write).
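
As a rough sketch of those two variants, the function below expands one
barrier write into the commands a device would see. The command names
(FLUSH_CACHE, WRITE, WRITE_FUA) and the function name expand_barrier are
hypothetical labels chosen for illustration, not real ATA/SCSI opcodes or
kernel symbols.

```python
def expand_barrier(block, fua_supported):
    """Return the (hypothetical) device commands for one barrier write."""
    if fua_supported:
        # One flush drains all earlier cached writes; forced unit access
        # then puts the barrier write itself on stable media.
        return [("FLUSH_CACHE",), ("WRITE_FUA", block)]
    # Without FUA: flush before the barrier write and flush again after it.
    return [("FLUSH_CACHE",), ("WRITE", block), ("FLUSH_CACHE",)]
```

Either way, everything queued before the barrier is flushed before the
barrier write is issued, and the barrier write is on media before
anything queued after it can follow.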
Whether the data has made it to the drive platters is not really
important from a barrier point of view. However, if part of the data
made it to the platters, then we want to be sure it got there in order,
because only in this way can we ensure that the data that is on the
platters is consistent.
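
To make that invariant concrete, here is a small simulation, under
simplifying assumptions: the drive may shuffle writes freely between two
barriers but never across one, and power may fail after any number of
writes have reached the platters. BARRIER, platter_order and consistent
are names made up for this sketch; it models the argument, not any real
drive or kernel code.

```python
import random

BARRIER = object()  # hypothetical marker for a barrier in the write stream


def platter_order(stream, rng):
    """Order in which a caching drive might commit writes to the platters.

    Writes between two barriers form one group. The drive may shuffle a
    group freely, but it finishes the group before starting the next.
    """
    order, group = [], []
    for item in stream:
        if item is BARRIER:
            rng.shuffle(group)
            order.extend(group)
            group = []
        else:
            group.append(item)
    rng.shuffle(group)
    order.extend(group)
    return order


def consistent(stream, on_platter):
    """True if every write on the platters has all pre-barrier writes there too."""
    required = set()  # writes issued before the most recent barrier
    group = set()     # writes issued since the most recent barrier
    for item in stream:
        if item is BARRIER:
            required |= group
            group = set()
        else:
            if item in on_platter and not required <= on_platter:
                return False
            group.add(item)
    return True


stream = ["data1", "data2", BARRIER, "commit"]
order = platter_order(stream, random.Random(0))
# Cut power after any number of completed writes: the state stays consistent.
for cut in range(len(order) + 1):
    assert consistent(stream, set(order[:cut]))
```

If the drive were allowed to re-order across the barrier, "commit" could
reach the platters before "data1" and "data2", and
consistent(stream, {"commit"}) returns False for exactly that state.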
Regards,
Leon.