From: Ralf Liebenow <ralf@theco.de>
To: xfs@oss.sgi.com
Subject: Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
Date: Thu, 19 Feb 2009 00:09:58 +0100
Message-ID: <20090218230958.GA6506@theco.de>
In-Reply-To: <B50173E3-7975-4A71-903A-A76D910CBB3A@mailcan.com>
Hello !
> Correct ordering can be proven to be enough to provide transactional
> correctness, enough to ensure that filesystems can not get corrupted
> on power down.
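To make the quoted claim concrete, here is a toy model (a sketch, not XFS's actual log format). It assumes a hypothetical journal in which each transaction writes its log records and then a commit record carrying the number of log records it covers. If writes reach the medium strictly in order, a power cut can only leave a prefix of the sequence, and every prefix is recoverable. If the device may reorder, an arbitrary subset can survive, including a commit record without its data.

```python
from itertools import combinations

# Hypothetical journal: a transaction writes its log records, then a
# commit record carrying the number of log records it covers.
WRITES = [
    ("log", 1, "inode update"),
    ("log", 1, "bitmap update"),
    ("commit", 1, 2),
    ("log", 2, "dir entry"),
    ("commit", 2, 1),
]


def consistent(persisted) -> bool:
    """True if every commit record on disk is backed by all its log records."""
    logged: dict[int, int] = {}
    for kind, tx, _ in persisted:
        if kind == "log":
            logged[tx] = logged.get(tx, 0) + 1
    return all(logged.get(tx, 0) == count
               for kind, tx, count in persisted if kind == "commit")


# In-order persistence: a power cut leaves some prefix of WRITES on disk.
print(all(consistent(WRITES[:k]) for k in range(len(WRITES) + 1)))  # True

# Reordering device: a power cut can leave an arbitrary subset on disk.
torn = [s for r in range(len(WRITES) + 1)
        for s in combinations(WRITES, r) if not consistent(s)]
print(len(torn) > 0)  # True
```

The whole argument rests on the "strictly in order" assumption, which is exactly what the reply below questions.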
Please beware that caching RAID controllers that are not battery
backed, and hard disks with write caching enabled, may decide to
reorder writes to the disk, so the ordering imposed by the
operating system (the filesystem driver) may not be retained.
Hard disks and controllers usually do this to minimize seek
times, and that's what disk command queueing is good for. So
ordering can only be retained if all external caching mechanisms
and command queueing are switched off. Otherwise you need
something like fsync points (barriers?) to get consistent
checkpoints you can roll back to ...
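A minimal simulation of this effect, under loudly stated assumptions: `CachingDisk` is a hypothetical toy drive whose volatile cache may destage pending writes in any order, and `flush()` stands in for a barrier (cache flush). Issuing "data", then a barrier, then "commit" never leaves a commit record on the medium without its data. Without the barrier, some random schedules do.

```python
import random


class CachingDisk:
    """Toy drive with a volatile write-back cache that may destage
    pending writes in any order (e.g. to minimise seeks)."""

    def __init__(self, rng: random.Random) -> None:
        self.rng = rng
        self.cache: list[str] = []   # acknowledged, not yet on the medium
        self.medium: list[str] = []  # what survives a power cut

    def write(self, rec: str) -> None:
        self.cache.append(rec)
        # Reorder pending writes, then destage an arbitrary number of them.
        self.rng.shuffle(self.cache)
        k = self.rng.randint(0, len(self.cache))
        self.medium.extend(self.cache[:k])
        del self.cache[:k]

    def flush(self) -> None:
        """Barrier / cache flush: all acknowledged writes reach the medium."""
        self.medium.extend(self.cache)
        self.cache.clear()

    def power_fail(self) -> list[str]:
        self.cache.clear()  # volatile cache contents are lost
        return list(self.medium)


def commit_is_safe(medium: list[str]) -> bool:
    """A commit record on the medium must not exist without its data."""
    return "commit" not in medium or "data" in medium


def run(use_barrier: bool, seed: int) -> bool:
    disk = CachingDisk(random.Random(seed))
    disk.write("data")
    if use_barrier:
        disk.flush()  # data is stable before the commit record is issued
    disk.write("commit")
    return commit_is_safe(disk.power_fail())


print(all(run(True, s) for s in range(1000)))  # True
print(all(run(False, s) for s in range(1000)))
```

Without the barrier, a schedule where "commit" is destaged ahead of "data" and power then fails leaves a torn transaction, so the second check reports failure.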
So the answer depends on many variables:
Do you have a persistent (battery-backed) write cache?
  Yes -> you can go with nobarrier if you can make sure
         that the hard disk cache is off and the
         filesystem does proper write ordering.
  No  -> if you switch off the disk cache, you _may_
         switch off barriers, provided the filesystem
         driver orders its writes properly.
      -> if you have disk write caching on and you don't
         use barriers, you are on your own when power
         goes down ... you may be lucky or not ...
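The decision tree above can be written down as a small function. This is a hypothetical helper, not an XFS or mount API. Because the "Yes" branch with the disk cache left on is not spelled out, it is treated here as "keep barriers" (an assumption).

```python
def barrier_advice(battery_backed_cache: bool,
                   disk_write_cache_on: bool,
                   fs_orders_writes: bool) -> str:
    """Rule of thumb from the decision tree (hypothetical helper)."""
    if disk_write_cache_on or not fs_orders_writes:
        # A volatile disk cache may reorder or lose writes: keep barriers.
        return "keep barriers"
    if battery_backed_cache:
        # Persistent controller cache, disk cache off, ordered writes.
        return "nobarrier ok"
    # No persistent cache, but disk cache off and ordered writes.
    return "nobarrier possible"


print(barrier_advice(battery_backed_cache=False,
                     disk_write_cache_on=True,
                     fs_orders_writes=True))  # keep barriers
```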
But to make that clear: it's only a problem
when power fails ... it's not a problem
when the machine crashes ... the disks will
eventually write out their caches then.
So if your system is connected to
redundant power supplies and fail-safe power
systems (as is the case in most
data centers), you can probably live with
disk write caching on and nobarrier, if the
filesystem driver orders its writes
properly ...
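The crash-versus-power-failure distinction can be sketched as a toy model of what ends up on the medium. The function and its `battery_backed` parameter are illustrative assumptions, not a real driver interface. After a kernel crash the drive stays powered and drains its cache. After a power failure a plain volatile cache is lost, while a battery-backed cache is written out once power returns.

```python
def surviving_writes(medium: list[str], volatile_cache: list[str],
                     event: str, battery_backed: bool = False) -> list[str]:
    """What ends up on the medium after a failure (toy model)."""
    if event == "kernel_crash":
        # The drive stays powered and eventually destages its cache.
        return medium + volatile_cache
    if event == "power_failure":
        # A battery-backed cache survives; a plain volatile cache is lost.
        return medium + volatile_cache if battery_backed else list(medium)
    raise ValueError(f"unknown event: {event!r}")


on_disk = ["log 1", "commit 1"]
cached = ["log 2", "commit 2"]
print(surviving_writes(on_disk, cached, "kernel_crash"))
# ['log 1', 'commit 1', 'log 2', 'commit 2']
print(surviving_writes(on_disk, cached, "power_failure"))
# ['log 1', 'commit 1']
```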
So I have one open question left: does xfs do proper
(transactional) ordering when barriers are off? I have been
using xfs for years now and have had many machine crashes (not
power failures) without xfs getting corrupted (and that was
before 2.6.17, and therefore without barrier support).
So I assume it always does proper ordering, and barrier
support only creates "fsynced" checkpoints in time.
Am I right?
Ralf
> Hello,
>
> On 15 dec 2008, at 23:50, Peter Grandi wrote:
>
> >[ ... ]
> >
> >>>The purpose of barriers is to guarantee that relevant data is
> >>>known to be on persistent storage (kind of hardware 'fsync').
> >>>
> >
> >>[ ... ] Unfortunately in my understanding none of this is
> >>reflected by Documentation/block/barrier.txt
> >
> >But we are talking about XFS and barriers here. That described
> >just a (flawed, buggy) mechanism to implement those. Consider
> >for example:
> >
> > http://www.xfs.org/index.php/XFS_FAQ#Write_barrier_support.
> > http://www.xfs.org/index.php/XFS_FAQ#Q._Should_barriers_be_enabled_with_storage_which_has_a_persistent_write_cache.3F
> >
> >In any case as to the kernel "barrier" mechanism, its
> >description is misleading because it heavily fixates on the
> >ordering issue, which is just a consequence, but yet mentions
> >the far more important "flush/sync" aspect.
> >
> >Still, there is a lot of confusion about barrier support and
> >what it means at which level, as reflected in several online
> >discussions and the different behaviour of different kernel
> >versions.
> >
> The semantics of a barrier are whatever semantics we describe to it.
> So we can continue to be confused about it.
>
> I strongly disagree on the ordering issue being a side effect.
>
> Correct ordering can be proven to be enough to provide transactional
> correctness, enough to ensure that filesystems can not get corrupted
> on power down.
>
> Using barriers to guarantee that (all submitted) write requests
> (before the barrier) made it to the medium are a stronger predicate.
>
> The Linux approach and documentation talks about the first type of
> semantics (which I rather like for them being strong enough and not
> more).
>
> Regards,
>
> Leon
>
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
>
--
theCode AG
HRB 78053, Amtsgericht Charlottenbg
USt-IdNr.: DE204114808
Vorstand: Ralf Liebenow, Michael Oesterreich, Peter Witzel
Aufsichtsratsvorsitzender: Wolf von Jaduczynski
Oranienstr. 10-11, 10997 Berlin
fon +49 30 617 897-0 fax -10
ralf@theCo.de http://www.theCo.de