From: Christoph Hellwig <hch@infradead.org>
To: Peter Grandi <pg_mh@sabi.co.UK>
Cc: Linux fs XFS <linux-xfs@oss.sgi.com>,
Linux fs JFS <jfs-discussion@lists.SourceForge.net>
Subject: Re: op-journaled fs, journal size and storage speeds
Date: Mon, 2 May 2011 06:40:31 -0400
Message-ID: <20110502104031.GA22953@infradead.org>
In-Reply-To: <19901.41647.606112.243194@tree.ty.sabi.co.UK>
On Sun, May 01, 2011 at 07:13:03PM +0100, Peter Grandi wrote:
> > That's why you can configure an external log....
>
> ...and lose barriers :-). But indeed.
Using a writeback cache on the log device is rather pointless, as
every write needs write-through semantics using FUA or a post-flush
anyway. But I actually have a patch to allow for devices with
a writeback cache in external log configurations; it's just a bit
complicated, as we basically need to copy the pre-flush state machine
into XFS to deal with the pre-flush being for a different device
than the actual write.
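
A user-space analogy of the two options named above, as a minimal sketch
only: opening with O_DSYNC loosely mirrors a write-through (FUA-style)
write, while write-then-fsync mirrors the post-flush variant. How the
kernel maps either onto FUA or a cache flush depends on the device and
driver; the function names open_log and append_record are hypothetical.

```python
import os

def open_log(path: str, dsync: bool) -> int:
    """Open an append-only log file; with dsync=True each write is synchronous."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    if dsync:
        # O_DSYNC is not defined on every platform, so fall back to 0 there.
        flags |= getattr(os, "O_DSYNC", 0)
    return os.open(path, flags, 0o600)

def append_record(fd: int, data: bytes, flush_after: bool) -> None:
    """Append data fully; optionally flush afterwards (the 'post-flush' case)."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)  # may be a short write
        view = view[written:]
    if flush_after:
        os.fsync(fd)  # force data (and metadata) to stable storage
```

Either way, the record is only considered committed once the call returns,
which is why a volatile writeback cache on the log device buys little.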
> >> But if they can be pretty small, I wonder whether putting the
> >> journals of several filesystems on the same storage device then
> >> becomes a sensible option as the locality will be quite narrow
> >> (e.g. a single physical cylinder) or it could be worthwhile like
> >> the database people do to journal to battery-backed RAM.
>
> For example as described in this old paper:
It only makes sense if the log activity bursts for the different
filesystems happen at different times, or none of the filesystems
maxes out the log IOP rate.
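
A back-of-the-envelope version of that condition, as a simplified sketch:
if the bursts can coincide, the summed peak log IOPS has to fit within
what the shared device sustains; if they never coincide, only the largest
single peak matters. The function name and all numbers are hypothetical,
and real log traffic is burstier than a single peak figure suggests.

```python
def shared_log_is_safe(peak_iops, device_iops, bursts_overlap=True):
    """Return True if the shared log device can absorb the log traffic.

    peak_iops: per-filesystem peak log IOPS (assumed, measured elsewhere)
    device_iops: sustained small-write IOPS of the shared log device
    bursts_overlap: whether the filesystems' log bursts can coincide
    """
    if bursts_overlap:
        demand = sum(peak_iops)   # worst case: all bursts at once
    else:
        demand = max(peak_iops)   # bursts are disjoint in time
    return demand <= device_iops

# Two filesystems peaking at 200 and 150 log IOPS on a 300 IOPS device
print(shared_log_is_safe([200, 150], 300))                        # False
print(shared_log_is_safe([200, 150], 300, bursts_overlap=False))  # True
```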
> But they seem to me fundamentally terrible for journals, because
> of the large erase block sizes and the enormous latency of erase
> operations (lots of read-erase-write cycles for small commits).
> They seem more oriented to large mostly read-only data sets than
> very small mostly write ones.
As mentioned earlier in this thread, XFS allows log writes to be
aligned and padded. Just make sure to get a device with an erase block
size <= 256 kilobytes, which usually means SLC. But even drives
with a larger erase block size and sane firmware tend to be faster
than plain old disks. But as Dave mentioned, there's nothing that's
going to beat a battery-backed cache/memory for log IOP performance.
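
To illustrate why alignment and padding help with erase blocks, here is a
small arithmetic sketch. It assumes the log write is padded up to a stripe
unit equal to the erase block size (in mkfs.xfs terms this would roughly
correspond to the log stripe unit, e.g. -l su=256k); the function names
are hypothetical and this models only the arithmetic, not what any
particular SSD firmware does.

```python
def padded_log_write(nbytes: int, stripe_unit: int) -> int:
    """Round a log write of nbytes up to a whole number of stripe units."""
    if nbytes <= 0 or stripe_unit <= 0:
        raise ValueError("sizes must be positive")
    units = -(-nbytes // stripe_unit)  # ceiling division
    return units * stripe_unit

def erase_blocks_touched(offset: int, length: int, erase_block: int) -> int:
    """Count the erase blocks overlapped by a write at [offset, offset+length)."""
    if offset < 0 or length <= 0 or erase_block <= 0:
        raise ValueError("invalid write geometry")
    first = offset // erase_block
    last = (offset + length - 1) // erase_block
    return last - first + 1

KIB = 1024
su = 256 * KIB
# A 5000-byte commit padded to one full 256 KiB unit, written aligned,
# stays inside a single erase block...
print(padded_log_write(5000, su))        # 262144
print(erase_blocks_touched(0, su, su))   # 1
# ...whereas the same small write placed unaligned can straddle two.
print(erase_blocks_touched(260000, 5000, su))  # 2
```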
> The saving grace is the capacitor-backed RAM in SSDs (used to work
> around erase block size issues as you probably know) which to a
> significant extent may act as the battery-backed RAM I was
> mentioning; and similarly as another post says the battery-backed
> RAM in RAID host adapters would do much the same function.
Just make sure your device actually has it. The Intel X25 SSDs
and many other consumer/prosumer SSDs don't have it and will
lose data in case of a power loss.
Thread overview (6+ messages):
2011-04-30 14:51 op-journaled fs, journal size and storage speeds Peter Grandi
2011-05-01 9:27 ` Dave Chinner
2011-05-01 18:13 ` Peter Grandi
2011-05-02 1:23 ` Dave Chinner
2011-05-02 10:40 ` Christoph Hellwig [this message]
2011-05-02 4:35 ` Stan Hoeppner