From: Christoph Hellwig <hch@infradead.org>
To: Peter Grandi <pg_mh@sabi.co.UK>
Cc: Linux fs XFS <linux-xfs@oss.sgi.com>,
Linux fs JFS <jfs-discussion@lists.SourceForge.net>
Subject: Re: op-journaled fs, journal size and storage speeds
Date: Mon, 2 May 2011 06:40:31 -0400 [thread overview]
Message-ID: <20110502104031.GA22953@infradead.org> (raw)
In-Reply-To: <19901.41647.606112.243194@tree.ty.sabi.co.UK>
On Sun, May 01, 2011 at 07:13:03PM +0100, Peter Grandi wrote:
> > That's why you can configure an external log....
>
> ...and lose barriers :-). But indeed.
Using a writeback cache on the log device is rather pointless, as
every write needs write-through semantics using FUA or a post-flush
anyway. But I actually have a patch to allow for devices with
a writeback cache in external log configurations; it's just a bit
complicated, as we basically need to copy the pre-flush state machine
into XFS to deal with the pre-flush being for a different device
than the actual write.
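For context, an external log is set up at mkfs time and named again at
mount time; a minimal sketch, with hypothetical device names:

```shell
# Hypothetical devices: /dev/sdb1 holds the filesystem, /dev/sdc1 the log.
# Create the filesystem with an external log device:
mkfs.xfs -l logdev=/dev/sdc1,size=128m /dev/sdb1

# The log device must also be specified at mount time:
mount -o logdev=/dev/sdc1 /dev/sdb1 /mnt
```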
> >> But if they can be pretty small, I wonder whether putting the
> >> journals of several filesystems on the same storage device then
> >> becomes a sensible option as the locality will be quite narrow
> >> (e.g. a single physical cylinder) or it could be worthwhile, like
> >> the database people do, to journal to battery-backed RAM.
>
> For example as described in this old paper:
It only makes sense if the log activity bursts for the different
filesystems happen at different times, or if none of the filesystems
maxes out the log IOP rate.
> But they seem to me fundamentally terrible for journals, because
> of the large erase blocks sizes and the enormous latency of erase
> operations (lots of read-erase-write cycles for small commits).
> They seem more oriented to large mostly read-only data sets than
> very small mostly write ones.
As mentioned earlier in this thread, XFS allows you to align and pad
log writes. Just make sure to get a device with an erase block
size <= 256 kilobytes, which usually means SLC. But even drives
with a larger erase block size and sane firmware tend to be faster
than plain old disks. And as Dave mentioned, nothing is going to
beat battery-backed cache/memory for log IOP performance.
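The alignment and padding mentioned above are controlled by the log
stripe unit at mkfs time and the log buffer size at mount time; a
sketch, assuming a 256 KiB erase block and hypothetical device names:

```shell
# Pad and align log writes to the 256 KiB erase block (log stripe unit):
mkfs.xfs -l su=256k /dev/sdb1

# Use matching 256 KiB in-memory log buffers at mount time:
mount -o logbsize=256k /dev/sdb1 /mnt
```

256 KiB is the maximum for both the log stripe unit and logbsize, which
is why an erase block size at or below that threshold matters.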
> The saving grace is the capacitor-backed RAM in SSDs (used to work
> around erase block size issues as you probably know) which to a
> significant extent may act as the battery-backed RAM I was
> mentioning; and similarly as another post says the battery-backed
> RAM in RAID host adapters would do much the same function.
Just make sure your device actually has it. Both the Intel X25 SSDs
and many other consumer / prosumer SSDs actually don't, and will
lose data in case of a power loss.
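If you can't confirm that a drive has capacitor-backed (power-loss
protected) cache, one conservative workaround is to disable its
volatile write cache entirely, at a throughput cost; a sketch for a
SATA drive, with a hypothetical device name:

```shell
# Disable the drive's volatile write cache (device name is hypothetical):
hdparm -W0 /dev/sdb

# Verify the current write-cache setting:
hdparm -W /dev/sdb
```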
Thread overview: 6+ messages
2011-04-30 14:51 op-journaled fs, journal size and storage speeds Peter Grandi
2011-05-01 9:27 ` Dave Chinner
2011-05-01 18:13 ` Peter Grandi
2011-05-02 1:23 ` Dave Chinner
2011-05-02 10:40 ` Christoph Hellwig [this message]
2011-05-02 4:35 ` Stan Hoeppner