From: Dave Chinner <david@fromorbit.com>
To: Peter Cordes <peter@cordes.ca>
Cc: xfs@oss.sgi.com
Subject: Re: RAID5/6 writes
Date: Thu, 2 Oct 2008 10:32:51 +1000 [thread overview]
Message-ID: <20081002003251.GA30001@disturbed>
In-Reply-To: <20081001175237.GJ32037@cordes.ca>
On Wed, Oct 01, 2008 at 02:52:37PM -0300, Peter Cordes wrote:
> I just had an idea for speeding up writes to parity-based RAIDs
> (RAID4,5,6).[1] If XFS wants to write sectors 1,2,3, 5,6,7, but it
> knows that block 4 is free space, it might be better to write sector 4
> (with zeros, don't put uninitialized kernel memory on disk!).
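[The I/O accounting behind that idea can be sketched as follows; the disk counts are assumed for illustration, and real md also has a "reconstruct write" path that this simple model ignores:]

```python
# Rough device-I/O accounting for a single RAID5 stripe.

def rmw_ios(chunks_written):
    """Read-modify-write: read the old data chunks plus old parity,
    then write the new data chunks plus new parity."""
    reads = chunks_written + 1
    writes = chunks_written + 1
    return reads + writes

def full_stripe_ios(data_disks):
    """Full-stripe write: parity is computed from the new data alone,
    so no reads are needed."""
    return data_disks + 1

# Assume a 7-data-disk stripe, writing chunks 1,2,3,5,6,7 with a hole at 4:
partial = rmw_ios(chunks_written=6)      # RMW of the sparse stripe
filled = full_stripe_ios(data_disks=7)   # zero-fill the hole, write it all
```

Under this toy model the sparse write costs 14 device I/Os (half of them reads), while zero-filling the hole turns it into 8 I/Os, all writes.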
How does XFS know that block 4 is free space? Or, indeed, that this is
a single block-sized hole in a range of blocks mapped to different inodes
or filesystem metadata?
If you want something like this, you need to have the lower layer
discover holes like this and, instead of immediately initiating
a RMW cycle, call back to the filesystem to determine whether the
hole is free space. That works for all filesystems, not just XFS.
> It's
> probably only useful to do this if XFS has data in memory to prove
> that the gap is not part of the filesystem. Doing extra reads
> probably doesn't make sense except in very special cases. (e.g.
> repeated writes to the same location with the same hole, so just one
> read would let them all become full-block or even full-stripe writes.)
That's the sort of workload the stripe cache is supposed to optimise;
every subsequent sparse write to the same stripe line avoids the
read part of the RMW cycle. The filesystem is the wrong layer to
optimise this type of workload....
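[A toy model of that stripe-cache behaviour, heavily simplified from what md actually does: only the first sparse write to a stripe pays the read half of the RMW cycle.]

```python
# Toy stripe cache: once a stripe's old contents are resident,
# later partial writes to it skip the read phase of RMW entirely.

class StripeCache:
    def __init__(self):
        self.resident = set()   # stripes whose old contents are cached
        self.reads = 0          # chunk reads issued so far

    def partial_write(self, stripe, missing_chunks):
        if stripe not in self.resident:
            self.reads += missing_chunks   # read phase of RMW
            self.resident.add(stripe)
        # parity is recomputed from cached chunks; only writes hit disk

cache = StripeCache()
for _ in range(100):               # repeated sparse writes, same stripe
    cache.partial_write(stripe=5, missing_chunks=2)
```

After 100 writes the model has issued only the 2 reads from the very first write; everything after that is pure write traffic, which is why this workload belongs to the storage layer rather than the filesystem.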
FWIW, XFS has its own problems with writeback triggering RMW
cycles - this sort of thing for data could be considered noise
compared to the RMW storm that can be caused by inode writeback
under memory pressure as XFS has to do RMW cycles itself on the
inode cluster buffers. See the Inode Writeback section of this
document:
http://oss.sgi.com/archives/xfs/2008-09/msg00289.html
This can only be fixed at the filesystem level because no amount of
tweaking the storage can improve the I/O patterns that XFS is
issuing. These RMW cycles in inode writeback can cause the inode
flush rate to drop to a few tens of inodes per second. When you have
hundreds of thousands of dirty inodes in a system, it can take
*hours* to flush the dirty inodes to disk....
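[A back-of-the-envelope check of that "hours" figure; the inode count and flush rate below are illustrative numbers picked from the rough ranges above:]

```python
# "Hundreds of thousands" of dirty inodes flushed at "a few tens
# of inodes per second" really does land in the hours range.

dirty_inodes = 300_000    # assumed: hundreds of thousands
flush_rate = 30           # assumed: a few tens of inodes per second

seconds = dirty_inodes / flush_rate
hours = seconds / 3600
```

With these numbers the flush takes 10,000 seconds, roughly 2.8 hours, and the real figure scales linearly with however many dirty inodes have accumulated.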
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
Thread overview: 6+ messages
2008-10-01 17:52 RAID5/6 writes Peter Cordes
2008-10-01 19:36 ` Andi Kleen
2008-10-01 20:13 ` Peter Cordes
2008-10-01 20:44 ` Andi Kleen
2008-10-01 21:01 ` Peter Cordes
2008-10-02 0:32 ` Dave Chinner [this message]