* Re: FAST paper on ffsck
From: Dave Chinner @ 2013-12-12 5:30 UTC
To: Theodore Ts'o; +Cc: linux-ext4, xfs
On Mon, Dec 09, 2013 at 01:01:49PM -0500, Theodore Ts'o wrote:
> Andreas brought up on today's conference call Kirk McKusick's recent
> changes[1] to try to improve fsck times for FFS, in response to the
> recent FAST paper covering fsck speed ups for ext3, "ffsck: The Fast
> Filesystem Checker"[2]
>
> [1] http://www.mckusick.com/publications/faster_fsck.pdf
> [2] https://www.usenix.org/system/files/conference/fast13/fast13-final52_0.pdf
Interesting - it's all about trying to lay out data to get
sequential disk access patterns during scanning (i.e. minimise disk
seeks) to reduce fsck runtime. Fine in principle, but I think that
it's a dead end you don't want to go down.
Why? Because it's the exact opposite of what you need for SSD based
filesystems. What fsck really needs is to be able to saturate the
IOPS capability of the underlying device rather than optimising for
bandwidth, and that means driving deep IO queue depths.
e.g. I've dropped xfs_repair times on a 100TB test filesystem with 50
million inodes from 25 minutes to 5 minutes simply by adding gobs of
additional concurrency and ignoring sequential IO optimisations.
It's driving bandwidth rates of 200-250MB/s simply due to the IOPS
rate it is achieving, not because I'm optimising IO patterns for
sequential IO.
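To make the queue-depth point concrete, here is a minimal sketch of the general idea. It is not xfs_repair code: the function name `read_blocks_concurrently`, the 4096-byte block size and the worker count are all assumptions chosen for illustration. It issues many independent block reads from a thread pool, in whatever order they are requested, so that many requests can be in flight at once.

```python
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def read_blocks_concurrently(path, block_numbers, workers=32):
    """Read the given blocks with many requests in flight at once.

    Returns a dict mapping block number -> bytes. Requests are not sorted
    or merged into sequential runs; the aim is queue depth, not locality.
    (Hypothetical helper, not part of any real fsck tool.)
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        def read_one(blkno):
            # os.pread takes an explicit offset, so all threads can share
            # one file descriptor without seeking.
            return blkno, os.pread(fd, BLOCK_SIZE, blkno * BLOCK_SIZE)

        with ThreadPoolExecutor(max_workers=workers) as pool:
            return dict(pool.map(read_one, block_numbers))
    finally:
        os.close(fd)
```

The worker count plays the role of the queue depth here: with 32 workers, up to 32 reads can be outstanding at the same time, regardless of where on the device they land.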
In fact, it dispatches so much IO now that the limitation is not the
60,000 IOPS that it is pulling from the underlying SSDs, but
mmap_sem contention caused by 30-odd threads doing concurrent memory
allocation to cache and store all the information that is being read
from disk...
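A quick back-of-the-envelope check on the figures above. Two assumptions that the message does not state explicitly: the 200-250MB/s and 60,000 IOPS numbers describe the same run, and 1 MB is taken as 10**6 bytes. The helper names are made up for this example.

```python
def avg_io_size_bytes(bandwidth_mb_s, iops):
    """Average bytes moved per IO, taking 1 MB = 10**6 bytes (an assumption)."""
    return bandwidth_mb_s * 10**6 / iops


def inodes_per_second(inodes, minutes):
    """Average inode processing rate over the whole run."""
    return inodes / (minutes * 60)


low = avg_io_size_bytes(200, 60_000)         # about 3.3 KB per IO
high = avg_io_size_bytes(250, 60_000)        # about 4.2 KB per IO
before = inodes_per_second(50_000_000, 25)   # about 33,000 inodes/s
after = inodes_per_second(50_000_000, 5)     # about 167,000 inodes/s

print(f"{low:.0f}-{high:.0f} bytes per IO; {before:.0f} -> {after:.0f} inodes/s")
```

Under those assumptions the average request is only a few kilobytes, which fits the claim that the bandwidth is a by-product of the IOPS rate rather than of large sequential reads.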
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: FAST paper on ffsck
From: Darrick J. Wong @ 2014-01-29 19:45 UTC
To: Dave Chinner; +Cc: linux-ext4, Theodore Ts'o, xfs
On Thu, Dec 12, 2013 at 04:30:47PM +1100, Dave Chinner wrote:
> On Mon, Dec 09, 2013 at 01:01:49PM -0500, Theodore Ts'o wrote:
> > Andreas brought up on today's conference call Kirk McKusick's recent
> > changes[1] to try to improve fsck times for FFS, in response to the
> > recent FAST paper covering fsck speed ups for ext3, "ffsck: The Fast
> > Filesystem Checker"[2]
> >
> > [1] http://www.mckusick.com/publications/faster_fsck.pdf
> > [2] https://www.usenix.org/system/files/conference/fast13/fast13-final52_0.pdf
>
> Interesting - it's all about trying to lay out data to get
> sequential disk access patterns during scanning (i.e. minimise disk
> seeks) to reduce fsck runtime. Fine in principle, but I think that
> it's a dead end you don't want to go down.
>
> Why? Because it's the exact opposite of what you need for SSD based
> filesystems. What fsck really needs is to be able to saturate the
> IOPS capability of the underlying device rather than optimising for
> bandwidth, and that means driving deep IO queue depths.
>
> e.g. I've dropped xfs_repair times on a 100TB test filesystem with 50
> million inodes from 25 minutes to 5 minutes simply by adding gobs of
> additional concurrency and ignoring sequential IO optimisations.
> It's driving bandwidth rates of 200-250MB/s simply due to the IOPS
> rate it is achieving, not because I'm optimising IO patterns for
> sequential IO.
>
> In fact, it dispatches so much IO now that the limitation is not the
> 60,000 IOPS that it is pulling from the underlying SSDs, but
> mmap_sem contention caused by 30-odd threads doing concurrent memory
> allocation to cache and store all the information that is being read
> from disk...
I've created a couple of experimental patches to speed up e2fsck. The first
patch creates a new IO manager that mmap()s the device and simply memcpy()s
buffers in and out to do IO. The second patch spawns a bunch of threads that
split up the work of scanning each block group in the hopes of faulting in all
the metadata off the disk ahead of the main e2fsck thread.
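For readers unfamiliar with the first idea, a minimal sketch of an mmap-backed block IO layer follows. This is not the e2fsck patch: the class name `MmapBlockIO`, its methods and the 4096-byte block size are hypothetical, and the sketch only covers the "map it and copy buffers in and out" part, not the prefetch threads.

```python
import mmap

BLOCK_SIZE = 4096  # assumed filesystem block size, for illustration only


class MmapBlockIO:
    """Toy 'IO manager': map the whole file and copy blocks in and out.

    Hypothetical sketch. It is written for a regular file such as a
    filesystem image; a block device typically reports size 0 via fstat,
    so mapping one would need an explicit length instead of 0.
    """

    def __init__(self, path):
        self._file = open(path, "r+b")
        # Length 0 asks mmap to map the entire file.
        self._map = mmap.mmap(self._file.fileno(), 0)

    def read_block(self, blkno):
        off = blkno * BLOCK_SIZE
        # Slicing copies the bytes out of the mapping (the memcpy step).
        return self._map[off:off + BLOCK_SIZE]

    def write_block(self, blkno, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block long")
        off = blkno * BLOCK_SIZE
        # Slice assignment copies the buffer into the mapping.
        self._map[off:off + BLOCK_SIZE] = data

    def close(self):
        self._map.flush()  # push dirty pages back to the file
        self._map.close()
        self._file.close()
```

With this shape, a read of a block that is not yet in the page cache turns into a page fault rather than an explicit read call, which is what lets other threads fault metadata in ahead of the main thread.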
The upside is that on a cold system, the patches reduce e2fsck running time on
HDD RAIDs and SSDs by 30-40%. On a warm system there's not much advantage.
The downside is that fsck tends to crash when it writes anything out. I've
been meaning to send this out after I fix the write crash, but I've been
occupied with other things at work. :/
There's also a horrible case: on a disk where can_queue = 1, the disk mostly
just thrashes like mad and the run takes several times longer than regular fsck.
I could be wrong about that; I don't know if it's really can_queue = 1 or
simply having only one disk head that's the cause.
<shrug> I'll clean 'em up and send an RFC.
--D