* e2fsck readahead speedup performance report
@ 2014-08-09  3:18 Darrick J. Wong
  2014-08-09  3:22 ` Darrick J. Wong
  2014-08-09  3:56 ` Theodore Ts'o
  0 siblings, 2 replies; 4+ messages in thread
From: Darrick J. Wong @ 2014-08-09  3:18 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4

Hi all,

Since I last replied to the e2fsck readahead patch last week, I rewrote the
prefetch algorithms for pass 1 and 2 and separated thread support into its own
patch.  Upon discovering that issuing a POSIX_FADV_DONTNEED call caused a
noticeable increase (of about 2-5 percentage points) in fsck runtime, I
dropped that part out.

In pass 1, we now walk the group descriptors looking for inode table blocks to
read until we have found enough to issue a $readahead_kb size readahead
command.  The patch also computes the number of the first inode of the last
inode buffer block of the last group in the readahead range, and schedules the
next readahead to occur when we reach that inode.  This keeps the readahead
running closer to full speed and eliminates conflicting IOs between the
checker thread and the readahead.
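
To make that concrete, here is a rough C sketch of the pass 1 scheduling.
None of the struct or helper names below come from the actual patch (or from
e2fsprogs); readahead is modeled as a plain posix_fadvise(POSIX_FADV_WILLNEED)
hint against the underlying device.

#include <fcntl.h>
#include <stdint.h>

/* Everything here is illustrative; none of these names are real. */
struct ra_fs {
	int		fd;			/* block device fd */
	uint32_t	blocksize;		/* bytes per fs block */
	uint32_t	group_count;
	uint32_t	inodes_per_group;
	uint32_t	inode_blocks_per_group;
	uint32_t	inode_size;		/* bytes per on-disk inode */
	/* starting block of a group's inode table */
	uint64_t	(*itable_block)(struct ra_fs *fs, uint32_t group);
};

/*
 * Prefetch whole inode tables, group by group, until roughly readahead_kb
 * of data has been queued.  Returns the inode number at which the next
 * readahead should be triggered: the first inode of the last inode buffer
 * block of the last group just prefetched, so the prefetch stays ahead of
 * the checker without competing with it for the disk.
 */
static uint64_t pass1_schedule_readahead(struct ra_fs *fs, uint32_t group,
					 uint64_t readahead_kb)
{
	uint64_t queued = 0, want = readahead_kb * 1024;
	uint64_t itable_bytes = (uint64_t)fs->inode_blocks_per_group *
				fs->blocksize;
	uint32_t inodes_per_buf_block = fs->blocksize / fs->inode_size;
	uint32_t last;

	while (group < fs->group_count && queued < want) {
		off_t start = (off_t)(fs->itable_block(fs, group) *
				      fs->blocksize);

		/* Ask the kernel to start reading this inode table now. */
		posix_fadvise(fs->fd, start, (off_t)itable_bytes,
			      POSIX_FADV_WILLNEED);
		queued += itable_bytes;
		group++;
	}
	if (queued == 0)
		return 0;	/* nothing left to prefetch */
	last = group - 1;

	/* First inode of the last inode buffer block of the last group. */
	return (uint64_t)last * fs->inodes_per_group +
	       (uint64_t)(fs->inode_blocks_per_group - 1) *
			inodes_per_buf_block + 1;
}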

For pass 2, readahead is broken up into $readahead_kb sized chunks instead of
issuing all of them at once.  This should increase the likelihood that a block
is not evicted before pass2 tries to read it.
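
A similarly rough sketch of the pass 2 chunking (again, the names are invented
and posix_fadvise() merely stands in for whatever the patch actually issues):

#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Issue at most readahead_kb worth of prefetch hints against the (already
 * sorted) directory block list, starting at index 'next', and return the
 * index where the following chunk should resume.  Pass 2 calls this again
 * once it has consumed the current chunk, so prefetched blocks are unlikely
 * to be evicted before they are actually read.
 */
static size_t pass2_readahead_chunk(int fd, const uint64_t *dirblocks,
				    size_t nr_blocks, size_t next,
				    uint32_t blocksize, uint64_t readahead_kb)
{
	uint64_t queued = 0, want = readahead_kb * 1024;

	while (next < nr_blocks && queued < want) {
		posix_fadvise(fd, (off_t)(dirblocks[next] * blocksize),
			      blocksize, POSIX_FADV_WILLNEED);
		queued += blocksize;
		next++;
	}
	return next;
}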

Pass 4's readahead remains unchanged.

The raw numbers from my performance evaluation of the new code live here:
https://docs.google.com/spreadsheets/d/1hTCfr30TebXcUV8HnSatNkm4OXSyP9ezbhtMbB_UuLU

This time, I repeatedly ran e2fsck -Fnfvtt with various sizes of readahead
buffer to see how that affected fsck runtime.  The run times are listed in the
table at row 22, and I've created a table at row 46 to show % reduction in
e2fsck runtime.  I tried (mostly) power-of-two buffer sizes from 1MB to 1GB; as
you can see, even a small amount of readahead can speed things up quite a lot,
though the returns diminish as the buffer sizes get exponentially larger.  USB
disks suffer across the board, probably due to their slow single-issue nature.
Hopefully UAS will eliminate that gap, though currently it just crashes my
machines.

Note that all of these filesystems are formatted ext4 with a per-group inode
table size of 2MB, which is probably why readahead=2MB seems to win most often.
I think 2MB is a small enough amount that we needn't worry about thrashing
memory in the case of parallel e2fsck, particularly because with a small
readahead amount, e2fsck is most likely going to demand the blocks fairly soon
anyway.  The new pass1 RA code is designed never to issue RA for just a fraction
of a block group's inode table blocks, so I propose setting RA to blocksize *
inode_blocks_per_group.
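
In other words, the proposed default is exactly one block group's worth of
inode table.  As a trivial sketch (the function name is made up):

#include <stdint.h>

/* Proposed default readahead size: one block group's inode table.  With the
 * usual mkfs geometry of 4K blocks and 8192 inodes of 256 bytes per group
 * (512 inode table blocks), this comes out to the 2MB that wins above. */
static uint64_t default_readahead_kb(uint32_t blocksize,
				     uint32_t inode_blocks_per_group)
{
	return ((uint64_t)blocksize * inode_blocks_per_group) / 1024;
}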

On a lark I fired up an old ext3 filesystem to see what would happen, and the
results generally follow the ext4 results.  I haven't done much digging into
ext3 though.  Potentially, one could prefetch the block map blocks when reading
in another inode_buffer_block's worth of inode tables.

Will send patches soon.

--D

* Re: e2fsck readahead speedup performance report
  2014-08-09  3:18 e2fsck readahead speedup performance report Darrick J. Wong
@ 2014-08-09  3:22 ` Darrick J. Wong
  2014-08-09  3:56 ` Theodore Ts'o
  1 sibling, 0 replies; 4+ messages in thread
From: Darrick J. Wong @ 2014-08-09  3:22 UTC (permalink / raw)
  To: tytso; +Cc: linux-ext4

On Fri, Aug 08, 2014 at 08:18:45PM -0700, Darrick J. Wong wrote:
> Hi all,
> 
> Since I last replied to the e2fsck readahead patch last week, I rewrote the
> prefetch algorithms for pass 1 and 2 and separated thread support into its own
> patch.  Upon discovering that issuing a POSIX_FADV_DONTNEED call caused a
> noticeable increase (of about 2-5 percentage points) in fsck runtime, I
> dropped that part out.
> 
> In pass 1, we now walk the group descriptors looking for inode table blocks to
> read until we have found enough to issue a $readahead_kb size readahead
> command.  The patch also computes the number of the first inode of the last
> inode buffer block of the last group in the readahead range, and schedules the
> next readahead to occur when we reach that inode.  This keeps the readahead
> running closer to full speed and eliminates conflicting IOs between the
> checker thread and the readahead.
> 
> For pass 2, readahead is broken up into $readahead_kb sized chunks instead of
> issuing all of them at once.  This should increase the likelihood that a block
> is not evicted before pass2 tries to read it.
> 
> Pass 4's readahead remains unchanged.
> 
> The raw numbers from my performance evaluation of the new code live here:
> https://docs.google.com/spreadsheets/d/1hTCfr30TebXcUV8HnSatNkm4OXSyP9ezbhtMbB_UuLU
> 
> This time, I repeatedly ran e2fsck -Fnfvtt with various sizes of readahead
> buffer to see how that affected fsck runtime.  The run times are listed in the
> table at row 22, and I've created a table at row 46 to show % reduction in
> e2fsck runtime.  I tried (mostly) power-of-two buffer sizes from 1MB to 1GB; as
> you can see, even a small amount of readahead can speed things up quite a lot,
> though the returns diminish as the buffer sizes get exponentially larger.  USB
> disks suffer across the board, probably due to their slow single-issue nature.
> Hopefully UAS will eliminate that gap, though currently it just crashes my
> machines.
> 
> Note that all of these filesystems are formatted ext4 with a per-group inode
> table size of 2MB, which is probably why readahead=2MB seems to win most often.
> I think 2MB is a small enough amount that we needn't worry about thrashing
> memory in the case of parallel e2fsck, particularly because with a small
> readahead amount, e2fsck is most likely going to demand the blocks fairly soon
> anyway.  The new pass1 RA code is designed never to issue RA for just a fraction
> of a block group's inode table blocks, so I propose setting RA to blocksize *
> inode_blocks_per_group.

I forgot to mention that I'll disable RA if the buffer size is greater than
1/100th of RAM.
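
Roughly like this (a sketch with invented names; the real patch may size
total memory differently than sysconf()):

#include <stdint.h>
#include <unistd.h>

/* Skip readahead entirely when the requested buffer exceeds 1/100th of
 * physical memory. */
static int readahead_allowed(uint64_t readahead_kb)
{
	long pages = sysconf(_SC_PHYS_PAGES);
	long page_size = sysconf(_SC_PAGE_SIZE);

	if (pages <= 0 || page_size <= 0)
		return 1;	/* can't tell how much RAM we have */

	uint64_t mem_kb = (uint64_t)pages * ((uint64_t)page_size / 1024);

	return readahead_kb <= mem_kb / 100;
}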

--D
> 
> On a lark I fired up an old ext3 filesystem to see what would happen, and the
> results generally follow the ext4 results.  I haven't done much digging into
> ext3 though.  Potentially, one could prefetch the block map blocks when reading
> in another inode_buffer_block's worth of inode tables.
> 
> Will send patches soon.
> 
> --D

* Re: e2fsck readahead speedup performance report
  2014-08-09  3:18 e2fsck readahead speedup performance report Darrick J. Wong
  2014-08-09  3:22 ` Darrick J. Wong
@ 2014-08-09  3:56 ` Theodore Ts'o
  2014-08-09  4:06   ` Darrick J. Wong
  1 sibling, 1 reply; 4+ messages in thread
From: Theodore Ts'o @ 2014-08-09  3:56 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-ext4

Interesting results!

I noticed that the 1TB SSD did seem to suffer when you went from
multi-threaded to single-threaded.  Was this a SATA-attached or
USB-attached SSD?  And any insights about why the SSD seemed to
require threading for better performance when using readahead?

						- Ted

* Re: e2fsck readahead speedup performance report
  2014-08-09  3:56 ` Theodore Ts'o
@ 2014-08-09  4:06   ` Darrick J. Wong
  0 siblings, 0 replies; 4+ messages in thread
From: Darrick J. Wong @ 2014-08-09  4:06 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4

On Fri, Aug 08, 2014 at 11:56:46PM -0400, Theodore Ts'o wrote:
> Interesting results!
> 
> I noticed that the 1TB SSD did seem to suffer when you went from
> multi-threaded to single-threaded.  Was this a SATA-attached or
> USB-attached SSD?  And any insights about why the SSD seemed to
> require threading for better performance when using readahead?

PCIE, and it might simply be having issues. :/

One thing I haven't looked into is how exactly the kernel maps IO
requests to queue slots -- does each CPU get its own pile of slots to
use up?  I _think_ it does, but it's been a few months since I poked
at mq.  Hmm... max_sectors_kb=128, which isn't unusual.  Guess
I'll keep digging.

The other disks seem fairly normal, at least.

--D
> 
> 						- Ted
