From: "Darrick J. Wong"
To: tytso@mit.edu
Cc: linux-ext4@vger.kernel.org
Subject: Re: e2fsck readahead speedup performance report
Date: Fri, 8 Aug 2014 20:22:40 -0700
Message-ID: <20140809032240.GK11191@birch.djwong.org>
In-Reply-To: <20140809031845.GJ11191@birch.djwong.org>

On Fri, Aug 08, 2014 at 08:18:45PM -0700, Darrick J. Wong wrote:
> Hi all,
>
> Since I last replied to the e2fsck readahead patch last week, I have
> rewritten the prefetch algorithms for passes 1 and 2 and separated thread
> support into its own patch.  Upon discovering that issuing a
> POSIX_FADV_DONTNEED call caused a noticeable increase (of about 2-5
> percentage points) in fsck runtime, I dropped that part.
>
> In pass 1, we now walk the group descriptors looking for inode table
> blocks to read until we have found enough to issue a $readahead_kb size
> readahead command.  The patch also computes the number of the first inode
> of the last inode buffer block of the last group in the readahead window
> and schedules the next readahead to occur when we reach that inode.  This
> keeps the readahead running closer to full speed and eliminates
> conflicting IOs between the checker thread and the readahead.
>
> For pass 2, readahead is broken up into $readahead_kb sized chunks
> instead of being issued all at once.  This should increase the likelihood
> that a block is not evicted before pass 2 tries to read it.
>
> Pass 4's readahead remains unchanged.
>
> The raw numbers from my performance evaluation of the new code live here:
> https://docs.google.com/spreadsheets/d/1hTCfr30TebXcUV8HnSatNkm4OXSyP9ezbhtMbB_UuLU
>
> This time, I repeatedly ran e2fsck -Fnfvtt with various sizes of readahead
> buffer to see how that affected fsck runtime.  The run times are listed in
> the table at row 22, and I've created a table at row 46 to show the
> percentage reduction in e2fsck runtime.  I tried (mostly) power-of-two
> buffer sizes from 1MB to 1GB; as you can see, even a small amount of
> readahead can speed things up quite a lot, though the returns diminish as
> the buffer sizes get exponentially larger.  USB disks suffer across the
> board, probably due to their slow single-issue nature.  Hopefully UAS will
> eliminate that gap, though currently it just crashes my machines.
>
> Note that all of these filesystems are formatted ext4 with a per-group
> inode table size of 2MB, which is probably why readahead=2MB seems to win
> most often.  I think 2MB is a small enough amount that we needn't worry
> about thrashing memory in the case of parallel e2fsck, particularly
> because with a small readahead amount, e2fsck is most likely going to
> demand the blocks fairly soon anyway.  The design of the new pass 1 RA
> code won't issue RA for a fraction of a block group's inode table blocks,
> so I propose setting RA to blocksize * inode_blocks_per_group.

I forgot to mention that I'll also disable RA if the buffer size is greater
than 1/100th of RAM.

--D

> On a lark I fired up an old ext3 filesystem to see what would happen, and
> the results generally follow the ext4 results.  I haven't done much
> digging into ext3, though.  Potentially, one could prefetch the block map
> blocks when reading in another inode_buffer_block's worth of inode tables.
>
> Will send patches soon.
>
> --D