From mboxrd@z Thu Jan 1 00:00:00 1970
From: Theodore Ts'o
Subject: Re: [PATCH 04/35] e2fsck: read-ahead metadata during passes 1, 2, and 4
Date: Mon, 20 Apr 2015 23:03:52 -0400
Message-ID: <20150421030352.GE3238@thunk.org>
References: <20150402023359.25243.79782.stgit@birch.djwong.org> <20150402023427.25243.66810.stgit@birch.djwong.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: linux-ext4@vger.kernel.org
To: "Darrick J. Wong"
Return-path:
Received: from imap.thunk.org ([74.207.234.97]:49225 "EHLO imap.thunk.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751309AbbDUDDx (ORCPT ); Mon, 20 Apr 2015 23:03:53 -0400
Content-Disposition: inline
In-Reply-To: <20150402023427.25243.66810.stgit@birch.djwong.org>
Sender: linux-ext4-owner@vger.kernel.org
List-ID:

On Wed, Apr 01, 2015 at 07:34:27PM -0700, Darrick J. Wong wrote:
> e2fsck pass1 is modified to use the block group data prefetch function
> to try to fetch the inode tables into the pagecache before it is
> needed. We iterate through the blockgroups until we have enough inode
> tables that need reading such that we can issue readahead; then we sit
> and wait until the last inode table block read of the last group to
> start fetching the next bunch.
>
> pass2 is modified to use the dirblock prefetching function to prefetch
> the list of directory blocks that are assembled in pass1. We use the
> "iterate a subset of a dblist" and avoid copying the dblist. Directory
> blocks are fetched incrementally as we walk through the directory
> block list. In previous iterations of this patch we would free the
> directory blocks after processing, but the performance hit to e2fsck
> itself wasn't worth it. Furthermore, it is anticipated that most
> users will then mount the FS and start using the directories, so they
> may as well remain in the page cache.
>
> pass4 is modified to prefetch the block and inode bitmaps in
> anticipation of pass 5, because pass4 is entirely CPU bound.
>
> In general, these mechanisms can decrease fsck time by 10-40%, if the
> host system has sufficient memory and the storage system can provide a
> lot of IOPs. Pretty much any storage system capable of handling
> multiple IOs in-flight at any time will see a fairly large performance
> boost. (Single-issue USB mass storage disks seem to suffer badly.)
>
> By default, the readahead buffer size will be set to the size of a block
> group's inode table (which is 2MiB for a regular ext4 FS). The -E
> readahead_kb= option can be given to specify the amount of memory to
> use for readahead or zero to disable it entirely; or an option can be
> given in e2fsck.conf.
>
> v2: Fix an off-by-one error in the pass1 readahead which made the
> readahead trigger one inode too late if the block groups are full.
>
> v3: Use the dblist partial iterator function to read ahead parts
> of the directory block list in pass 2, instead of making sublists.
>
> Signed-off-by: Darrick J. Wong

Thanks, applied.

- Ted
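
For readers following the thread, here is a rough sketch in C of the sizing arithmetic the commit message describes: the default readahead budget equals one block group's inode table, and pass 1 batches groups until the budget is filled. This is not code from the patch. The helper names (inode_table_kb, groups_per_window, prefetch_range) are hypothetical, the 8192 inodes per group and 256-byte inode size are assumed typical ext4 defaults, rounding a too-small budget up to one group is an illustrative choice, and the use of posix_fadvise() with POSIX_FADV_WILLNEED is an assumption about how a non-blocking prefetch could be issued.

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdint.h>

/* Hypothetical helper: size in KiB of one block group's inode table. */
static uint64_t inode_table_kb(uint32_t inodes_per_group, uint32_t inode_size)
{
    return ((uint64_t)inodes_per_group * inode_size) / 1024;
}

/*
 * Hypothetical helper: how many groups' inode tables fit in the
 * readahead budget.  A budget of zero means readahead is disabled,
 * mirroring "-E readahead_kb=0" in the commit message.  Rounding a
 * small nonzero budget up to one group is an assumption of this sketch.
 */
static uint32_t groups_per_window(uint64_t readahead_kb, uint64_t table_kb)
{
    uint64_t n;

    if (readahead_kb == 0 || table_kb == 0)
        return 0;
    n = readahead_kb / table_kb;
    return n ? (uint32_t)n : 1;
}

/*
 * Ask the kernel to start pulling [offset, offset + len) into the page
 * cache without blocking the caller.  Returns 0 on success or an errno
 * value on failure, as posix_fadvise() does.
 */
static int prefetch_range(int fd, off_t offset, off_t len)
{
    return posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);
}
```

With the assumed defaults, inode_table_kb(8192, 256) gives 2048 KiB, which matches the 2MiB figure quoted above, and a budget of 8192 KiB would cover four groups per readahead window.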