From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: with ECARTIS (v1.0.0; list xfs); Tue, 19 Aug 2008 16:11:40 -0700 (PDT)
Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130])
	by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with SMTP id m7JNB9Ep008477
	for ; Tue, 19 Aug 2008 16:11:10 -0700
Message-ID: <48AB5335.4030900@sgi.com>
Date: Wed, 20 Aug 2008 09:11:49 +1000
From: Mark Goodwin
Reply-To: markgw@sgi.com
MIME-Version: 1.0
Subject: Re: [PATCH 0/28] XFS: sync and reclaim rework
References: <1219151804-30749-1-git-send-email-david@fromorbit.com>
In-Reply-To: <1219151804-30749-1-git-send-email-david@fromorbit.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Sender: xfs-bounce@oss.sgi.com
Errors-to: xfs-bounce@oss.sgi.com
List-Id: xfs
To: Dave Chinner
Cc: xfs-oss

Thanks Dave - this is queued behind the btree factoring series;
both need to be QA'd and stress/perf tested independently, which
leads me to ask: how much QA has the sync/reclaim rework received?
(and could you make use of the machine Christoph has been using,
since you're in non-overlapping TZs?)

Also, if we were to change the inode / block offset direct mapping
to an indirect method (e.g. inode32+), would this grossly affect
the ascending inode number traversal optimization? If so, could
that mechanism be made conditional?

Thanks
-- Mark

Dave Chinner wrote:
> Multiple patch sets, all in one patch bomb against a current
> git tree. This includes all outstanding patches I have previously
> sent that are not committed plus a bunch more...
>
> ---
>
> XFS: replace the mount inode list with radix tree traversals V4
>
> The list of all inodes on a mount is superfluous. We can traverse
> all inodes now by walking the per-AG inode radix trees without
> needing a separate list. This enables us to remove a bunch of
> complex list traversal code and remove another two pointers from
> the xfs_inode.
>
> Also, by replacing the sync traversal with an ascending inode
> number traversal, we will issue better inode I/O patterns for
> writeback triggered by xfssyncd or unmount.
>
> Before we make this change, move all the relevant sync code
> into its own file in the linux-2.6/ directory. This aggregates
> VFS specific sync interfacing in the one file and will allow
> all the subsequent change history to be associated with this
> file so it is easy to find in future.
>
> Version 4:
> o revert xfs_syncsub -> xfs_sync change in xfs_quiesce_fs and
>   rediff patch series
>
> ---
>
> XFS: clean up sync code
>
> xfs_sync and xfs_syncsub are multiplexed interfaces that
> share relatively little code between callers. Because they are
> multiplexed interfaces, it's hard to tell what is executed
> in each context they are called from.
>
> Factor out the sync code and explicitly call the sync functions
> needed rather than the multiplexed interfaces. Once this is
> done, we can remove xfs_syncsub and xfs_sync altogether.
>
> ---
>
> RFC: Combine Linux and XFS inodes V2
>
> XFS currently has to deal with two separate inode lifecycles
> which makes for complexity in inode lookups and reclaim. We
> also have the problem of not always having a linux inode around
> when it might be useful to have it.
>
> To avoid these lifecycle problems, this series embeds the linux
> inode inside the struct xfs_inode and changes the way we reference
> the two inodes. We can no longer check for a null linux inode -
> instead we have to check to see if it is valid or not by checking
> either the linux inode or xfs inode state flags. While this means
> that inodes waiting for reclaim use more memory, this is not the
> common state for inodes and they will soon be completely freed so
> the additional memory use in this state is only a temporary issue.
>
> This combining of the inodes simplifies the inode and reclaim logic,
> making it possible to do reclaim via radix tree tags (an upcoming
> patch series) and to be able to use RCU locking on the radix trees.
> The fact that we don't have a simple mechanism to determine the
> reclaim state of the inode makes RCU locking very complex, and this
> complexity is removed by having a combined inode structure.
>
> This patch series also changes the way XFS caches inodes. It no
> longer uses the linux inode cache as the primary lookup cache -
> instead we rely solely on the XFS inode caches. This avoids the
> inode_lock in lookups that hit the cache - we should get much
> better parallelism out of inode lookup than we currently do.
>
> The patch series also makes use of the slab 'init once' feature
> for the XFS inodes. This means we only need to do partial
> initialisation of the xfs inode (and embedded linux inode) whenever
> we allocate a new inode.
>
> In future, we should also be able to cull duplicate fields out of
> the xfs and linux inodes, reducing the overall memory usage of
> the active inode cache. This provides scope for continuing to
> reduce the memory footprint of the XFS inode cache.
>
> Version 2
> o reorder and rework as a result of review comments.
>
> ---
>
> XFS: Track reclaimable inodes in inode cache.
>
> Move the tracking of reclaimable inodes into the inode radix
> trees. This currently does not replace the reclaim flags in the
> inode, rather it allows traversal of all reclaimable inodes by
> walking the per-AG inode radix trees without needing a separate
> list. This enables us to remove a list and a lock, eliminating
> a point of serialisation during inode reclaim.
>
> Like the matching sync code, this also allows reclaim of inodes
> in ascending inode number order, which substantially improves I/O
> patterns during reclaim driven inode flushing.
>
> ---
>
> Combined diffstat:
>
>  fs/inode.c                     |  205 ++++++----
>  fs/xfs/Makefile                |    1
>  fs/xfs/linux-2.6/xfs_aops.c    |    2
>  fs/xfs/linux-2.6/xfs_iops.c    |   19
>  fs/xfs/linux-2.6/xfs_super.c   |  265 +++---------
>  fs/xfs/linux-2.6/xfs_super.h   |    3
>  fs/xfs/linux-2.6/xfs_sync.c    |  780 +++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/linux-2.6/xfs_sync.h    |   55 ++
>  fs/xfs/linux-2.6/xfs_vfs.h     |   31 -
>  fs/xfs/linux-2.6/xfs_vnode.c   |    6
>  fs/xfs/linux-2.6/xfs_vnode.h   |    5
>  fs/xfs/quota/xfs_qm.c          |   10
>  fs/xfs/quota/xfs_qm_syscalls.c |  137 +++----
>  fs/xfs/xfs_ag.h                |    5
>  fs/xfs/xfs_iget.c              |  473 +++++++++---------------
>  fs/xfs/xfs_inode.c             |  140 ++++---
>  fs/xfs/xfs_inode.h             |   22 -
>  fs/xfs/xfs_itable.c            |   14
>  fs/xfs/xfs_mount.c             |    8
>  fs/xfs/xfs_mount.h             |   12
>  fs/xfs/xfs_vfsops.c            |  617 --------------------------------
>  fs/xfs/xfs_vfsops.h            |    2
>  fs/xfs/xfs_vnodeops.c          |  118 ------
>  include/linux/fs.h             |    2
>  24 files changed, 1391 insertions(+), 1541 deletions(-)
>
>
>

-- 
 Mark Goodwin                                  markgw@sgi.com
 Engineering Manager for XFS and PCP    Phone: +61-3-99631937
 SGI Australian Software Group           Cell: +61-4-18969583
-------------------------------------------------------------
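
Annotation (not part of the original message): the "Combine Linux and
XFS inodes" description above says the linux inode is embedded inside
struct xfs_inode and that validity is checked through state flags
rather than a NULL pointer. The following is a minimal userspace
sketch of that general embedding technique, not code from the patch
series. Every name in it (demo_vfs_inode, demo_xfs_inode,
demo_container_of, DEMO_RECLAIMABLE and the helper functions) is
hypothetical and chosen only for illustration; the real structures
and flags in the patches may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the VFS inode; not the kernel's struct inode. */
struct demo_vfs_inode {
    unsigned long i_ino;
    unsigned int  i_state;
};

/* Hypothetical stand-in for the XFS inode, with the VFS inode embedded by value. */
struct demo_xfs_inode {
    unsigned long long    ino;
    unsigned int          flags;
    struct demo_vfs_inode vfs;
};

/* Illustrative flag meaning "waiting for reclaim"; an assumption, not a real XFS flag. */
#define DEMO_RECLAIMABLE 0x1u

/* Recover the containing structure from a pointer to its embedded member. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* VFS inode -> containing XFS inode: pointer arithmetic, no lookup. */
static struct demo_xfs_inode *demo_xfs_i(struct demo_vfs_inode *inode)
{
    return demo_container_of(inode, struct demo_xfs_inode, vfs);
}

/* XFS inode -> embedded VFS inode: always present, never NULL. */
static struct demo_vfs_inode *demo_vfs_i(struct demo_xfs_inode *ip)
{
    return &ip->vfs;
}

/* With embedding there is no NULL pointer to test; validity is a flag check. */
static int demo_inode_is_live(const struct demo_xfs_inode *ip)
{
    return (ip->flags & DEMO_RECLAIMABLE) == 0;
}

/* Returns 1 if both conversions round-trip and the inode is considered live. */
static int demo_roundtrip(void)
{
    struct demo_xfs_inode ip = { .ino = 128, .flags = 0, .vfs = { 128, 0 } };

    return demo_xfs_i(demo_vfs_i(&ip)) == &ip && demo_inode_is_live(&ip);
}
```

In this sketch both conversions are constant-time and need no lock or
cache lookup, and a single allocation covers both objects, which is
why an inode waiting for reclaim still carries the memory of the
embedded structure, as the cover letter notes.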