From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: with ECARTIS (v1.0.0; list xfs); Thu, 18 Sep 2008 01:50:37 -0700 (PDT) Received: from relay.sgi.com (relay1.corp.sgi.com [192.26.58.214]) by oss.sgi.com (8.12.11.20060308/8.12.11/SuSE Linux 0.7) with ESMTP id m8I8oSTs028826 for ; Thu, 18 Sep 2008 01:50:28 -0700 Message-ID: <48D218AE.9090400@sgi.com> Date: Thu, 18 Sep 2008 19:00:30 +1000 From: Lachlan McIlroy Reply-To: lachlan@sgi.com MIME-Version: 1.0 Subject: Re: REVIEW: Fix for incore extent corruption. References: <48D19A83.4040608@thebarn.com> <48D1CD46.4010104@sgi.com> <48D1DCD5.7040502@thebarn.com> In-Reply-To: <48D1DCD5.7040502@thebarn.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com List-Id: xfs To: Russell Cattelan Cc: xfs@oss.sgi.com Russell, this fixes xfs_iext_irec_compact_full(). If we don't move all the records from the next page into the current page then we need to update the er_extoff of the modified page as we move the remaining extents up. Would you mind giving it a go? --- a/fs/xfs/xfs_inode.c 2008-09-18 18:48:46.000000000 +1000 +++ b/fs/xfs/xfs_inode.c 2008-09-18 18:57:18.000000000 +1000 @@ -4623,6 +4623,7 @@ xfs_iext_irec_compact_full( (XFS_LINEAR_EXTS - erp_next->er_extcount) * sizeof(xfs_bmbt_rec_t)); + erp_next->er_extoff += ext_diff; } } Russell Cattelan wrote: > Lachlan McIlroy wrote: >> Russell Cattelan wrote: >>> >>> Reference: >>> http://oss.sgi.com/archives/xfs/2008-06/msg00209.html >>> >>> >>> It turns out that the code in question is still broken. >>> >>> xfs_iext_irec_compact_full will still corrupt the incore extent list >>> if it does >>> the the partial copy of extents from one page to the next. >>> I haven't quit figured out where things get out of sync but somehow >>> if_real_bytes which tracks the total number of bytes available in >>> the extent buffers and if_bytes (which tracks the total bytes used >>> for extents. >>> >>> So at some point the inode thinks is has more extents than allocated >>> pages allow. >>> So what happens is xfs_iext_idx_to_irec which uses idxp to pass IN the >>> absolute extent index is suppose to change idxp on the way OUT and >>> erp_idxp >>> to be a buffer index pair. What happens is that it doesn't find the >>> extent so idxp >>> is left holding the same value as was passed in and erp_idx is 0. >>> This causes the extent code to then index way past the end of extent >>> buffer 0 >>> into garbage mem. >>> >>> with 4k ext buffers max extent count per buffer is 256. >>> example being: >>> IN: >>> idxp = 400 >>> should become: >>> idexp = 144 >>> erp_idxp = 1 >>> >>> but we end up not finding the extent so >>> we have >>> idxp = 400 >>> erp_idxp =0 >>> >>> so we now index 6400 bytes into a 4k buffer. >>> >>> Which often times is a pages of mostly 0 so we end up with access to >>> block 0 errors. >>> >>> The more I looked at this code the more it didn't make sense to do >>> partial moves. >>> Since the list of extent buffers is only scanned once vs restarting >>> the list whenever a partial move is done, >>> it is very unlikely to actually free an extent buffer. (granted it's >>> possible but unlikely) >>> >>> xfs_iext_irec_compact_pages does the same thing as >>> xfs_iext_irec_compact_full except that doesn't handle partial moves. >>> >>> xfs_iext_irec_compact is written such that ratio of current extents >>> has to be significantly smaller than the current allocated space >>> xfs_inode: 4513 >>> nextents < ( nlists * XFS_LINEAR_EXT) >> 3 >>> >>> As it turns out 99% of the time it calls xfs_iext_irec_compact_pages >>> (which is why it has been so hard to track this bug down). >>> >>> If you change the 3 to a 1 so the code alway calls compact_full vs >>> compact_pages the extent list will corrupt amost >>> immediately. >> Awesome work Russell, we'll give it a go. >> >>> >>> Since the code is broken, almost never used, and provides only micro >>> optimization of incore space I propose we just >>> remove it all together. >> Are you sure the bug is in xfs_iext_irec_compact_full? >> >> Is it possible that we are still indexing beyond the page when using >> xfs_iext_irec_compact_pages but the pages just happen to be sequential >> so the indexing gets the extent anyway? > I added a bunch of printk to track this down, the compact_pages path is > hit > a lot in fact as far as I can tell all running file systems that shrink > extents and don't crash :-) > > I should have done this originally my I'm including the modified > makeextents that > I used to tickle this problem. > It reserves a bunch of space to create a contiguous extents then in > unreserves space to > poke a bunch of holes creating a big extent list, it then goes back and > writes the whole > file hopefully collapsing extents as it goes. > > i.e. > makeextents -v -c 512 foo ; xfs_bmap -v foo > should give you 1024 extents > makeextents -v -f -c 512 foo ; xfs_bmap -v foo > will do the same thing but fill in the file with writes. > The number of resulting extents vary, but sometimes you > end up with one extent. sometimes more. > > If you change the 3 to a 1 in the current code so compact_full is used > vs compact_pages > and run the test it will hit some problem every time. > xexlist in kdb will show the corruption in the incore list. > > This will run the code through all 3 formats so if you are lucky you end > up hitting all > the cases indirect > 256, direct <= 256, and inline <= 2 > > note: xfs_iext_indirect_to_direct does call compact_full but in that > case we are already down > to under 256 extents (at least we should be ) and at that point > compact_full will behave just like compact_pages. > > >> >>> >>> I'm also including an xfsidbg patch that for xexlist that prints out >>> buffer offset and index. >>> >>> -Russell Cattelan >>> >>> >> >