From: Russell Cattelan <cattelan@thebarn.com>
To: lachlan@sgi.com
Cc: xfs@oss.sgi.com
Subject: Re: REVIEW: Fix for incore extent corruption.
Date: Thu, 18 Sep 2008 14:34:50 -0700 [thread overview]
Message-ID: <48D2C97A.1070703@thebarn.com> (raw)
In-Reply-To: <48D218AE.9090400@sgi.com>
Lachlan McIlroy wrote:
> Russell, this fixes xfs_iext_irec_compact_full(). If we don't move
> all the records from the next page into the current page then we need
> to update the er_extoff of the modified page as we move the remaining
> extents up. Would you mind giving it a go?
>
> --- a/fs/xfs/xfs_inode.c 2008-09-18 18:48:46.000000000 +1000
> +++ b/fs/xfs/xfs_inode.c 2008-09-18 18:57:18.000000000 +1000
> @@ -4623,6 +4623,7 @@ xfs_iext_irec_compact_full(
> (XFS_LINEAR_EXTS -
> erp_next->er_extcount) *
> sizeof(xfs_bmbt_rec_t));
> + erp_next->er_extoff += ext_diff;
> }
> }
>
Cool I'll give it some run through when I done traveling.
I still think compact_full should simply be eliminated since
it really doesn't help, and it's obviously confusing code.
Or we should make sure it works and get rid of compact_pages
since compact_full behaves just like compact_pages when not
doing partial moves.
>
> Russell Cattelan wrote:
>> Lachlan McIlroy wrote:
>>> Russell Cattelan wrote:
>>>>
>>>> Reference:
>>>> http://oss.sgi.com/archives/xfs/2008-06/msg00209.html
>>>>
>>>>
>>>> It turns out that the code in question is still broken.
>>>>
>>>> xfs_iext_irec_compact_full will still corrupt the incore extent
>>>> list if it does
>>>> the the partial copy of extents from one page to the next.
>>>> I haven't quit figured out where things get out of sync but somehow
>>>> if_real_bytes which tracks the total number of bytes available in
>>>> the extent buffers and if_bytes (which tracks the total bytes used
>>>> for extents.
>>>>
>>>> So at some point the inode thinks is has more extents than
>>>> allocated pages allow.
>>>> So what happens is xfs_iext_idx_to_irec which uses idxp to pass IN the
>>>> absolute extent index is suppose to change idxp on the way OUT and
>>>> erp_idxp
>>>> to be a buffer index pair. What happens is that it doesn't find the
>>>> extent so idxp
>>>> is left holding the same value as was passed in and erp_idx is 0.
>>>> This causes the extent code to then index way past the end of
>>>> extent buffer 0
>>>> into garbage mem.
>>>>
>>>> with 4k ext buffers max extent count per buffer is 256.
>>>> example being:
>>>> IN:
>>>> idxp = 400
>>>> should become:
>>>> idexp = 144
>>>> erp_idxp = 1
>>>>
>>>> but we end up not finding the extent so
>>>> we have
>>>> idxp = 400
>>>> erp_idxp =0
>>>>
>>>> so we now index 6400 bytes into a 4k buffer.
>>>>
>>>> Which often times is a pages of mostly 0 so we end up with access
>>>> to block 0 errors.
>>>>
>>>> The more I looked at this code the more it didn't make sense to do
>>>> partial moves.
>>>> Since the list of extent buffers is only scanned once vs restarting
>>>> the list whenever a partial move is done,
>>>> it is very unlikely to actually free an extent buffer. (granted
>>>> it's possible but unlikely)
>>>>
>>>> xfs_iext_irec_compact_pages does the same thing as
>>>> xfs_iext_irec_compact_full except that doesn't handle partial moves.
>>>>
>>>> xfs_iext_irec_compact is written such that ratio of current extents
>>>> has to be significantly smaller than the current allocated space
>>>> xfs_inode: 4513
>>>> nextents < ( nlists * XFS_LINEAR_EXT) >> 3
>>>>
>>>> As it turns out 99% of the time it calls xfs_iext_irec_compact_pages
>>>> (which is why it has been so hard to track this bug down).
>>>>
>>>> If you change the 3 to a 1 so the code alway calls compact_full vs
>>>> compact_pages the extent list will corrupt amost
>>>> immediately.
>>> Awesome work Russell, we'll give it a go.
>>>
>>>>
>>>> Since the code is broken, almost never used, and provides only
>>>> micro optimization of incore space I propose we just
>>>> remove it all together.
>>> Are you sure the bug is in xfs_iext_irec_compact_full?
>>>
>>> Is it possible that we are still indexing beyond the page when using
>>> xfs_iext_irec_compact_pages but the pages just happen to be sequential
>>> so the indexing gets the extent anyway?
>> I added a bunch of printk to track this down, the compact_pages path
>> is hit
>> a lot in fact as far as I can tell all running file systems that
>> shrink extents and don't crash :-)
>>
>> I should have done this originally my I'm including the modified
>> makeextents that
>> I used to tickle this problem.
>> It reserves a bunch of space to create a contiguous extents then in
>> unreserves space to
>> poke a bunch of holes creating a big extent list, it then goes back
>> and writes the whole
>> file hopefully collapsing extents as it goes.
>>
>> i.e.
>> makeextents -v -c 512 foo ; xfs_bmap -v foo
>> should give you 1024 extents
>> makeextents -v -f -c 512 foo ; xfs_bmap -v foo
>> will do the same thing but fill in the file with writes.
>> The number of resulting extents vary, but sometimes you
>> end up with one extent. sometimes more.
>>
>> If you change the 3 to a 1 in the current code so compact_full is
>> used vs compact_pages
>> and run the test it will hit some problem every time.
>> xexlist in kdb will show the corruption in the incore list.
>>
>> This will run the code through all 3 formats so if you are lucky you
>> end up hitting all
>> the cases indirect > 256, direct <= 256, and inline <= 2
>>
>> note: xfs_iext_indirect_to_direct does call compact_full but in that
>> case we are already down
>> to under 256 extents (at least we should be ) and at that point
>> compact_full will behave just like compact_pages.
>>
>>
>>>
>>>>
>>>> I'm also including an xfsidbg patch that for xexlist that prints
>>>> out buffer offset and index.
>>>>
>>>> -Russell Cattelan
>>>>
>>>>
>>>
>>
>
next prev parent reply other threads:[~2008-09-18 21:33 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-09-18 0:02 REVIEW: Fix for incore extent corruption Russell Cattelan
2008-09-18 3:38 ` Lachlan McIlroy
2008-09-18 4:45 ` Russell Cattelan
2008-09-18 7:02 ` Lachlan McIlroy
2008-09-18 9:00 ` Lachlan McIlroy
2008-09-18 18:30 ` Eric Sandeen
2008-09-18 19:28 ` Eric Sandeen
2008-09-19 0:59 ` Lachlan McIlroy
2008-09-19 0:55 ` Lachlan McIlroy
2008-09-19 7:01 ` Eric Sandeen
2008-09-22 2:08 ` Lachlan McIlroy
2008-09-18 21:34 ` Russell Cattelan [this message]
2008-09-18 22:20 ` Eric Sandeen
2008-09-19 0:51 ` Lachlan McIlroy
2008-09-19 3:25 ` Lachlan McIlroy
2008-09-19 6:23 ` Eric Sandeen
2008-09-19 15:15 ` Russell Cattelan
2008-09-22 2:17 ` Lachlan McIlroy
2008-09-19 15:03 ` Russell Cattelan
2008-09-22 2:33 ` Lachlan McIlroy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=48D2C97A.1070703@thebarn.com \
--to=cattelan@thebarn.com \
--cc=lachlan@sgi.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox