All of lore.kernel.org
 help / color / mirror / Atom feed
From: Russell Cattelan <cattelan@thebarn.com>
To: lachlan@sgi.com
Cc: xfs@oss.sgi.com
Subject: Re: REVIEW: Fix for incore extent corruption.
Date: Thu, 18 Sep 2008 14:34:50 -0700	[thread overview]
Message-ID: <48D2C97A.1070703@thebarn.com> (raw)
In-Reply-To: <48D218AE.9090400@sgi.com>

Lachlan McIlroy wrote:
> Russell, this fixes xfs_iext_irec_compact_full().  If we don't move
> all the records from the next page into the current page then we need
> to update the er_extoff of the modified page as we move the remaining
> extents up.  Would you mind giving it a go?
>
> --- a/fs/xfs/xfs_inode.c    2008-09-18 18:48:46.000000000 +1000
> +++ b/fs/xfs/xfs_inode.c    2008-09-18 18:57:18.000000000 +1000
> @@ -4623,6 +4623,7 @@ xfs_iext_irec_compact_full(
>                     (XFS_LINEAR_EXTS -
>                         erp_next->er_extcount) *
>                     sizeof(xfs_bmbt_rec_t));
> +                erp_next->er_extoff += ext_diff;
>             }
>         }
>
Cool I'll give it some run through when I done traveling.

I still think compact_full should simply be eliminated since
it really doesn't help, and it's obviously confusing code.
Or we should make sure it works and get rid of compact_pages
since compact_full behaves just like compact_pages when not
doing partial moves.

>
> Russell Cattelan wrote:
>> Lachlan McIlroy wrote:
>>> Russell Cattelan wrote:
>>>>
>>>> Reference:
>>>> http://oss.sgi.com/archives/xfs/2008-06/msg00209.html
>>>>
>>>>
>>>> It turns out that the code in question is still broken.
>>>>
>>>> xfs_iext_irec_compact_full will still corrupt the incore extent 
>>>> list if it does
>>>> the the partial copy of extents from one page to the next.
>>>> I haven't quit figured out where things get out of sync but somehow
>>>> if_real_bytes which tracks the total number of bytes available in
>>>> the extent buffers and if_bytes (which tracks the total bytes used
>>>> for extents.
>>>>
>>>> So at some point the inode thinks is has more extents than 
>>>> allocated pages allow.
>>>> So what happens is xfs_iext_idx_to_irec which uses idxp to pass IN the
>>>> absolute extent index is suppose to change  idxp on the way OUT and 
>>>> erp_idxp
>>>> to be a buffer index pair. What happens is that it doesn't find the 
>>>> extent so idxp
>>>> is left holding the same value as was passed in and erp_idx is 0.
>>>> This causes the extent code to then index way past the end of 
>>>> extent buffer 0
>>>> into garbage mem.
>>>>
>>>> with 4k ext buffers max extent count per buffer is 256.
>>>> example being:
>>>> IN:
>>>> idxp = 400
>>>> should become:
>>>> idexp = 144
>>>> erp_idxp = 1
>>>>
>>>> but we end up not finding the extent so
>>>> we have
>>>> idxp = 400
>>>> erp_idxp =0
>>>>
>>>> so we now index 6400 bytes into a 4k buffer.
>>>>
>>>> Which often times  is a pages of mostly 0 so we end up with access 
>>>> to block 0 errors.
>>>>
>>>> The more I looked at this code the more it didn't make sense to do 
>>>> partial moves.
>>>> Since the list of extent buffers is only scanned once vs restarting 
>>>> the list whenever a partial move is done,
>>>> it is very unlikely to actually free an extent buffer. (granted 
>>>> it's possible but unlikely)
>>>>
>>>> xfs_iext_irec_compact_pages does the same thing as 
>>>> xfs_iext_irec_compact_full except that doesn't handle partial moves.
>>>>
>>>> xfs_iext_irec_compact is written such that ratio of current extents 
>>>> has to be significantly smaller than the current allocated space
>>>> xfs_inode: 4513
>>>> nextents < ( nlists  * XFS_LINEAR_EXT) >> 3
>>>>
>>>> As it turns out 99% of the time it calls xfs_iext_irec_compact_pages
>>>> (which is why it has been so hard to track this bug down).
>>>>
>>>> If you change the 3 to a 1 so the code alway calls compact_full vs 
>>>> compact_pages the extent list will corrupt amost
>>>> immediately.
>>> Awesome work Russell, we'll give it a go.
>>>
>>>>
>>>> Since the code is broken, almost never used, and provides only 
>>>> micro optimization of incore space I propose we just
>>>> remove it all together.
>>> Are you sure the bug is in xfs_iext_irec_compact_full?
>>>
>>> Is it possible that we are still indexing beyond the page when using
>>> xfs_iext_irec_compact_pages but the pages just happen to be sequential
>>> so the indexing gets the extent anyway?
>> I added a bunch of printk to track this down,  the compact_pages path 
>> is hit
>> a lot in fact as far as I can tell all running file systems that 
>> shrink extents and don't crash :-)
>>
>> I should have done this originally my I'm including the modified 
>> makeextents that
>> I used to tickle this problem.
>> It reserves a bunch of space to create a contiguous extents then in 
>> unreserves space to
>> poke a bunch of holes creating a big extent list, it then goes back 
>> and writes the whole
>> file hopefully collapsing extents as it goes.
>>
>> i.e.
>> makeextents -v -c 512 foo ; xfs_bmap -v foo
>> should give you 1024 extents
>> makeextents -v -f -c 512 foo ; xfs_bmap -v foo
>> will do the same thing but fill in the file with writes.
>> The number of resulting extents vary, but sometimes you
>> end up with one extent. sometimes more.
>>
>> If you change the 3 to a 1 in the current code so compact_full is 
>> used vs compact_pages
>> and run the test it will hit some problem every time.
>> xexlist in kdb will show the corruption in the incore list.
>>
>> This will run the code through all 3 formats so if you are lucky you 
>> end up hitting all
>> the cases indirect > 256, direct <= 256, and inline <= 2
>>
>> note:  xfs_iext_indirect_to_direct does call compact_full but in that 
>> case we are already down
>> to under 256 extents (at least we should be ) and at that point 
>> compact_full will behave just like compact_pages.
>>
>>
>>>
>>>>
>>>> I'm also including an xfsidbg patch that for xexlist that prints 
>>>> out buffer offset and index.
>>>>
>>>> -Russell Cattelan
>>>>
>>>>
>>>
>>
>

  parent reply	other threads:[~2008-09-18 21:33 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-09-18  0:02 REVIEW: Fix for incore extent corruption Russell Cattelan
2008-09-18  3:38 ` Lachlan McIlroy
2008-09-18  4:45   ` Russell Cattelan
2008-09-18  7:02     ` Lachlan McIlroy
2008-09-18  9:00     ` Lachlan McIlroy
2008-09-18 18:30       ` Eric Sandeen
2008-09-18 19:28         ` Eric Sandeen
2008-09-19  0:59           ` Lachlan McIlroy
2008-09-19  0:55         ` Lachlan McIlroy
2008-09-19  7:01           ` Eric Sandeen
2008-09-22  2:08             ` Lachlan McIlroy
2008-09-18 21:34       ` Russell Cattelan [this message]
2008-09-18 22:20         ` Eric Sandeen
2008-09-19  0:51           ` Lachlan McIlroy
2008-09-19  3:25             ` Lachlan McIlroy
2008-09-19  6:23               ` Eric Sandeen
2008-09-19 15:15                 ` Russell Cattelan
2008-09-22  2:17                 ` Lachlan McIlroy
2008-09-19 15:03               ` Russell Cattelan
2008-09-22  2:33                 ` Lachlan McIlroy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=48D2C97A.1070703@thebarn.com \
    --to=cattelan@thebarn.com \
    --cc=lachlan@sgi.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.