Re: REVIEW: Fix for incore extent corruption.

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Lachlan McIlroy <lachlan@sgi.com>
To: Russell Cattelan <cattelan@thebarn.com>
Cc: xfs@oss.sgi.com
Subject: Re: REVIEW: Fix for incore extent corruption.
Date: Thu, 18 Sep 2008 19:00:30 +1000	[thread overview]
Message-ID: <48D218AE.9090400@sgi.com> (raw)
In-Reply-To: <48D1DCD5.7040502@thebarn.com>

Russell, this fixes xfs_iext_irec_compact_full().  If we don't move
all the records from the next page into the current page then we need
to update the er_extoff of the modified page as we move the remaining
extents up.  Would you mind giving it a go?

--- a/fs/xfs/xfs_inode.c	2008-09-18 18:48:46.000000000 +1000
+++ b/fs/xfs/xfs_inode.c	2008-09-18 18:57:18.000000000 +1000
@@ -4623,6 +4623,7 @@ xfs_iext_irec_compact_full(
 					(XFS_LINEAR_EXTS -
 						erp_next->er_extcount) *
 					sizeof(xfs_bmbt_rec_t));
+				erp_next->er_extoff += ext_diff;
 			}
 		}


Russell Cattelan wrote:
> Lachlan McIlroy wrote:
>> Russell Cattelan wrote:
>>>
>>> Reference:
>>> http://oss.sgi.com/archives/xfs/2008-06/msg00209.html
>>>
>>>
>>> It turns out that the code in question is still broken.
>>>
>>> xfs_iext_irec_compact_full will still corrupt the incore extent list 
>>> if it does
>>> the the partial copy of extents from one page to the next.
>>> I haven't quit figured out where things get out of sync but somehow
>>> if_real_bytes which tracks the total number of bytes available in
>>> the extent buffers and if_bytes (which tracks the total bytes used
>>> for extents.
>>>
>>> So at some point the inode thinks is has more extents than allocated 
>>> pages allow.
>>> So what happens is xfs_iext_idx_to_irec which uses idxp to pass IN the
>>> absolute extent index is suppose to change  idxp on the way OUT and 
>>> erp_idxp
>>> to be a buffer index pair. What happens is that it doesn't find the 
>>> extent so idxp
>>> is left holding the same value as was passed in and erp_idx is 0.
>>> This causes the extent code to then index way past the end of extent 
>>> buffer 0
>>> into garbage mem.
>>>
>>> with 4k ext buffers max extent count per buffer is 256.
>>> example being:
>>> IN:
>>> idxp = 400
>>> should become:
>>> idexp = 144
>>> erp_idxp = 1
>>>
>>> but we end up not finding the extent so
>>> we have
>>> idxp = 400
>>> erp_idxp =0
>>>
>>> so we now index 6400 bytes into a 4k buffer.
>>>
>>> Which often times  is a pages of mostly 0 so we end up with access to 
>>> block 0 errors.
>>>
>>> The more I looked at this code the more it didn't make sense to do 
>>> partial moves.
>>> Since the list of extent buffers is only scanned once vs restarting 
>>> the list whenever a partial move is done,
>>> it is very unlikely to actually free an extent buffer. (granted it's 
>>> possible but unlikely)
>>>
>>> xfs_iext_irec_compact_pages does the same thing as 
>>> xfs_iext_irec_compact_full except that doesn't handle partial moves.
>>>
>>> xfs_iext_irec_compact is written such that ratio of current extents 
>>> has to be significantly smaller than the current allocated space
>>> xfs_inode: 4513
>>> nextents < ( nlists  * XFS_LINEAR_EXT) >> 3
>>>
>>> As it turns out 99% of the time it calls xfs_iext_irec_compact_pages
>>> (which is why it has been so hard to track this bug down).
>>>
>>> If you change the 3 to a 1 so the code alway calls compact_full vs 
>>> compact_pages the extent list will corrupt amost
>>> immediately.
>> Awesome work Russell, we'll give it a go.
>>
>>>
>>> Since the code is broken, almost never used, and provides only micro 
>>> optimization of incore space I propose we just
>>> remove it all together.
>> Are you sure the bug is in xfs_iext_irec_compact_full?
>>
>> Is it possible that we are still indexing beyond the page when using
>> xfs_iext_irec_compact_pages but the pages just happen to be sequential
>> so the indexing gets the extent anyway?
> I added a bunch of printk to track this down,  the compact_pages path is 
> hit
> a lot in fact as far as I can tell all running file systems that shrink 
> extents and don't crash :-)
> 
> I should have done this originally my I'm including the modified 
> makeextents that
> I used to tickle this problem.
> It reserves a bunch of space to create a contiguous extents then in 
> unreserves space to
> poke a bunch of holes creating a big extent list, it then goes back and 
> writes the whole
> file hopefully collapsing extents as it goes.
> 
> i.e.
> makeextents -v -c 512 foo ; xfs_bmap -v foo
> should give you 1024 extents
> makeextents -v -f -c 512 foo ; xfs_bmap -v foo
> will do the same thing but fill in the file with writes.
> The number of resulting extents vary, but sometimes you
> end up with one extent. sometimes more.
> 
> If you change the 3 to a 1 in the current code so compact_full is used 
> vs compact_pages
> and run the test it will hit some problem every time.
> xexlist in kdb will show the corruption in the incore list.
> 
> This will run the code through all 3 formats so if you are lucky you end 
> up hitting all
> the cases indirect > 256, direct <= 256, and inline <= 2
> 
> note:  xfs_iext_indirect_to_direct does call compact_full but in that 
> case we are already down
> to under 256 extents (at least we should be ) and at that point 
> compact_full will behave just like compact_pages.
> 
> 
>>
>>>
>>> I'm also including an xfsidbg patch that for xexlist that prints out 
>>> buffer offset and index.
>>>
>>> -Russell Cattelan
>>>
>>>
>>
>

next prev parent reply	other threads:[~2008-09-18  8:50 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-09-18  0:02 REVIEW: Fix for incore extent corruption Russell Cattelan
2008-09-18  3:38 ` Lachlan McIlroy
2008-09-18  4:45   ` Russell Cattelan
2008-09-18  7:02     ` Lachlan McIlroy
2008-09-18  9:00     ` Lachlan McIlroy [this message]
2008-09-18 18:30       ` Eric Sandeen
2008-09-18 19:28         ` Eric Sandeen
2008-09-19  0:59           ` Lachlan McIlroy
2008-09-19  0:55         ` Lachlan McIlroy
2008-09-19  7:01           ` Eric Sandeen
2008-09-22  2:08             ` Lachlan McIlroy
2008-09-18 21:34       ` Russell Cattelan
2008-09-18 22:20         ` Eric Sandeen
2008-09-19  0:51           ` Lachlan McIlroy
2008-09-19  3:25             ` Lachlan McIlroy
2008-09-19  6:23               ` Eric Sandeen
2008-09-19 15:15                 ` Russell Cattelan
2008-09-22  2:17                 ` Lachlan McIlroy
2008-09-19 15:03               ` Russell Cattelan
2008-09-22  2:33                 ` Lachlan McIlroy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=48D218AE.9090400@sgi.com \
    --to=lachlan@sgi.com \
    --cc=cattelan@thebarn.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox