linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Sandeen <sandeen@redhat.com>
To: Andreas Dilger <adilger@dilger.ca>
Cc: Carlos Maiolino <cmaiolino@redhat.com>, linux-ext4@vger.kernel.org
Subject: Re: new block group bitmaps not being initialized after resize!?
Date: Fri, 11 Jan 2013 09:47:01 -0600	[thread overview]
Message-ID: <50F033F5.6080008@redhat.com> (raw)
In-Reply-To: <25BFE2C2-8FF2-4064-86D3-6CFBF5A2931F@dilger.ca>

On 1/11/13 12:44 AM, Andreas Dilger wrote:
> On 2013-01-10, at 6:07 PM, Eric Sandeen wrote:
>> On 1/10/13 6:07 PM, Carlos Maiolino wrote:
>>> I'm working on a Fedora bugzilla (https://bugzilla.redhat.com/show_bug.cgi?id=852833)
>>> together with Eric and we found that the problem described on the
>>> bugzilla happens when the commit 93f9052643 is not applied, which
>>> is the case of the Fedora 16 kernel being discussed there.
>>
>> Also, to be clear, this is with older e2fsprogs which was using the
>> old resize interface.  Not sure what the behavior is w/ newer
>> e2fsprogs, but we don't see the corruption.
>>
>> Note, we see the corruption on these older kernels even when resizing from say 100G to 120G.  It appears fixed upstream, but so much has
>> changed, we need to be sure the older interface doesn't have bugs
>> lurking, I think.
> 
> The original resize code didn't ever know about uninit_bg, so it
> would always zero out the inode table, so I suspect that this was
> added at some later point. 

Ok, so that must have gotten dropped when everything was reworked
to the new interface.

We probably should have a (hidden?) switch in resize2fs to exercise
the old interface, so we can test this.

>>> Although we already found the solution to the problem in the commit
>>> above, looking through the commit have raised some questions
>>> regarding to the bitmap of the newer block groups added to the FS
>>> after it is extended.
>>>
>>> The newer block groups do not have flags ITABLE_ZEROED and
>>> INODE_UNINIT set, even when 'lazy_itable_init' is enabled.
>>
>> In particular, we see things like this in the last pre-resize group:
>>
>> Group 799: (Blocks 26181632-26214399) [INODE_UNINIT, ITABLE_ZEROED]
>>  Checksum 0xafe7, unused inodes 1936
>>  Block bitmap at 25165855 (bg #768 + 31), Inode bitmap at 25166111 (bg #768 + 287)
>>  Inode table at 25170087-25170207 (bg #768 + 4263)
>>  32768 free blocks, 1936 free inodes, 0 directories, 1936 unused inodes
>>  Free blocks: 26181632-26214399
>>  Free inodes: 1546865-1548800
>>
>> and this in the newly-added groups:
>>
>> Group 800: (Blocks 26214400-26247167)
>>  Checksum 0xddc4, unused inodes 0
>>  Block bitmap at 26236224 (+21824), Inode bitmap at 26236225 (+21825)
>>  Inode table at 26214400-26214520
>>  32645 free blocks, 1936 free inodes, 0 directories
>>  Free blocks: 26214521-26236223, 26236226-26247167
>>  Free inodes: 1548801-1550736
>>
>> so it says 0 unused inodes, but also 1936 free inodes (?), and
>> no UNINIT or ZEROED flags set.  e2fsck finds stale data in the inode table, and goes nuts.
> 
> Zeroing the inode table but not setting the INODE_ZEROED flag
> would not be harmful, but this seems to not be the case.

we appear to be not zeroing the table, and not setting INODE_ZEROED.
But we should have set INODE_UNINIT, or zeroed it, right?

> When the filesystem is remounted, does the kernel lazyinit
> thread zero out the new groups in the inode table?

Don't think so.

> 
>>> Without this commit, inode stale data might be exposed and also makes fsck complain about all inodes of the newer block groups.
>>
>> *nod* :)
> 
> That's why 
> 
>> so 93f9052643 seems to have accidentally fixed this, by setting
>> the unused counter to EXT4_INODES_PER_GROUP(), but it feels like
>> we've missed properly setting up this block group.
> 
> Actually, just setting the unused counter is not enough to properly
> fix this problem.  The lazyinit thread should be started to do
> background zeroing of the inode table, otherwise if the group
> descriptor is corrupted and the bg_itable_unused value is wrong,
> then the uninitialized inodes would be accessed.
> 
>> To be honest, though, sometimes I get lost in the sea of flags.
>>
>>> The question is, are these flags expected to not be found on
>>> these newer block groups or they should be set?
>>
>> *nod* :)
> 
> Depends on how it is implemented. :-/  The flag should definitely
> not be set unless the itable is actually overwritten with zeroes.
> It makes sense that the lazyinit thread would do this in the
> background while the filesystem is mounted instead of waiting for
> the next time that the filesystem is mounted.
> 
> Looking at the code, it appears that this is not happening at the
> end of the resize, since ext4_register_li_request() is marked
> static for the superblock.  It looks like it would be relatively
> straight forward to add a call to ext4_register_li_request() to
> ext4_resize_end() with the first group added to the resize.
> 
> It looks like ext4_run_li_request() will skip groups that are
> already marked as INODE_ZERO, so it is fine to always call it even
> if the kernel is already setting this itself in some cases (not
> that I see this happening).

Hum, but lazyinit will take some time to complete; in this case
we resized, unmounted, ran fsck and everything was a mess.  Even if
we'd started lazyinit I don't think that'ts enough, because we never
flagged the group as uninit.

>>> The lack of these flags on newer block groups is an expected
>>> behaviour or is something that should be fixed?
>>>
>>> FWIW, in the old ext4_group_add(), we added EXT4_BG_INODE_ZEROED
>>> flag to the bg_flags, also, I did some tests here and the lack
>>> of these flags look to not be affecting filesystem integrity,
>>> i.e. new inodes can be properly allocated, which sounds that
>>> these uninitialized inodes/bitmaps are set to be initialized
>>> as soon as a first inode is allocated on these new BGs.
> 
> Well, the old code always zeroed the inode table, and it made sense
> to mark it as such.

so I think we need to either:

1) mark it as UNINIT and start the background thread, or
2) just zero the darned thing out on resize.

and

3) make this testable, again.  :/

-Eric

> Cheers, Andreas
> 
> 
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


  reply	other threads:[~2013-01-11 15:47 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-11  0:07 new block group bitmaps not being initialized after resize!? Carlos Maiolino
2013-01-11  1:07 ` Eric Sandeen
2013-01-11  6:44   ` Andreas Dilger
2013-01-11 15:47     ` Eric Sandeen [this message]
2013-01-13  4:36       ` Theodore Ts'o
2013-01-13  5:26         ` Eric Sandeen
2013-01-13 13:37           ` Theodore Ts'o
2013-01-13 13:42             ` [PATCH] ext4: trigger the lazy inode table initialization after resize Theodore Ts'o
2013-01-14 13:04               ` Carlos Maiolino
2013-01-14 14:33                 ` Theodore Ts'o
2013-01-14 14:42                   ` Carlos Maiolino
2013-01-14 14:45                   ` Carlos Maiolino

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50F033F5.6080008@redhat.com \
    --to=sandeen@redhat.com \
    --cc=adilger@dilger.ca \
    --cc=cmaiolino@redhat.com \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).