Re: new block group bitmaps not being initialized after resize!?

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Eric Sandeen <sandeen@redhat.com>
To: Andreas Dilger <adilger@dilger.ca>
Cc: Carlos Maiolino <cmaiolino@redhat.com>, linux-ext4@vger.kernel.org
Subject: Re: new block group bitmaps not being initialized after resize!?
Date: Fri, 11 Jan 2013 09:47:01 -0600	[thread overview]
Message-ID: <50F033F5.6080008@redhat.com> (raw)
In-Reply-To: <25BFE2C2-8FF2-4064-86D3-6CFBF5A2931F@dilger.ca>

On 1/11/13 12:44 AM, Andreas Dilger wrote:
> On 2013-01-10, at 6:07 PM, Eric Sandeen wrote:
>> On 1/10/13 6:07 PM, Carlos Maiolino wrote:
>>> I'm working on a Fedora bugzilla (https://bugzilla.redhat.com/show_bug.cgi?id=852833)
>>> together with Eric and we found that the problem described on the
>>> bugzilla happens when the commit 93f9052643 is not applied, which
>>> is the case of the Fedora 16 kernel being discussed there.
>>
>> Also, to be clear, this is with older e2fsprogs which was using the
>> old resize interface.  Not sure what the behavior is w/ newer
>> e2fsprogs, but we don't see the corruption.
>>
>> Note, we see the corruption on these older kernels even when resizing from say 100G to 120G.  It appears fixed upstream, but so much has
>> changed, we need to be sure the older interface doesn't have bugs
>> lurking, I think.
> 
> The original resize code didn't ever know about uninit_bg, so it
> would always zero out the inode table, so I suspect that this was
> added at some later point. 

Ok, so that must have gotten dropped when everything was reworked
to the new interface.

We probably should have a (hidden?) switch in resize2fs to exercise
the old interface, so we can test this.

>>> Although we already found the solution to the problem in the commit
>>> above, looking through the commit have raised some questions
>>> regarding to the bitmap of the newer block groups added to the FS
>>> after it is extended.
>>>
>>> The newer block groups do not have flags ITABLE_ZEROED and
>>> INODE_UNINIT set, even when 'lazy_itable_init' is enabled.
>>
>> In particular, we see things like this in the last pre-resize group:
>>
>> Group 799: (Blocks 26181632-26214399) [INODE_UNINIT, ITABLE_ZEROED]
>>  Checksum 0xafe7, unused inodes 1936
>>  Block bitmap at 25165855 (bg #768 + 31), Inode bitmap at 25166111 (bg #768 + 287)
>>  Inode table at 25170087-25170207 (bg #768 + 4263)
>>  32768 free blocks, 1936 free inodes, 0 directories, 1936 unused inodes
>>  Free blocks: 26181632-26214399
>>  Free inodes: 1546865-1548800
>>
>> and this in the newly-added groups:
>>
>> Group 800: (Blocks 26214400-26247167)
>>  Checksum 0xddc4, unused inodes 0
>>  Block bitmap at 26236224 (+21824), Inode bitmap at 26236225 (+21825)
>>  Inode table at 26214400-26214520
>>  32645 free blocks, 1936 free inodes, 0 directories
>>  Free blocks: 26214521-26236223, 26236226-26247167
>>  Free inodes: 1548801-1550736
>>
>> so it says 0 unused inodes, but also 1936 free inodes (?), and
>> no UNINIT or ZEROED flags set.  e2fsck finds stale data in the inode table, and goes nuts.
> 
> Zeroing the inode table but not setting the INODE_ZEROED flag
> would not be harmful, but this seems to not be the case.

we appear to be not zeroing the table, and not setting INODE_ZEROED.
But we should have set INODE_UNINIT, or zeroed it, right?

> When the filesystem is remounted, does the kernel lazyinit
> thread zero out the new groups in the inode table?

Don't think so.

> 
>>> Without this commit, inode stale data might be exposed and also makes fsck complain about all inodes of the newer block groups.
>>
>> *nod* :)
> 
> That's why 
> 
>> so 93f9052643 seems to have accidentally fixed this, by setting
>> the unused counter to EXT4_INODES_PER_GROUP(), but it feels like
>> we've missed properly setting up this block group.
> 
> Actually, just setting the unused counter is not enough to properly
> fix this problem.  The lazyinit thread should be started to do
> background zeroing of the inode table, otherwise if the group
> descriptor is corrupted and the bg_itable_unused value is wrong,
> then the uninitialized inodes would be accessed.
> 
>> To be honest, though, sometimes I get lost in the sea of flags.
>>
>>> The question is, are these flags expected to not be found on
>>> these newer block groups or they should be set?
>>
>> *nod* :)
> 
> Depends on how it is implemented. :-/  The flag should definitely
> not be set unless the itable is actually overwritten with zeroes.
> It makes sense that the lazyinit thread would do this in the
> background while the filesystem is mounted instead of waiting for
> the next time that the filesystem is mounted.
> 
> Looking at the code, it appears that this is not happening at the
> end of the resize, since ext4_register_li_request() is marked
> static for the superblock.  It looks like it would be relatively
> straight forward to add a call to ext4_register_li_request() to
> ext4_resize_end() with the first group added to the resize.
> 
> It looks like ext4_run_li_request() will skip groups that are
> already marked as INODE_ZERO, so it is fine to always call it even
> if the kernel is already setting this itself in some cases (not
> that I see this happening).

Hum, but lazyinit will take some time to complete; in this case
we resized, unmounted, ran fsck and everything was a mess.  Even if
we'd started lazyinit I don't think that'ts enough, because we never
flagged the group as uninit.

>>> The lack of these flags on newer block groups is an expected
>>> behaviour or is something that should be fixed?
>>>
>>> FWIW, in the old ext4_group_add(), we added EXT4_BG_INODE_ZEROED
>>> flag to the bg_flags, also, I did some tests here and the lack
>>> of these flags look to not be affecting filesystem integrity,
>>> i.e. new inodes can be properly allocated, which sounds that
>>> these uninitialized inodes/bitmaps are set to be initialized
>>> as soon as a first inode is allocated on these new BGs.
> 
> Well, the old code always zeroed the inode table, and it made sense
> to mark it as such.

so I think we need to either:

1) mark it as UNINIT and start the background thread, or
2) just zero the darned thing out on resize.

and

3) make this testable, again.  :/

-Eric

> Cheers, Andreas
> 
> 
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

next prev parent reply	other threads:[~2013-01-11 15:47 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-11  0:07 new block group bitmaps not being initialized after resize!? Carlos Maiolino
2013-01-11  1:07 ` Eric Sandeen
2013-01-11  6:44   ` Andreas Dilger
2013-01-11 15:47     ` Eric Sandeen [this message]
2013-01-13  4:36       ` Theodore Ts'o
2013-01-13  5:26         ` Eric Sandeen
2013-01-13 13:37           ` Theodore Ts'o
2013-01-13 13:42             ` [PATCH] ext4: trigger the lazy inode table initialization after resize Theodore Ts'o
2013-01-14 13:04               ` Carlos Maiolino
2013-01-14 14:33                 ` Theodore Ts'o
2013-01-14 14:42                   ` Carlos Maiolino
2013-01-14 14:45                   ` Carlos Maiolino

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50F033F5.6080008@redhat.com \
    --to=sandeen@redhat.com \
    --cc=adilger@dilger.ca \
    --cc=cmaiolino@redhat.com \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.