All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: "Frédéric Bohé" <frederic.bohe@bull.net>
Cc: Theodore Tso <tytso@mit.edu>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH v2] ext4: fix initialization of UNINIT bitmap blocks
Date: Mon, 22 Sep 2008 14:17:21 +0530	[thread overview]
Message-ID: <20080922084721.GA6691@skywalker> (raw)
In-Reply-To: <1222070998.3581.25.camel@frecb007923.frec.bull.fr>

On Mon, Sep 22, 2008 at 10:09:57AM +0200, Frédéric Bohé wrote:
> Le samedi 20 septembre 2008 à 20:44 -0400, Theodore Tso a écrit :
> > On Thu, Sep 18, 2008 at 03:45:14PM +0200, Frédéric Bohé wrote:
> > > The issue here is that you can't use all inode of the second group of
> > > the fs.
> > > 
> > > This happens because resize2fs make a call to ext2fs_read_bitmaps. This
> > > function reads all bitmaps while paying attention not to read the
> > > uninited bitmap. This works well as long as the fs block size is equal
> > > to the page size. But in the above test case, the fs use 1k blocks and
> > > we have an issue. 
> > > 
> > > That's because the "read" function issued by ext2fs_read_bitmaps is a
> > > call to kernel's block_read_full_page function. So when a single bitmap
> > > block is asked for, 4 blocks (for 1k blocks fs on x86) are actually read
> > > (including the uninited ones) and their respective buffer set to
> > > uptodate. 
> > > 
> > > As we rely on the buffer's uptodate flags to initialize or not this
> > > buffer, it may happen that certain bitmap blocks are not initialized at
> > > all. So their buffer contains the random garbage that was present on the
> > > disk prior to the mkfs ( In the above test case, the inode bitmap of the
> > > second group is full a random bits so I can't use all of its inodes ).
> > 
> > Actually that's the problem.  We shouldn't be relying on the buffer's
> > uptodate flags as a hint to tell mballoc to reload the buddy bitmaps.
> > Unfortunately I didn't notice this problem by not carefully auditing
> > commit 5f21b0e6 before it went in, but it's seriously buggy by trying
> > to overload the use of the buffer's uptodate flag for anything other
> > than error handling.
> > 
> 
> Maybe I missed something, but I thought the bug I am talking about here,
> is neither related to buddy nor directly to mballoc. Sorry, I was not
> clear enough. In fact, it happens even without using mballoc. It is
> related to uninit feature with filesystems using blocks which are
> smaller than page size. If any userland process call ext2fs_read_bitmaps
> function (or try to read a bitmap block directly), you may end up with
> those buffers full of garbage. It concerns either block bitmap buffers
> or inode bitmap buffers.
> 
> 
> 
> > > I am a bit lost on how to fix this. Aneesh was right, I think it's an
> > > ext2fs_read_bitmaps bug, not a kernel bug. I guess we need a userland
> > > function to read a single block whatever the block size and page size
> > > are. I've made a try using O_DIRECT flag but I was unsuccessful. Any
> > > ideas/suggestions ?
> > 
> > No!!!!  Think about it.  It's always fair for userspace to read from
> > the block device.  If this causes the kernel to blow up, then it's a
> > kernel bug, not a userspace bug.  And it is a *perfect* demonstration
> > why overloading the uptodate flag by using it for *anything* other
> > than error signalling from the buffer I/O layer is wrong and horribly
> > fragile.
> 
> You are probably right, so maybe the patch I sent at the beginning of
> this thread makes sense ?
> 

What you can do is make ext4_group_info generic for both mballoc and
oldalloc. We can then add bg_flag to the in memory ext4_group_info
that would indicate whether the group is initialized or not. Here
initialized for an UNINIT_GROUP indicate we have done
ext4_init_block_bitmap on the buffer_head. Then 
instead of depending on the buffer_head uptodate flag we can check
for the ext4_group_info bg_flags and decided whether the block/inode
bitmap need to be initialized.

-aneesh
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2008-09-22  8:51 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-09-15 11:41 [PATCH] ext4: fix initialization of UNINIT bitmap blocks Frédéric Bohé
2008-09-15 12:16 ` [PATCH v2] " Frédéric Bohé
2008-09-15 13:36   ` Aneesh Kumar K.V
2008-09-15 14:30     ` Frédéric Bohé
2008-09-18 13:45       ` Frédéric Bohé
2008-09-21  0:44         ` Theodore Tso
2008-09-22  8:09           ` Frédéric Bohé
2008-09-22  8:47             ` Aneesh Kumar K.V [this message]
2008-09-22  9:32               ` Frédéric Bohé
2008-09-23 23:13                 ` Andreas Dilger
2008-09-24 12:57                   ` Frédéric Bohé
2008-09-24 16:23                     ` Theodore Tso
2008-09-25 23:04                       ` Andreas Dilger
2008-09-24 12:38                 ` Frédéric Bohé
2008-09-26 13:17                   ` Frédéric Bohé
2008-09-28 22:49   ` Theodore Tso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080922084721.GA6691@skywalker \
    --to=aneesh.kumar@linux.vnet.ibm.com \
    --cc=frederic.bohe@bull.net \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.