From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Theodore Tso <tytso@mit.edu>
Cc: cmm@us.ibm.com, sandeen@redhat.com, linux-ext4@vger.kernel.org
Subject: Re: [RFC PATCH -V3 8/9] ext4: Fix double free of blocks
Date: Fri, 7 Nov 2008 21:31:24 +0530 [thread overview]
Message-ID: <20081107160124.GM25194@skywalker> (raw)
In-Reply-To: <20081106223702.GN18939@mit.edu>
On Thu, Nov 06, 2008 at 05:37:02PM -0500, Theodore Tso wrote:
> On Fri, Nov 07, 2008 at 12:19:33AM +0530, Aneesh Kumar K.V wrote:
> > Blocks freed but not yet committed will be marked free in disk bitmap.
>
> That should probably read, "marked free in the on-disk block
> allocation bitmap".
>
> And I'm having real trouble parsing this below
>
> > ext4_mb_release_inode_pa looks at the block bitmap and mark the
> > block found free in the bitmap withing the inode prealloc space
>
> "the block" should that be plural? Which block is it?
>
> "found free in the bitmap"? which bitmap? The block allocation
> bitmap?
>
> "withing"? Is that supposed to be within?
>
> "withing the prealloc space"? What is that supposed to be modifying?
> It is cloest to "the bitmap", but that doesn't make any sense; is this
> phrase supposed to be modifying "the block"?
>
> > as unused. If we have blocks allocated out of a prealloc space
>
> "mark as unused" Mark as unused where? In the block allocation
> bitmap? But I thought these were "blocks found free in the bitmap",
> so aren't they already unused?
>
> It's better to try to explain things at a higher level. What what
> you've said on the conference call, I *think* what happens is that the
> preallocation range is range of blocks which is reserved for
> allocation for that particular inode. To support this, the blocks in
> question are made to appear as though they are not available in the
> mballoc's in-memory buddy bitmap, which is what mballoc consults when
> making allocations.
Yes. That is correct
>
> For some strange reason which I don't understand, when blocks are
> served out of the preallocation area and allocated to the inode, they
> are not removed from the preallocation area. (This seems like the
> real bug, but whatever...)
This happens for inode prealloc space. Because i node prealloc space
is mapped using the logical block number. Ie if we have a prealloc space
of contiguous 'x' blocks starting at physical block 'p'. When creating
this prealloc space we would have been requesting for blocks for logical
block 'l'. ie
[ p .........(p+k).............p+x]
l
Now request for 2 blocks at logical file block 'l' get allocated
from above prealloc space. Again request for 3 blocks at logical block
'l+k' get allocated from the above prealloc space. Also request for
3 blocks at logical block 'p+x+1' WON't use the above prealloc space.
Inshort inode prealloc space use logical block number to find which
prealloc space to use. We also don't reduce the pa_len of the prealloc
space. But we reduce pa_free. Now when we discard this prealloc space
we need to look at on-block bitmap and find out how many blocks actually
got allocated out of the prealloc space. So we start with the first zero
block on the block bitmap starting from physical block 'p'. We search
for next set bit. All the blocks in that range are free. We them mark
them free in the buddy bitmap.
>So when we release a preallocation area
> for an inode, for each block in the inode's preallocation area, we
> check and see if the block is marked as free according to the on-disk
> disk allocation bitmap, and if so we make it look available in
> mballoc's in-memory buddy bitmap.
>
> Unfortunately, if the disk blocks in question are freed before the
> commit transmit, the blocks would look free in the on-disk block
> allocation bitmap, and so there would be an attempt to mark them as
> available twice in mballoc's buddy bitmap.
exactly.
>
> To get around this problem, the patch allocates a temporary bitmap
> which also includes the blocks in the "release on commit" linked list.
>
> Did I get this right? If so, we should put an abbreviated version of
> the above in the commit, and more of this needs to be explicitly
> documented in comments in mballoc.c
I am tempted to add your mail above as it is :)
>
> So ---- stupid question. Instead of creating the temporary bitmap,
> which I fear will be very expensive --- why not remove the block from
> the inode's preallocation area when it is served up?
We can't do that because inode prealloc space look at logical block
of the file.
>Then, it becomes
> easier when releasing the inode preallocation area; we wouldn't need
> to consult the on-disk block allocation bitmap at all, and merely just
> iterate over all of the blocks in the inode preallocation space, and
> mark them as available in the mballoc buddy bitmap?
-aneesh
next prev parent reply other threads:[~2008-11-07 16:01 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-11-06 18:49 [RFC PATCH -V3 1/9] ext4: sparse fixes Aneesh Kumar K.V
2008-11-06 18:49 ` [RFC PATCH -V3 2/9] ext4: Add blocks added during resize to bitmap Aneesh Kumar K.V
2008-11-06 18:49 ` [RFC PATCH -V3 3/9] ext4: Use EXT4_GROUP_INFO_NEED_INIT_BIT during resize Aneesh Kumar K.V
2008-11-06 18:49 ` [RFC PATCH -V3 4/9] ext4: cleanup mballoc header files Aneesh Kumar K.V
2008-11-06 18:49 ` [RFC PATCH -V3 5/9] ext4: Add sparse annotations for the group info semaphore Aneesh Kumar K.V
2008-11-06 18:49 ` [RFC PATCH -V3 6/9] jbd2: Call journal commit callback without holding j_list_lock Aneesh Kumar K.V
2008-11-06 18:49 ` [RFC PATCH -V3 7/9] ext4: don't use blocks freed but not yet committed in buddy cache init Aneesh Kumar K.V
2008-11-06 18:49 ` [RFC PATCH -V3 8/9] ext4: Fix double free of blocks Aneesh Kumar K.V
2008-11-06 18:49 ` [RFC PATCH -V3 9/9] ext4: Fix lockdep recursive locking warning Aneesh Kumar K.V
2008-11-07 8:04 ` Li Zefan
2008-11-07 16:22 ` Aneesh Kumar K.V
2008-11-08 1:21 ` Li Zefan
2008-11-06 22:37 ` [RFC PATCH -V3 8/9] ext4: Fix double free of blocks Theodore Tso
2008-11-07 16:01 ` Aneesh Kumar K.V [this message]
2008-11-06 22:09 ` [RFC PATCH -V3 1/9] ext4: sparse fixes Theodore Tso
2008-11-07 14:20 ` Aneesh Kumar K.V
2008-11-07 14:45 ` Theodore Tso
2008-11-07 15:07 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20081107160124.GM25194@skywalker \
--to=aneesh.kumar@linux.vnet.ibm.com \
--cc=cmm@us.ibm.com \
--cc=linux-ext4@vger.kernel.org \
--cc=sandeen@redhat.com \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).