From: "Theodore Ts'o" <tytso@mit.edu>
To: Alex Tomas <bzzz@sun.com>, Andreas Dilger <adilger@sun.com>
Cc: linux-ext4@vger.kernel.org
Subject: Potential bug in mballoc --- reusing data blocks before txn commit
Date: Sat, 27 Sep 2008 21:35:46 -0400 [thread overview]
Message-ID: <E1KjlCM-0002L7-SO@closure.thunk.org> (raw)
So while I was preparing a patch to delete the old legacy block
allocator, I came across this bit of code in balloc.c:
/**
* ext4_test_allocatable()
* @nr: given allocation block group
* @bh: bufferhead contains the bitmap of the given bloc
*
* For ext4 allocations, we must not reuse any blocks which are
* allocated in the bitmap buffer's "last committed data" copy. This
* prevents deletes from freeing up the page for reuse until we have
* committed the delete transaction.
*
* If we didn't do this, then deleting something and reallocating it as
* data would allow the old block to be overwritten before the
* transaction committed (because we force data to disk before commit).
* This would lead to corruption if we crashed between overwriting the
* data and committing the delete.
*
* @@@ We may want to make this allocation behaviour conditional on
* data-writes at some point, and disable it for metadata allocations or
* sync-data inodes.
*/
This is done by searching an old copy of the bitmap found
jh->b_committed_data.
So I was more than a little bit concerned when I found that mballoc.c
wasn't referencing b_committed data *at* *all*. When I looked a bit
more closely, it looks like that mballoc is using a separate scheme,
based on linked list hanging off of sbi->s_active_transaction.
Unfortunately, it seems to only prevent released metadata blocks from
being reused:
if (metadata) {
/* blocks being freed are metadata. these blocks shouldn't
* be used until this transaction is committed */
ext4_mb_free_metadata(handle, &e4b, block_group, bit, count);
} else {
This means that if a file is deleted, but then system crashes before the
commit, some of its data blocks could be reallocated and then written
out. We could fix this by running all released blocks via
ext4_mb_free_metadata(), but since ext4_mb_free_metadata() stores blocks
that need to be protected via a block list, this could get *quite*
unwieldy, especially when deleting very large files (we could end up
with a very large linked list). Alternate solutions would involve using
a list of extents, or the original jh->b_commited_data mechanism.
I'll also note that a linked list of extents that should be freed would
also be useful for implementing the trim command for SSD's --- and that
this would be much more cleanly implemented via a callback from the jbd2
layer when a commit is finished, rather than the current
ext4_mb_poll_new_transaction() mechanism.
In any case, is there a reason why the mballoc.c is using its current
scheme, and not using kj->b_commited_data as in the original balloc.c
code? And was there a reason why you decided that it wasn't necessary
to protect freed data blocks from being reused until the transaction was
committed?
Regards,
- Ted
next reply other threads:[~2008-09-28 1:35 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-09-28 1:35 Theodore Ts'o [this message]
2008-09-29 20:21 ` Potential bug in mballoc --- reusing data blocks before txn commit Alex Tomas
2008-09-29 20:57 ` Theodore Tso
2008-09-29 21:04 ` Ric Wheeler
2008-09-29 23:00 ` Theodore Tso
2008-09-29 23:05 ` Ric Wheeler
2008-09-30 4:35 ` Alex Tomas
2008-09-30 13:02 ` Theodore Tso
2008-09-30 13:12 ` Alex Tomas
2008-09-30 14:15 ` Theodore Tso
2008-10-01 7:17 ` Andreas Dilger
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=E1KjlCM-0002L7-SO@closure.thunk.org \
--to=tytso@mit.edu \
--cc=adilger@sun.com \
--cc=bzzz@sun.com \
--cc=linux-ext4@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox