From: Jan Kara <jack@suse.cz>
To: linux-ext4@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>, Jan Kara <jack@suse.cz>,
Ying Han <yinghan@google.com>,
Nick Piggin <nickpiggin@yahoo.com.au>
Subject: [PATCH] ext2: Fix data corruption for racing writes
Date: Fri, 3 Apr 2009 01:36:38 +0200 [thread overview]
Message-ID: <1238715399-22172-1-git-send-email-jack@suse.cz> (raw)
In-Reply-To: <n>
If two writers allocating blocks to file race with each other (e.g. because
writepages races with ordinary write or two writepages race with each other),
ext2_getblock() can be called on the same inode in parallel. Before we are
going to allocate new blocks, we have to recheck the block chain we have
obtained so far without holding truncate_mutex. Otherwise we could overwrite
the indirect block pointer set by the other writer leading to data loss.
The below test program by Ying is able to reproduce the data loss with ext2
on in BRD in a few minutes if the machine is under memory pressure:
long kMemSize = 50 << 20;
int kPageSize = 4096;
int main(int argc, char **argv) {
int status;
int count = 0;
int i;
char *fname = "/mnt/test.mmap";
char *mem;
unlink(fname);
int fd = open(fname, O_CREAT | O_EXCL | O_RDWR, 0600);
status = ftruncate(fd, kMemSize);
mem = mmap(0, kMemSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
// Fill the memory with 1s.
memset(mem, 1, kMemSize);
sleep(2);
for (i = 0; i < kMemSize; i++) {
int byte_good = mem[i] != 0;
if (!byte_good && ((i % kPageSize) == 0)) {
//printf("%d ", i / kPageSize);
count++;
}
}
munmap(mem, kMemSize);
close(fd);
unlink(fname);
if (count > 0) {
printf("Running %d bad page\n", count);
return 1;
}
return 0;
}
CC: Ying Han <yinghan@google.com>
CC: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext2/inode.c | 44 +++++++++++++++++++++++++++++++++-----------
1 files changed, 33 insertions(+), 11 deletions(-)
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index b43b955..acf6788 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -590,9 +590,8 @@ static int ext2_get_blocks(struct inode *inode,
if (depth == 0)
return (err);
-reread:
- partial = ext2_get_branch(inode, depth, offsets, chain, &err);
+ partial = ext2_get_branch(inode, depth, offsets, chain, &err);
/* Simplest case - block found, no allocation needed */
if (!partial) {
first_block = le32_to_cpu(chain[depth - 1].key);
@@ -602,15 +601,16 @@ reread:
while (count < maxblocks && count <= blocks_to_boundary) {
ext2_fsblk_t blk;
- if (!verify_chain(chain, partial)) {
+ if (!verify_chain(chain, chain + depth - 1)) {
/*
* Indirect block might be removed by
* truncate while we were reading it.
* Handling of that case: forget what we've
* got now, go to reread.
*/
+ err = -EAGAIN;
count = 0;
- goto changed;
+ break;
}
blk = le32_to_cpu(*(chain[depth-1].p + count));
if (blk == first_block + count)
@@ -618,7 +618,8 @@ reread:
else
break;
}
- goto got_it;
+ if (err != -EAGAIN)
+ goto got_it;
}
/* Next simple case - plain lookup or failed read of indirect block */
@@ -626,6 +627,33 @@ reread:
goto cleanup;
mutex_lock(&ei->truncate_mutex);
+ /*
+ * If the indirect block is missing while we are reading
+ * the chain(ext3_get_branch() returns -EAGAIN err), or
+ * if the chain has been changed after we grab the semaphore,
+ * (either because another process truncated this branch, or
+ * another get_block allocated this branch) re-grab the chain to see if
+ * the request block has been allocated or not.
+ *
+ * Since we already block the truncate/other get_block
+ * at this point, we will have the current copy of the chain when we
+ * splice the branch into the tree.
+ */
+ if (err == -EAGAIN || !verify_chain(chain, partial)) {
+ while (partial > chain) {
+ brelse(partial->bh);
+ partial--;
+ }
+ partial = ext2_get_branch(inode, depth, offsets, chain, &err);
+ if (!partial) {
+ count++;
+ mutex_unlock(&ei->truncate_mutex);
+ if (err)
+ goto cleanup;
+ clear_buffer_new(bh_result);
+ goto got_it;
+ }
+ }
/*
* Okay, we need to do block allocation. Lazily initialize the block
@@ -683,12 +711,6 @@ cleanup:
partial--;
}
return err;
-changed:
- while (partial > chain) {
- brelse(partial->bh);
- partial--;
- }
- goto reread;
}
int ext2_get_block(struct inode *inode, sector_t iblock, struct buffer_head *bh_result, int create)
--
1.6.0.2
next reply other threads:[~2009-04-02 23:37 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-04-02 23:36 Jan Kara [this message]
2009-04-02 23:36 ` [PATCH] ext3: Fix chain verification in ext3_get_blocks() Jan Kara
2009-04-04 0:38 ` [PATCH] ext2: Fix data corruption for racing writes Ying Han
2009-04-06 10:23 ` Jan Kara
2009-04-09 20:30 ` Andrew Morton
2009-04-09 20:59 ` Ying Han
2009-04-10 18:10 ` Ying Han
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1238715399-22172-1-git-send-email-jack@suse.cz \
--to=jack@suse.cz \
--cc=akpm@linux-foundation.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=nickpiggin@yahoo.com.au \
--cc=yinghan@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox