From: Eric Sandeen <sandeen@redhat.com>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
Subject: Re: fsstress-induced corruption reproduced
Date: Mon, 04 Jan 2010 17:08:55 -0600 [thread overview]
Message-ID: <4B427507.40004@redhat.com> (raw)
In-Reply-To: <4B424BE4.3030605@redhat.com>
Eric Sandeen wrote:
> Theodore Ts'o wrote:
>> One of the things which has been annoying me for a while now is a
>> hard-to-reproduce xfsqa failure in test #13 (fsstress), which causes the
>> a test failure because the file system found to be inconsistent:
>>
>> Inode NNN, i_blocks is X, should be Y.
>
> Interesting, this apparently has gotten much worse since 2.6.32.
>
> I wrote an xfstests reproducer, and couldn't hit it on .32; hit it right
> off on 2.6.33-rc2.
>
> Probably should find out why ;) I'll go take a look.
commit d21cd8f163ac44b15c465aab7306db931c606908
Author: Dmitry Monakhov <dmonakhov@openvz.org>
Date: Thu Dec 10 03:31:45 2009 +0000
ext4: Fix potential quota deadlock
seems to be the culprit.
(unfortunately this means that the error we saw before is something
-else- to be fixed, yet) Anyway ...
This is because we used to do this in ext4_mb_mark_diskspace_used() :
/*
* Now reduce the dirty block count also. Should not go negative
*/
if (!(ac->ac_flags & EXT4_MB_DELALLOC_RESERVED))
/* release all the reserved blocks if non delalloc */
percpu_counter_sub(&sbi->s_dirtyblocks_counter,
reserv_blks);
else {
percpu_counter_sub(&sbi->s_dirtyblocks_counter,
ac->ac_b_ex.fe_len);
/* convert reserved quota blocks to real quota blocks */
vfs_dq_claim_block(ac->ac_inode, ac->ac_b_ex.fe_len);
}
i.e. the vfs_dq_claim_block was conditional based on
EXT4_MB_DELALLOC_RESERVED... and the testcase did not go that way,
because we had already preallocated the blocks.
But with the above quota deadlock commit it's not unconditional
anymore in ext4_da_update_reserve_space and we always call
vfs_dq_claim_block which over-accounts.
Of course with the above commit, we have no allocation context in
ext4_da_update_reserve_space... that's all long gone so we can't key
on that anymore.
However, I think the following change will fix it; I'll run it through
xfstests later on and be sure nothing else regresses.
-Eric
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5352db1..28cd8d8 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1257,9 +1257,10 @@ int ext4_get_blocks(handle_t *handle, struct
inode *inode, sector_t block,
* if the caller is from delayed allocation writeout path
* we have already reserved fs blocks for allocation
* let the underlying get_block() function know to
- * avoid double accounting
+ * avoid double accounting. Ditto for prealloc blocks.
*/
- if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
+ if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE ||
+ flags & EXT4_GET_BLOCKS_UNINIT_EXT)
EXT4_I(inode)->i_delalloc_reserved_flag = 1;
/*
* We need to check for EXT4 here because migrate
-Eric
> -Eric
>
>> I finally reproduced it; the problem happens when we fallocate() a
>> region of the file which we had recently written, and which is still in
>> the page cache marked as delayed allocation blocks. When we finally
>> write those blocks out, since they are marked BH_Delay,
>> ext4_get_blocks() calls ext4_da_update_reserve_space(), which ends up
>> bumping i_blocks a second time and charging the blocks against the
>> user's quota a second time. Oops.
>>
>> Fortunately the fsck problem is one that will be fixed with a preen (and
>> if quota is enabled, a quotacheck), so it's not super serious, but we
>> should fix it when we have a chance. If anyone has time to look at it,
>> please let me know. Otherwise, I'll put it on my todo list. I don't
>> consider seriously urgent since the case is highly unlikely to occur in
>> real life, and it doesn't have any security implications; the worst an
>> attacker could do is end up charging excesss quota to herself.
>>
>> I've included a simple reproduction case below; if you run this program,
>> it will create a file "test-file" in the current working directory which
>> will appear to be 32k, even though it is really only 16k long, and if
>> you then unmount the test file system and run e2fsck -p on it, you will get
>> the error message:
>>
>> Inode XXX, i_blocks is 64, should be 32. FIXED.
>>
>> - Ted
>>
>> #define _GNU_SOURCE
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>> #include <string.h>
>> #include <sys/types.h>
>> #include <fcntl.h>
>> #include <fcntl.h>
>>
>> #define BUFSIZE 1024
>>
>> int main(int argc, char **argv)
>> {
>> int i, fd, ret;
>> char buf[BUFSIZE];
>>
>> fd = open("test-file", O_RDWR|O_CREAT|O_TRUNC, 0644);
>> if (fd < 0) {
>> perror("open");
>> exit(1);
>> }
>> memset(&buf, 0, BUFSIZE);
>> for (i=0; i < 16; i++) {
>> ret = write(fd, &buf, BUFSIZE);
>> if (ret < 0) {
>> perror("write");
>> exit(1);
>> }
>> if (ret != BUFSIZE) {
>> fprintf(stderr, "Write return expected %d, got %d\n",
>> BUFSIZE, ret);
>> exit(1);
>> }
>> }
>> ret = fallocate(fd, 0, 0, 16384);
>> if (ret < 0) {
>> perror("fallocate");
>> exit(1);
>> }
>> ret = fsync(fd);
>> if (ret < 0) {
>> perror("fsync");
>> exit(1);
>> }
>> ret = close(fd);
>> if (ret < 0) {
>> perror("close");
>> exit(1);
>> }
>> exit(0);
>> }
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2010-01-04 23:08 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-12-31 15:50 fsstress-induced corruption reproduced Theodore Ts'o
2010-01-04 20:13 ` Eric Sandeen
2010-01-04 23:08 ` Eric Sandeen [this message]
2010-01-05 6:17 ` Aneesh Kumar K.V
2010-01-05 14:40 ` Eric Sandeen
2010-01-05 23:37 ` Eric Sandeen
2010-01-06 8:49 ` Aneesh Kumar K.V
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4B427507.40004@redhat.com \
--to=sandeen@redhat.com \
--cc=linux-ext4@vger.kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox