From: Eric Sandeen <sandeen@redhat.com>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
Subject: Re: fsstress-induced corruption reproduced
Date: Mon, 04 Jan 2010 17:08:55 -0600 [thread overview]
Message-ID: <4B427507.40004@redhat.com> (raw)
In-Reply-To: <4B424BE4.3030605@redhat.com>
Eric Sandeen wrote:
> Theodore Ts'o wrote:
>> One of the things which has been annoying me for a while now is a
>> hard-to-reproduce xfsqa failure in test #13 (fsstress), which causes the
>> a test failure because the file system found to be inconsistent:
>>
>> Inode NNN, i_blocks is X, should be Y.
>
> Interesting, this apparently has gotten much worse since 2.6.32.
>
> I wrote an xfstests reproducer, and couldn't hit it on .32; hit it right
> off on 2.6.33-rc2.
>
> Probably should find out why ;) I'll go take a look.
commit d21cd8f163ac44b15c465aab7306db931c606908
Author: Dmitry Monakhov <dmonakhov@openvz.org>
Date: Thu Dec 10 03:31:45 2009 +0000
ext4: Fix potential quota deadlock
seems to be the culprit.
(unfortunately this means that the error we saw before is something
-else- to be fixed, yet) Anyway ...
This is because we used to do this in ext4_mb_mark_diskspace_used() :
/*
* Now reduce the dirty block count also. Should not go negative
*/
if (!(ac->ac_flags & EXT4_MB_DELALLOC_RESERVED))
/* release all the reserved blocks if non delalloc */
percpu_counter_sub(&sbi->s_dirtyblocks_counter,
reserv_blks);
else {
percpu_counter_sub(&sbi->s_dirtyblocks_counter,
ac->ac_b_ex.fe_len);
/* convert reserved quota blocks to real quota blocks */
vfs_dq_claim_block(ac->ac_inode, ac->ac_b_ex.fe_len);
}
i.e. the vfs_dq_claim_block was conditional based on
EXT4_MB_DELALLOC_RESERVED... and the testcase did not go that way,
because we had already preallocated the blocks.
But with the above quota deadlock commit it's not unconditional
anymore in ext4_da_update_reserve_space and we always call
vfs_dq_claim_block which over-accounts.
Of course with the above commit, we have no allocation context in
ext4_da_update_reserve_space... that's all long gone so we can't key
on that anymore.
However, I think the following change will fix it; I'll run it through
xfstests later on and be sure nothing else regresses.
-Eric
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5352db1..28cd8d8 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1257,9 +1257,10 @@ int ext4_get_blocks(handle_t *handle, struct
inode *inode, sector_t block,
* if the caller is from delayed allocation writeout path
* we have already reserved fs blocks for allocation
* let the underlying get_block() function know to
- * avoid double accounting
+ * avoid double accounting. Ditto for prealloc blocks.
*/
- if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
+ if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE ||
+ flags & EXT4_GET_BLOCKS_UNINIT_EXT)
EXT4_I(inode)->i_delalloc_reserved_flag = 1;
/*
* We need to check for EXT4 here because migrate
-Eric
> -Eric
>
>> I finally reproduced it; the problem happens when we fallocate() a
>> region of the file which we had recently written, and which is still in
>> the page cache marked as delayed allocation blocks. When we finally
>> write those blocks out, since they are marked BH_Delay,
>> ext4_get_blocks() calls ext4_da_update_reserve_space(), which ends up
>> bumping i_blocks a second time and charging the blocks against the
>> user's quota a second time. Oops.
>>
>> Fortunately the fsck problem is one that will be fixed with a preen (and
>> if quota is enabled, a quotacheck), so it's not super serious, but we
>> should fix it when we have a chance. If anyone has time to look at it,
>> please let me know. Otherwise, I'll put it on my todo list. I don't
>> consider seriously urgent since the case is highly unlikely to occur in
>> real life, and it doesn't have any security implications; the worst an
>> attacker could do is end up charging excesss quota to herself.
>>
>> I've included a simple reproduction case below; if you run this program,
>> it will create a file "test-file" in the current working directory which
>> will appear to be 32k, even though it is really only 16k long, and if
>> you then unmount the test file system and run e2fsck -p on it, you will get
>> the error message:
>>
>> Inode XXX, i_blocks is 64, should be 32. FIXED.
>>
>> - Ted
>>
>> #define _GNU_SOURCE
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>> #include <string.h>
>> #include <sys/types.h>
>> #include <fcntl.h>
>> #include <fcntl.h>
>>
>> #define BUFSIZE 1024
>>
>> int main(int argc, char **argv)
>> {
>> int i, fd, ret;
>> char buf[BUFSIZE];
>>
>> fd = open("test-file", O_RDWR|O_CREAT|O_TRUNC, 0644);
>> if (fd < 0) {
>> perror("open");
>> exit(1);
>> }
>> memset(&buf, 0, BUFSIZE);
>> for (i=0; i < 16; i++) {
>> ret = write(fd, &buf, BUFSIZE);
>> if (ret < 0) {
>> perror("write");
>> exit(1);
>> }
>> if (ret != BUFSIZE) {
>> fprintf(stderr, "Write return expected %d, got %d\n",
>> BUFSIZE, ret);
>> exit(1);
>> }
>> }
>> ret = fallocate(fd, 0, 0, 16384);
>> if (ret < 0) {
>> perror("fallocate");
>> exit(1);
>> }
>> ret = fsync(fd);
>> if (ret < 0) {
>> perror("fsync");
>> exit(1);
>> }
>> ret = close(fd);
>> if (ret < 0) {
>> perror("close");
>> exit(1);
>> }
>> exit(0);
>> }
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2010-01-04 23:08 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-12-31 15:50 fsstress-induced corruption reproduced Theodore Ts'o
2010-01-04 20:13 ` Eric Sandeen
2010-01-04 23:08 ` Eric Sandeen [this message]
2010-01-05 6:17 ` Aneesh Kumar K.V
2010-01-05 14:40 ` Eric Sandeen
2010-01-05 23:37 ` Eric Sandeen
2010-01-06 8:49 ` Aneesh Kumar K.V
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4B427507.40004@redhat.com \
--to=sandeen@redhat.com \
--cc=linux-ext4@vger.kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.