All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Sandeen <sandeen@redhat.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>, linux-ext4@vger.kernel.org
Subject: Re: fsstress-induced corruption reproduced
Date: Tue, 05 Jan 2010 17:37:57 -0600	[thread overview]
Message-ID: <4B43CD55.1050904@redhat.com> (raw)
In-Reply-To: <20100105061728.GA7868@skywalker.linux.vnet.ibm.com>

Aneesh Kumar K.V wrote:
> On Mon, Jan 04, 2010 at 05:08:55PM -0600, Eric Sandeen wrote:
>   
>> Eric Sandeen wrote:
>>     
>>> Theodore Ts'o wrote:
>>>       
>>>> One of the things which has been annoying me for a while now is a
>>>> hard-to-reproduce xfsqa failure in test #13 (fsstress), which causes the
>>>> a test failure because the file system found to be inconsistent:
>>>>
>>>> Inode NNN, i_blocks is X, should be Y.
>>>>         
>>> Interesting, this apparently has gotten much worse since 2.6.32.
>>>
>>> I wrote an xfstests reproducer, and couldn't hit it on .32; hit it right
>>> off on 2.6.33-rc2.
>>>
>>> Probably should find out why ;) I'll go take a look.
>>>       
>> commit d21cd8f163ac44b15c465aab7306db931c606908
>> Author: Dmitry Monakhov <dmonakhov@openvz.org>
>> Date:   Thu Dec 10 03:31:45 2009 +0000
>>
>>     ext4: Fix potential quota deadlock
>>
>> seems to be the culprit.
>>
>> (unfortunately this means that the error we saw before is something
>> -else- to be fixed, yet)  Anyway ...
>>
>> This is because we used to do this in ext4_mb_mark_diskspace_used() :
>>
>>         /*
>>          * Now reduce the dirty block count also. Should not go negative
>>          */
>>         if (!(ac->ac_flags & EXT4_MB_DELALLOC_RESERVED))
>>                 /* release all the reserved blocks if non delalloc */
>>                 percpu_counter_sub(&sbi->s_dirtyblocks_counter,
>> reserv_blks);
>>         else {
>>                 percpu_counter_sub(&sbi->s_dirtyblocks_counter,
>>                                                 ac->ac_b_ex.fe_len);
>>                 /* convert reserved quota blocks to real quota blocks */
>>                 vfs_dq_claim_block(ac->ac_inode, ac->ac_b_ex.fe_len);
>> 	}
>>
>> i.e. the vfs_dq_claim_block was conditional based on
>> EXT4_MB_DELALLOC_RESERVED... and the testcase did not go that way,
>> because we had already preallocated the blocks.
>>
>> But with the above quota deadlock commit it's not unconditional
>> anymore in ext4_da_update_reserve_space and we always call
>> vfs_dq_claim_block which over-accounts.
>>
>>     
>
> It is still conditional right ? We call ext4_da_update_reserve_space
> only if EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE  is set . That will
> happen only in case of delayed allocation. I guess the problem is
> same as what Ted stated. But i am not sure why we are able to reproduce
> it much easily on 2.6.33-rc2.
>
>   
Maybe something like this works:

Index: linux-2.6/fs/ext4/inode.c
===================================================================
--- linux-2.6.orig/fs/ext4/inode.c
+++ linux-2.6/fs/ext4/inode.c
@@ -1203,6 +1203,7 @@ int ext4_get_blocks(handle_t *handle, st
 		    int flags)
 {
 	int retval;
+	int was_unwritten;
 
 	clear_buffer_mapped(bh);
 	clear_buffer_unwritten(bh);
@@ -1253,9 +1254,13 @@ int ext4_get_blocks(handle_t *handle, st
 	 * part of the uninitialized extent to be an initialized
 	 * extent.  This is because we need to avoid the combination
 	 * of BH_Unwritten and BH_Mapped flags being simultaneously
-	 * set on the buffer_head.
+	 * set on the buffer_head.  However, if it was unwritten we
+	 * don't want to update reserved space later.
 	 */
-	clear_buffer_unwritten(bh);
+	if (buffer_unwritten(bh)) {
+		was_unwritten = 1;
+		clear_buffer_unwritten(bh);
+	}
 
 	/*
 	 * New blocks allocate and/or writing to uninitialized extent
@@ -1301,7 +1306,8 @@ int ext4_get_blocks(handle_t *handle, st
 	 * Update reserved blocks/metadata blocks after successful
 	 * block allocation which had been deferred till now.
 	 */
-	if ((retval > 0) && (flags & EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE))
+	if ((retval > 0) && !was_unwritten &&
+	    (flags & EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE))
 		ext4_da_update_reserve_space(inode, retval);
 
 	up_write((&EXT4_I(inode)->i_data_sem));

but that might leave the previous reservations hanging around from
prior to the fallocate ...

-Eric



  parent reply	other threads:[~2010-01-05 23:38 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-12-31 15:50 fsstress-induced corruption reproduced Theodore Ts'o
2010-01-04 20:13 ` Eric Sandeen
2010-01-04 23:08   ` Eric Sandeen
2010-01-05  6:17     ` Aneesh Kumar K.V
2010-01-05 14:40       ` Eric Sandeen
2010-01-05 23:37       ` Eric Sandeen [this message]
2010-01-06  8:49         ` Aneesh Kumar K.V

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4B43CD55.1050904@redhat.com \
    --to=sandeen@redhat.com \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.