linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Theodore Tso <tytso@mit.edu>
Cc: Eric Sandeen <sandeen@redhat.com>,
	cmm@us.ibm.com, linux-ext4@vger.kernel.org
Subject: Re: [PATCH 2/3] ext4: Clear the unwritten buffer_head flag properly
Date: Thu, 14 May 2009 11:30:15 +0530	[thread overview]
Message-ID: <20090514060015.GB7359@skywalker> (raw)
In-Reply-To: <20090513222847.GA10003@mit.edu>

On Wed, May 13, 2009 at 06:28:47PM -0400, Theodore Tso wrote:
> On Wed, May 13, 2009 at 01:56:56PM -0500, Eric Sandeen wrote:
> > The comment:
> > 
> >  	/*
> > +	 * The above get_blocks can cause the buffer to be
> > +	 * marked unwritten. So clear the same.
> > +	 */
> > +	clear_buffer_unwritten(bh);
> > 
> > is imho not helpful; to me it says "we -just- set this, so clear it!"
> > It leaves me scratching my head.
> 
> I've updated it the comment to say this instead.
> 
> 	/*
> 	 * When we call get_blocks without the create flag, the
> 	 * BH_Unwritten flag could have gotten set if the blocks
> 	 * requested were part of a uninitialized extent.  We need to
> 	 * clear this flag now that we are committed to convert all or
> 	 * part of the uninitialized extent to be an initialized
> 	 * extent.  This is because we need to avoid the combination
> 	 * of BH_Unwritten and BH_Mapped flags being simultaneously
> 	 * set on the buffer_head.
> 	 */

The last line in the above comment is not a problem with the latest
kernel with all the patches in the patch queue. The patch that does that
is  "ext4: Mark the unwritten buffer_head as mapped during write_begin"

The unwritten and mapped state together was a problem with the code path
we had before in ext4_da_get_block_prep
we had:

ret = ext4_get_blocks(NULL, inode, iblock, 1,  bh_result, 0);
.....
..
} else if (ret > 0) {
	bh_result->b_size = (ret << inode->i_blkbits);
	if (buffer_unwritten(bh_result)) {
		bh_result.b_blocknr = ~0;
		....
	}
}

The above can result in 

1) We do a multi block delayed alloc to prealloc space. That would get
us multiple buffer_heads marked with BH_Unwritten. (say 10, 11, 12)
2) pdflush attempt to write some pages (say mapping block 10) which
cause
a get_block call with create = 1. That would attempt to convert
uninitialized extent to initialized one. This can cause multiple blocks
to be marked initialized. ( say 10, 11 , 12)
3) We do an overwrite of block 11. That would mean we call
ext4_da_get_block_prep, which would again do a get_block for block 11
with create = 0. But remember we already have buffer_head marked with
BH_Unwritten flag. But the buffer was unmapped because it is unwritten
( We are fixing this mess in the patch for 2.6.31).
4) The get_block call will find the buffer mapped due to step b. And
mark the buffer_head mapped. There we go . We end up with buffer_head
mapped and unwritten
5) later in ext4_da_get_block_prep we check whether the buffer_head in
marked
BH_Unwritten if so we set the block number to ~0. This is introduced by
[PATCH -V4 1/2] Fix sub-block zeroing for buffered writes into unwritten
extents
6) So now we have a buffer_head that is mapped, unwritten, with
b_blocknr = ~0. That would result in the I/O error.


Now that we have the actual block number in bh_result.b_blocknr I guess
we should be ok to have the unwritten flag set. But then i guess it is
against the VFS assumption of buffer_head state. The state unwritten
indicate the blocks are not yet written. So if we don't do a
clear_buffer_unwritten as in the patch we end up with a block that is
written ( done via create = 1 get_block call) and still marked
unwritten. (see the 6 step example above)

Now to explain what "ext4: Mark the unwritten buffer_head as mapped
during write_begin" does.

It marks the buffer_head as mapped in the write_begin phase.
And that helps in making sure we don't end up calling get_block
multiple times.  So with delayed allocation we now have between
the write_begin and writepage phase a buffer_head which is mapped
and unwritten for blocks in the fallocate space. Once we do
writepage for the block we will have buffer_head which is mapped and
unwritten flag cleared. The unwritten get cleared when we do a get_block
call with create = 1.


To achieve the above we need to make sure writepage code path looks at
the unwritten flag and does a get_blocks call with create = 1. With
mainline we have in writepage code path

mpage_da_map_blocks(...)

if ((mpd->b_state  & (1 << BH_Mapped)) &&
    !(mpd->b_state & (1 << BH_Delay)))
	return 0;
	...
	..
	ext4_da_get_block_write()


If we start marking buffer_head mapped for fallocate blocks in the
write_begin phase, then the above code will return without doing
any get_block(create = 1) call. That means we don't convert the
uninitialized extent to initialized one. So along with marking
buffer_head mapped and unwritten in the write_begin phase we also
need changes to writepage code path which does something like below

mpage_da_map_blocks(...)

if ((mpd->b_state  & (1 << BH_Mapped)) &&
	!(mpd->b_state & (1 << BH_Delay)) &&
	!(mpd->b_state & (1 << BH_Unwritten)))
	    return 0;
	...
	..
	ext4_da_get_block_write()

this is done in patch "ext4: Mark the unwritten buffer_head as mapped
during write_begin".

-aneesh

  reply	other threads:[~2009-05-14  6:00 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-05-07 10:39 [PATCH 1/3] ext4: Properly initialize the buffer_head state Aneesh Kumar K.V
2009-05-07 10:39 ` [PATCH 2/3] ext4: Clear the unwritten buffer_head flag properly Aneesh Kumar K.V
2009-05-07 10:39   ` [PATCH 3/3] vfs: Add BUG_ON for delayed and unwritten extents in submit_bh Aneesh Kumar K.V
2009-05-07 15:37     ` Eric Sandeen
2009-05-12  3:17     ` Theodore Tso
2009-05-12  4:52       ` [PATCH 3/3] vfs: Add BUG_ON for delayed and unwritten extentsin submit_bh Aneesh Kumar K.V
2009-05-12 13:25         ` Eric Sandeen
2009-05-07 15:36   ` [PATCH 2/3] ext4: Clear the unwritten buffer_head flag properly Eric Sandeen
2009-05-08  8:12     ` Aneesh Kumar K.V
2009-05-12  3:08   ` Theodore Tso
2009-05-12  4:46     ` Aneesh Kumar K.V
2009-05-13 18:56       ` Eric Sandeen
2009-05-13 22:28         ` Theodore Tso
2009-05-14  6:00           ` Aneesh Kumar K.V [this message]
2009-05-14  5:40         ` Aneesh Kumar K.V
2009-05-14 13:14           ` Theodore Tso
2009-05-07 15:20 ` [PATCH 1/3] ext4: Properly initialize the buffer_head state Eric Sandeen
2009-05-10 23:57   ` Theodore Tso
2009-05-11  9:24     ` Aneesh Kumar K.V
2009-05-11 11:31       ` Theodore Tso
2009-05-11 14:49     ` Eric Sandeen
2009-05-12  3:17 ` Theodore Tso
2009-05-12  4:47   ` Aneesh Kumar K.V

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090514060015.GB7359@skywalker \
    --to=aneesh.kumar@linux.vnet.ibm.com \
    --cc=cmm@us.ibm.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=sandeen@redhat.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).