linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH -V4 1/2] Fix sub-block zeroing for buffered writes into unwritten extents
@ 2009-04-29  4:47 Aneesh Kumar K.V
  2009-04-29  4:47 ` [PATCH -V4 2/2] ext4: Use -1 as the fake block number for delayed new buffer_head Aneesh Kumar K.V
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Aneesh Kumar K.V @ 2009-04-29  4:47 UTC (permalink / raw)
  To: cmm, tytso, sandeen; +Cc: linux-ext4, Aneesh Kumar K.V

We need to mark the  buffer_head mapping prealloc space
as new during write_begin. Otherwise we don't zero out the
page cache content properly for a partial write. This will
cause file corruption with preallocation.

Also use block number -1 as the fake block number so that
unmap_underlying_metadata doesn't drop wrong buffer_head

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

---
 fs/ext4/inode.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e91f978..12dcfab 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2323,6 +2323,16 @@ static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
 		set_buffer_delay(bh_result);
 	} else if (ret > 0) {
 		bh_result->b_size = (ret << inode->i_blkbits);
+		/*
+		 * With sub-block writes into unwritten extents
+		 * we also need to mark the buffer as new so that
+		 * the unwritten parts of the buffer gets correctly zeroed.
+		 */
+		if (buffer_unwritten(bh_result)) {
+			bh_result->b_bdev = inode->i_sb->s_bdev;
+			set_buffer_new(bh_result);
+			bh_result->b_blocknr = -1;
+		}
 		ret = 0;
 	}
 
-- 
tg: (56a50ad..) preallocate_corruption_quickfix (depends on: master)

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH -V4 2/2] ext4: Use -1 as the fake block number for delayed new buffer_head
  2009-04-29  4:47 [PATCH -V4 1/2] Fix sub-block zeroing for buffered writes into unwritten extents Aneesh Kumar K.V
@ 2009-04-29  4:47 ` Aneesh Kumar K.V
  2009-04-29 13:59   ` Eric Sandeen
                     ` (2 more replies)
  2009-04-29 13:59 ` [PATCH -V4 1/2] Fix sub-block zeroing for buffered writes into unwritten extents Eric Sandeen
  2009-05-12  2:42 ` Theodore Tso
  2 siblings, 3 replies; 15+ messages in thread
From: Aneesh Kumar K.V @ 2009-04-29  4:47 UTC (permalink / raw)
  To: cmm, tytso, sandeen; +Cc: linux-ext4, Aneesh Kumar K.V

Block number '0' should not be used as the fake block number for
the delayed new buffer. This will result in vfs calling umap_underlying_metadata for
block number '0'. So  use -1 instead.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

---
 fs/ext4/inode.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 12dcfab..c7c2156 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2318,7 +2318,7 @@ static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
 			/* not enough space to reserve */
 			return ret;
 
-		map_bh(bh_result, inode->i_sb, 0);
+		map_bh(bh_result, inode->i_sb, -1);
 		set_buffer_new(bh_result);
 		set_buffer_delay(bh_result);
 	} else if (ret > 0) {
-- 
tg: (15262e5..) proper_mappingblock_for_delay (depends on: preallocate_corruption_quickfix)

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH -V4 2/2] ext4: Use -1 as the fake block number for delayed new buffer_head
  2009-04-29  4:47 ` [PATCH -V4 2/2] ext4: Use -1 as the fake block number for delayed new buffer_head Aneesh Kumar K.V
@ 2009-04-29 13:59   ` Eric Sandeen
  2009-04-29 15:35   ` Theodore Tso
  2009-05-12 15:17   ` [PATCH -V5] " Aneesh Kumar K.V
  2 siblings, 0 replies; 15+ messages in thread
From: Eric Sandeen @ 2009-04-29 13:59 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: cmm, tytso, linux-ext4

Aneesh Kumar K.V wrote:
> Block number '0' should not be used as the fake block number for
> the delayed new buffer. This will result in vfs calling umap_underlying_metadata for
> block number '0'. So  use -1 instead.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> 
> ---
>  fs/ext4/inode.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 12dcfab..c7c2156 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2318,7 +2318,7 @@ static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
>  			/* not enough space to reserve */
>  			return ret;
>  
> -		map_bh(bh_result, inode->i_sb, 0);
> +		map_bh(bh_result, inode->i_sb, -1);
>  		set_buffer_new(bh_result);
>  		set_buffer_delay(bh_result);
>  	} else if (ret > 0) {

this seems fine too, thanks.

-Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH -V4 1/2] Fix sub-block zeroing for buffered writes into unwritten extents
  2009-04-29  4:47 [PATCH -V4 1/2] Fix sub-block zeroing for buffered writes into unwritten extents Aneesh Kumar K.V
  2009-04-29  4:47 ` [PATCH -V4 2/2] ext4: Use -1 as the fake block number for delayed new buffer_head Aneesh Kumar K.V
@ 2009-04-29 13:59 ` Eric Sandeen
  2009-04-29 17:28   ` Mingming
  2009-05-12  2:42 ` Theodore Tso
  2 siblings, 1 reply; 15+ messages in thread
From: Eric Sandeen @ 2009-04-29 13:59 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: cmm, tytso, linux-ext4

Aneesh Kumar K.V wrote:
> We need to mark the  buffer_head mapping prealloc space
> as new during write_begin. Otherwise we don't zero out the
> page cache content properly for a partial write. This will
> cause file corruption with preallocation.
> 
> Also use block number -1 as the fake block number so that
> unmap_underlying_metadata doesn't drop wrong buffer_head
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> 
> ---
>  fs/ext4/inode.c |   10 ++++++++++
>  1 files changed, 10 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index e91f978..12dcfab 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2323,6 +2323,16 @@ static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
>  		set_buffer_delay(bh_result);
>  	} else if (ret > 0) {
>  		bh_result->b_size = (ret << inode->i_blkbits);
> +		/*
> +		 * With sub-block writes into unwritten extents
> +		 * we also need to mark the buffer as new so that
> +		 * the unwritten parts of the buffer gets correctly zeroed.
> +		 */
> +		if (buffer_unwritten(bh_result)) {
> +			bh_result->b_bdev = inode->i_sb->s_bdev;
> +			set_buffer_new(bh_result);
> +			bh_result->b_blocknr = -1;
> +		}
>  		ret = 0;
>  	}
>  

Ok, I guess this seems like the safest approach.  Long term we should
look really hard at the state & block nr of these buffer heads, but I
agree that keeping the changes restricted to the preallocation path for
now is safest.

-Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH -V4 2/2] ext4: Use -1 as the fake block number for delayed new buffer_head
  2009-04-29  4:47 ` [PATCH -V4 2/2] ext4: Use -1 as the fake block number for delayed new buffer_head Aneesh Kumar K.V
  2009-04-29 13:59   ` Eric Sandeen
@ 2009-04-29 15:35   ` Theodore Tso
  2009-04-29 15:37     ` Eric Sandeen
  2009-05-04  8:54     ` Aneesh Kumar K.V
  2009-05-12 15:17   ` [PATCH -V5] " Aneesh Kumar K.V
  2 siblings, 2 replies; 15+ messages in thread
From: Theodore Tso @ 2009-04-29 15:35 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: cmm, sandeen, linux-ext4

On Wed, Apr 29, 2009 at 10:17:21AM +0530, Aneesh Kumar K.V wrote:
> Block number '0' should not be used as the fake block number for
> the delayed new buffer. This will result in vfs calling umap_underlying_metadata for
> block number '0'. So  use -1 instead.

sector_t is an unsigned type, so we probably want to use ~0 instead of
-1.  I can fix this up before we apply into the patch queue.

Are we agreed both of these should probably be pushed to Linus for
2.6.30?

					- Ted

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH -V4 2/2] ext4: Use -1 as the fake block number for delayed new buffer_head
  2009-04-29 15:35   ` Theodore Tso
@ 2009-04-29 15:37     ` Eric Sandeen
  2009-04-29 16:52       ` Theodore Tso
  2009-05-04  8:54     ` Aneesh Kumar K.V
  1 sibling, 1 reply; 15+ messages in thread
From: Eric Sandeen @ 2009-04-29 15:37 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Aneesh Kumar K.V, cmm, linux-ext4

Theodore Tso wrote:
> On Wed, Apr 29, 2009 at 10:17:21AM +0530, Aneesh Kumar K.V wrote:
>> Block number '0' should not be used as the fake block number for
>> the delayed new buffer. This will result in vfs calling umap_underlying_metadata for
>> block number '0'. So  use -1 instead.
> 
> sector_t is an unsigned type, so we probably want to use ~0 instead of
> -1.  I can fix this up before we apply into the patch queue.

I don't think that helps.  The point is to have a block number which is
invalid, therefore won't get unmapped or accidentally written to ...

-Eric

> Are we agreed both of these should probably be pushed to Linus for
> 2.6.30?
> 
> 					- Ted


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH -V4 2/2] ext4: Use -1 as the fake block number for delayed new buffer_head
  2009-04-29 15:37     ` Eric Sandeen
@ 2009-04-29 16:52       ` Theodore Tso
  2009-04-29 17:01         ` Eric Sandeen
  0 siblings, 1 reply; 15+ messages in thread
From: Theodore Tso @ 2009-04-29 16:52 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Aneesh Kumar K.V, cmm, linux-ext4

On Wed, Apr 29, 2009 at 10:37:34AM -0500, Eric Sandeen wrote:
> Theodore Tso wrote:
> > On Wed, Apr 29, 2009 at 10:17:21AM +0530, Aneesh Kumar K.V wrote:
> >> Block number '0' should not be used as the fake block number for
> >> the delayed new buffer. This will result in vfs calling umap_underlying_metadata for
> >> block number '0'. So  use -1 instead.
> > 
> > sector_t is an unsigned type, so we probably want to use ~0 instead of
> > -1.  I can fix this up before we apply into the patch queue.
> 
> I don't think that helps.  The point is to have a block number which is
> invalid, therefore won't get unmapped or accidentally written to ...

This is more of a type-safety thing to eliminate compiler warnings.
We could use something like s_blocks_count instead, which has less
chance of wrapping, but by the time we get to the bh level, the risk
of wrapping should be minimal, and ~0 (or -1) is more distinctive when
debugging/tracing.

					- Ted

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH -V4 2/2] ext4: Use -1 as the fake block number for delayed new buffer_head
  2009-04-29 16:52       ` Theodore Tso
@ 2009-04-29 17:01         ` Eric Sandeen
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Sandeen @ 2009-04-29 17:01 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Aneesh Kumar K.V, cmm, linux-ext4

Theodore Tso wrote:
> On Wed, Apr 29, 2009 at 10:37:34AM -0500, Eric Sandeen wrote:
>> Theodore Tso wrote:
>>> On Wed, Apr 29, 2009 at 10:17:21AM +0530, Aneesh Kumar K.V wrote:
>>>> Block number '0' should not be used as the fake block number for
>>>> the delayed new buffer. This will result in vfs calling umap_underlying_metadata for
>>>> block number '0'. So  use -1 instead.
>>> sector_t is an unsigned type, so we probably want to use ~0 instead of
>>> -1.  I can fix this up before we apply into the patch queue.
>> I don't think that helps.  The point is to have a block number which is
>> invalid, therefore won't get unmapped or accidentally written to ...
> 
> This is more of a type-safety thing to eliminate compiler warnings.
> We could use something like s_blocks_count instead, which has less
> chance of wrapping, but by the time we get to the bh level, the risk
> of wrapping should be minimal, and ~0 (or -1) is more distinctive when
> debugging/tracing.

I'm sorry.  Poor choice of fonts, or something, I read "-0" not "~0" and
wondered what on earth you were thinking.  ;)

-Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH -V4 1/2] Fix sub-block zeroing for buffered writes into unwritten extents
  2009-04-29 13:59 ` [PATCH -V4 1/2] Fix sub-block zeroing for buffered writes into unwritten extents Eric Sandeen
@ 2009-04-29 17:28   ` Mingming
  0 siblings, 0 replies; 15+ messages in thread
From: Mingming @ 2009-04-29 17:28 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Aneesh Kumar K.V, tytso, linux-ext4


On Wed, 2009-04-29 at 08:59 -0500, Eric Sandeen wrote:
> Aneesh Kumar K.V wrote:
> > We need to mark the  buffer_head mapping prealloc space
> > as new during write_begin. Otherwise we don't zero out the
> > page cache content properly for a partial write. This will
> > cause file corruption with preallocation.
> > 
> > Also use block number -1 as the fake block number so that
> > unmap_underlying_metadata doesn't drop wrong buffer_head
> > 
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> > 
> > ---
> >  fs/ext4/inode.c |   10 ++++++++++
> >  1 files changed, 10 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index e91f978..12dcfab 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -2323,6 +2323,16 @@ static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
> >  		set_buffer_delay(bh_result);
> >  	} else if (ret > 0) {
> >  		bh_result->b_size = (ret << inode->i_blkbits);
> > +		/*
> > +		 * With sub-block writes into unwritten extents
> > +		 * we also need to mark the buffer as new so that
> > +		 * the unwritten parts of the buffer gets correctly zeroed.
> > +		 */
> > +		if (buffer_unwritten(bh_result)) {
> > +			bh_result->b_bdev = inode->i_sb->s_bdev;
> > +			set_buffer_new(bh_result);
> > +			bh_result->b_blocknr = -1;
> > +		}
> >  		ret = 0;
> >  	}
> >  
> 
> Ok, I guess this seems like the safest approach.  Long term we should
> look really hard at the state & block nr of these buffer heads, but I
> agree that keeping the changes restricted to the preallocation path for
> now is safest.
> 

This path (ret >0) this is the path where get_blocks() find the block
allocated or preallocated. The buffer_unwritten() is strict to the
preallocation case, but why not take care of the buffer_new() when we
set the buffer_unwritten() for preallocation  in ext4_ext_get_blocks()
at the first place? That makes the "preallocation" case handling there
all together. 

But both patch is correct, I have tested the prealloc,
prealloc->paritial write, prealloc->paritial long
write->partial-short-write, the content of the afterward read seems all
sane in both patch.

Any thoughts about the comments update I made in my previous patch? This
part of comment in preallocation  handling in ext4_ext_get_blocks()
needs some cleanup.


Think this over, if we set the buffer new here(i.e. in the write_begin()
path), I wonder about the read case: where do we set the buffer_new()
for the read on preallocated space? the ext4_ext_get_blocks() with
create = 0 on preallocated extent will return bh unwritten, but not new.
However my read tests right after new preallocation returns all zeroed
data. I wonder what I am missing.

Mingming
> -Eric
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH -V4 2/2] ext4: Use -1 as the fake block number for delayed new buffer_head
  2009-04-29 15:35   ` Theodore Tso
  2009-04-29 15:37     ` Eric Sandeen
@ 2009-05-04  8:54     ` Aneesh Kumar K.V
  2009-05-04 15:06       ` Eric Sandeen
  1 sibling, 1 reply; 15+ messages in thread
From: Aneesh Kumar K.V @ 2009-05-04  8:54 UTC (permalink / raw)
  To: Theodore Tso; +Cc: cmm, sandeen, linux-ext4

On Wed, Apr 29, 2009 at 11:35:21AM -0400, Theodore Tso wrote:
> On Wed, Apr 29, 2009 at 10:17:21AM +0530, Aneesh Kumar K.V wrote:
> > Block number '0' should not be used as the fake block number for
> > the delayed new buffer. This will result in vfs calling umap_underlying_metadata for
> > block number '0'. So  use -1 instead.
> 
> sector_t is an unsigned type, so we probably want to use ~0 instead of
> -1.  I can fix this up before we apply into the patch queue.
> 
> Are we agreed both of these should probably be pushed to Linus for
> 2.6.30?
> 

With ABAT I am seeing the below error during fsstress run.

EXT4-fs: mounted filesystem sdb1 with ordered data mode
attempt to access beyond end of device
sdb1: rw=1, want=0, limit=136713087
Buffer I/O error on device sdb1, logical block 18446744073709551615
lost page write due to I/O error on sdb1
attempt to access beyond end of device
sdb1: rw=1, want=0, limit=136713087
Buffer I/O error on device sdb1, logical block 18446744073709551615
lost page write due to I/O error on sdb1
attempt to access beyond end of device
sdb1: rw=1, want=0, limit=136713087
Buffer I/O error on device sdb1, logical block 18446744073709551615
lost page write due to I/O error on sdb1
attempt to access beyond end of device
sdb1: rw=1, want=0, limit=136713087


That is the logical block number ~0. Well that would imply we are trying
to write the buffer_head which is marked delay.


-aneesh

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH -V4 2/2] ext4: Use -1 as the fake block number for delayed new buffer_head
  2009-05-04  8:54     ` Aneesh Kumar K.V
@ 2009-05-04 15:06       ` Eric Sandeen
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Sandeen @ 2009-05-04 15:06 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: Theodore Tso, cmm, linux-ext4

Aneesh Kumar K.V wrote:
> On Wed, Apr 29, 2009 at 11:35:21AM -0400, Theodore Tso wrote:
>> On Wed, Apr 29, 2009 at 10:17:21AM +0530, Aneesh Kumar K.V wrote:
>>> Block number '0' should not be used as the fake block number for
>>> the delayed new buffer. This will result in vfs calling umap_underlying_metadata for
>>> block number '0'. So  use -1 instead.
>> sector_t is an unsigned type, so we probably want to use ~0 instead of
>> -1.  I can fix this up before we apply into the patch queue.
>>
>> Are we agreed both of these should probably be pushed to Linus for
>> 2.6.30?
>>
> 
> With ABAT I am seeing the below error during fsstress run.
> 
> EXT4-fs: mounted filesystem sdb1 with ordered data mode
> attempt to access beyond end of device
> sdb1: rw=1, want=0, limit=136713087
> Buffer I/O error on device sdb1, logical block 18446744073709551615

Ok, I think this is actually good.  Looks like we are leaking
uninitialized delalloc buffer heads... this may well explain some of the
corruptions we've seen.  So now .... what's going on ... :)

-Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH -V4 1/2] Fix sub-block zeroing for buffered writes into unwritten extents
  2009-04-29  4:47 [PATCH -V4 1/2] Fix sub-block zeroing for buffered writes into unwritten extents Aneesh Kumar K.V
  2009-04-29  4:47 ` [PATCH -V4 2/2] ext4: Use -1 as the fake block number for delayed new buffer_head Aneesh Kumar K.V
  2009-04-29 13:59 ` [PATCH -V4 1/2] Fix sub-block zeroing for buffered writes into unwritten extents Eric Sandeen
@ 2009-05-12  2:42 ` Theodore Tso
  2009-05-12  3:37   ` Eric Sandeen
  2009-05-12 15:16   ` [PATCH -V5] Fix sub-block zeroing for buffered writes intounwritten extents Aneesh Kumar K.V
  2 siblings, 2 replies; 15+ messages in thread
From: Theodore Tso @ 2009-05-12  2:42 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: cmm, sandeen, linux-ext4

On Wed, Apr 29, 2009 at 10:17:20AM +0530, Aneesh Kumar K.V wrote:
> We need to mark the  buffer_head mapping prealloc space
> as new during write_begin. Otherwise we don't zero out the
> page cache content properly for a partial write. This will
> cause file corruption with preallocation.
> 
> Also use block number -1 as the fake block number so that
> unmap_underlying_metadata doesn't drop wrong buffer_head

The buffer_head code is starting to scare me more and more. 

I'm looking at this code again and I can't figure out why it's safe
(or why we would need to) put in an invalid number into
bh_result->b_blocknr:

> @@ -2323,6 +2323,16 @@ static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
>  		set_buffer_delay(bh_result);
>  	} else if (ret > 0) {
>  		bh_result->b_size = (ret << inode->i_blkbits);
> +		/*
> +		 * With sub-block writes into unwritten extents
> +		 * we also need to mark the buffer as new so that
> +		 * the unwritten parts of the buffer gets correctly zeroed.
> +		 */
> +		if (buffer_unwritten(bh_result)) {
> +			bh_result->b_bdev = inode->i_sb->s_bdev;
> +			set_buffer_new(bh_result);
> +			bh_result->b_blocknr = -1;

Why do we need to avoid calling unmap_underlying_metadata()?

And after the buffer is zero'ed out, it leaves b_blocknr in a
buffer_head attached to the page at an invalid block number.  Doesn't
that get us in trouble later on?

I see that this line is removed later on in the for-2.6.31 patch "Mark
the unwritten buffer_head as mapped during write_begin".  But is it
safe for 2.6.30?

						- Ted

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH -V4 1/2] Fix sub-block zeroing for buffered writes into unwritten extents
  2009-05-12  2:42 ` Theodore Tso
@ 2009-05-12  3:37   ` Eric Sandeen
  2009-05-12 15:16   ` [PATCH -V5] Fix sub-block zeroing for buffered writes intounwritten extents Aneesh Kumar K.V
  1 sibling, 0 replies; 15+ messages in thread
From: Eric Sandeen @ 2009-05-12  3:37 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Aneesh Kumar K.V, cmm, linux-ext4

Theodore Tso wrote:
> On Wed, Apr 29, 2009 at 10:17:20AM +0530, Aneesh Kumar K.V wrote:
>> We need to mark the  buffer_head mapping prealloc space
>> as new during write_begin. Otherwise we don't zero out the
>> page cache content properly for a partial write. This will
>> cause file corruption with preallocation.
>>
>> Also use block number -1 as the fake block number so that
>> unmap_underlying_metadata doesn't drop wrong buffer_head
> 
> The buffer_head code is starting to scare me more and more. 
> 
> I'm looking at this code again and I can't figure out why it's safe
> (or why we would need to) put in an invalid number into
> bh_result->b_blocknr:

I don't know for sure why it should be invalid; I think a preallocated
block, since it has an *actual* *block* *allocated* after all, should
have that block number.  But if it's going to be fake, let's not use a
"real" one like the superblock location...

A real block nr does eventually get assigned when we do getblock with
create=1 AFAICT.

>> @@ -2323,6 +2323,16 @@ static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
>>  		set_buffer_delay(bh_result);
>>  	} else if (ret > 0) {
>>  		bh_result->b_size = (ret << inode->i_blkbits);
>> +		/*
>> +		 * With sub-block writes into unwritten extents
>> +		 * we also need to mark the buffer as new so that
>> +		 * the unwritten parts of the buffer gets correctly zeroed.
>> +		 */
>> +		if (buffer_unwritten(bh_result)) {
>> +			bh_result->b_bdev = inode->i_sb->s_bdev;
>> +			set_buffer_new(bh_result);
>> +			bh_result->b_blocknr = -1;
> 
> Why do we need to avoid calling unmap_underlying_metadata()?

For that matter, why do we call unmap_underlying_metadata at all, ever?

> And after the buffer is zero'ed out, it leaves b_blocknr in a
> buffer_head attached to the page at an invalid block number.  Doesn't
> that get us in trouble later on?
> 
> I see that this line is removed later on in the for-2.6.31 patch "Mark
> the unwritten buffer_head as mapped during write_begin".  But is it
> safe for 2.6.30?

I have this in F11 now, but it's giving me the heebie-jeebies still.  At
least it's confined to preallocation (one of the great new ext4 features
I've been promoting recently... :)

-Eric

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH -V5] Fix sub-block zeroing for buffered writes intounwritten extents
  2009-05-12  2:42 ` Theodore Tso
  2009-05-12  3:37   ` Eric Sandeen
@ 2009-05-12 15:16   ` Aneesh Kumar K.V
  1 sibling, 0 replies; 15+ messages in thread
From: Aneesh Kumar K.V @ 2009-05-12 15:16 UTC (permalink / raw)
  To: Theodore Tso; +Cc: cmm, sandeen, linux-ext4


ext3: Fix sub-block zeroing for writes into preallocated extents

From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

We need to mark the buffer_head mapping preallocated space as new
during write_begin. Otherwise we don't zero out the page cache content
properly for a partial write. This will cause file corruption with
preallocation.

Now that we mark the buffer_head new we also need to have a valid
buffer_head blocknr so that unmap_underlying_metadata unmap the
right block.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/ext4/extents.c |    2 ++
 fs/ext4/inode.c   |    8 ++++++++
 2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index e403321..c3768cd 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2875,6 +2875,8 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
 				if (allocated > max_blocks)
 					allocated = max_blocks;
 				set_buffer_unwritten(bh_result);
+				bh_result->b_bdev    = inode->i_sb->s_bdev;
+				bh_result->b_blocknr = newblock;
 				goto out2;
 			}
 
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e91f978..498cf8b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2323,6 +2323,14 @@ static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
 		set_buffer_delay(bh_result);
 	} else if (ret > 0) {
 		bh_result->b_size = (ret << inode->i_blkbits);
+		/*
+		 * With sub-block writes into unwritten extents
+		 * we also need to mark the buffer as new so that
+		 * the unwritten parts of the buffer gets correctly zeroed.
+		 */
+		if (buffer_unwritten(bh_result)) {
+			set_buffer_new(bh_result);
+		}
 		ret = 0;
 	}
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH -V5] ext4: Use -1 as the fake block number for delayed new buffer_head
  2009-04-29  4:47 ` [PATCH -V4 2/2] ext4: Use -1 as the fake block number for delayed new buffer_head Aneesh Kumar K.V
  2009-04-29 13:59   ` Eric Sandeen
  2009-04-29 15:35   ` Theodore Tso
@ 2009-05-12 15:17   ` Aneesh Kumar K.V
  2 siblings, 0 replies; 15+ messages in thread
From: Aneesh Kumar K.V @ 2009-05-12 15:17 UTC (permalink / raw)
  To: cmm, tytso, sandeen; +Cc: linux-ext4


ext4: Use a fake block number for delayed new buffer_head

From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Use a very large unsigned number (~0xffff) as as the fake block number
for the delayed new buffer. The VFS should never try to write out this
number, but if it does, this will make it obvious.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/ext4/inode.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 498cf8b..4b1478c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2297,6 +2297,10 @@ static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
 				  struct buffer_head *bh_result, int create)
 {
 	int ret = 0;
+	sector_t invalid_block = ~((sector_t) 0xffff);
+
+	if (invalid_block < ext4_blocks_count(EXT4_SB(inode->i_sb)->s_es))
+		invalid_block = ~0;
 
 	BUG_ON(create == 0);
 	BUG_ON(bh_result->b_size != inode->i_sb->s_blocksize);
@@ -2318,7 +2322,7 @@ static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
 			/* not enough space to reserve */
 			return ret;
 
-		map_bh(bh_result, inode->i_sb, 0);
+		map_bh(bh_result, inode->i_sb, invalid_block);
 		set_buffer_new(bh_result);
 		set_buffer_delay(bh_result);
 	} else if (ret > 0) {

^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2009-05-12 15:18 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-04-29  4:47 [PATCH -V4 1/2] Fix sub-block zeroing for buffered writes into unwritten extents Aneesh Kumar K.V
2009-04-29  4:47 ` [PATCH -V4 2/2] ext4: Use -1 as the fake block number for delayed new buffer_head Aneesh Kumar K.V
2009-04-29 13:59   ` Eric Sandeen
2009-04-29 15:35   ` Theodore Tso
2009-04-29 15:37     ` Eric Sandeen
2009-04-29 16:52       ` Theodore Tso
2009-04-29 17:01         ` Eric Sandeen
2009-05-04  8:54     ` Aneesh Kumar K.V
2009-05-04 15:06       ` Eric Sandeen
2009-05-12 15:17   ` [PATCH -V5] " Aneesh Kumar K.V
2009-04-29 13:59 ` [PATCH -V4 1/2] Fix sub-block zeroing for buffered writes into unwritten extents Eric Sandeen
2009-04-29 17:28   ` Mingming
2009-05-12  2:42 ` Theodore Tso
2009-05-12  3:37   ` Eric Sandeen
2009-05-12 15:16   ` [PATCH -V5] Fix sub-block zeroing for buffered writes intounwritten extents Aneesh Kumar K.V

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).