linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Fix buffer dirtying in data=journal mode
@ 2010-06-21 10:42 Jan Kara
  2010-06-21 10:42 ` [PATCH 1/2] ext3: " Jan Kara
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Jan Kara @ 2010-06-21 10:42 UTC (permalink / raw)
  To: linux-ext4


  Hi,

  two patches below fix warnings and possible BUGs in ext3/ext4 when running
in data=journal mode (see https://bugzilla.kernel.org/show_bug.cgi?id=14156).
Admittedly, the workaround I've chosen is kind of ugly but reimplementing
block_prepare_write() just for the data=journal mode seems even uglier to me.
If anybody has better solution, I'd gladly hear it.

								Honza

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] ext3: Fix buffer dirtying in data=journal mode
  2010-06-21 10:42 [PATCH 0/2] Fix buffer dirtying in data=journal mode Jan Kara
@ 2010-06-21 10:42 ` Jan Kara
  2010-07-19 18:02   ` Josef Bacik
  2010-06-21 10:42 ` [PATCH 2/2] ext4: " Jan Kara
  2010-07-12 17:38 ` [PATCH 0/2] " Jan Kara
  2 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2010-06-21 10:42 UTC (permalink / raw)
  To: linux-ext4; +Cc: Jan Kara

block_prepare_write() can dirty freshly created buffer. This is a problem
for data=journal mode because data buffers shouldn't be dirty unless they
are undergoing checkpoint. So we have to tweak get_block function for
data=journal mode to catch the case when block_prepare_write would dirty
the buffer, do the work instead of block_prepare_write, and properly handle
dirty buffer as data=journal mode requires it.

It might be cleaner to avoid using block_prepare_write() for data=journal
mode writes but that would require us to duplicate most of the function
which isn't nice either...

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext3/inode.c |   56 +++++++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index ea33bdf..2b61cc4 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -993,6 +993,43 @@ out:
 	return ret;
 }
 
+static int ext3_journalled_get_block(struct inode *inode, sector_t iblock,
+				     struct buffer_head *bh, int create)
+{
+	handle_t *handle = ext3_journal_current_handle();
+	int ret;
+
+	/* This function should ever be used only for real buffers */
+	BUG_ON(!bh->b_page);
+
+	ret = ext3_get_blocks_handle(handle, inode, iblock, 1, bh, create);
+	if (ret > 0) {
+		if (buffer_new(bh)) {
+			struct page *page = bh->b_page;
+
+			/*
+			 * This is a terrible hack to avoid block_prepare_write
+			 * marking our buffer as dirty
+			 */
+			if (PageUptodate(page)) {
+				ret = ext3_journal_get_write_access(handle, bh);
+				if (ret < 0)
+					goto out;
+				unmap_underlying_metadata(bh->b_bdev,
+							  bh->b_blocknr);
+				clear_buffer_new(bh);
+				set_buffer_uptodate(bh);
+				ret = ext3_journal_dirty_metadata(handle, bh);
+				if (ret < 0)
+					goto out;
+			}
+		}
+		ret = 0;
+	}
+out:
+	return ret;
+}
+
 int ext3_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		u64 start, u64 len)
 {
@@ -1196,15 +1233,18 @@ retry:
 		ret = PTR_ERR(handle);
 		goto out;
 	}
-	ret = block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
-							ext3_get_block);
-	if (ret)
-		goto write_begin_failed;
-
 	if (ext3_should_journal_data(inode)) {
-		ret = walk_page_buffers(handle, page_buffers(page),
-				from, to, NULL, do_journal_get_write_access);
+		ret = block_write_begin(file, mapping, pos, len, flags, pagep,
+					fsdata, ext3_journalled_get_block);
+		if (ret)
+			goto write_begin_failed;
+		ret = walk_page_buffers(handle, page_buffers(page), from, to,
+					NULL, do_journal_get_write_access);
+	} else {
+		ret = block_write_begin(file, mapping, pos, len, flags, pagep,
+					fsdata, ext3_get_block);
 	}
+
 write_begin_failed:
 	if (ret) {
 		/*
@@ -1668,7 +1708,7 @@ static int ext3_journalled_writepage(struct page *page,
 		 */
 		ClearPageChecked(page);
 		ret = block_prepare_write(page, 0, PAGE_CACHE_SIZE,
-					ext3_get_block);
+					ext3_journalled_get_block);
 		if (ret != 0) {
 			ext3_journal_stop(handle);
 			goto out_unlock;
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] ext4: Fix buffer dirtying in data=journal mode
  2010-06-21 10:42 [PATCH 0/2] Fix buffer dirtying in data=journal mode Jan Kara
  2010-06-21 10:42 ` [PATCH 1/2] ext3: " Jan Kara
@ 2010-06-21 10:42 ` Jan Kara
  2010-07-21 14:32   ` Jan Kara
  2010-07-12 17:38 ` [PATCH 0/2] " Jan Kara
  2 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2010-06-21 10:42 UTC (permalink / raw)
  To: linux-ext4; +Cc: Jan Kara, tytso

block_prepare_write() can dirty freshly created buffer. This is a problem
for data=journal mode because data buffers shouldn't be dirty unless they
are undergoing checkpoint. So we have to tweak get_block function for
data=journal mode to catch the case when block_prepare_write would dirty
the buffer, do the work instead of block_prepare_write, and properly handle
dirty buffer as data=journal mode requires it.

It might be cleaner to avoid using block_prepare_write() for data=journal
mode writes but that would require us to duplicate most of the function
which isn't nice either...

CC: tytso@mit.edu
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c |   47 +++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 81d6054..526404b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1388,6 +1388,43 @@ out:
 	return ret;
 }
 
+static int ext4_journalled_get_block(struct inode *inode, sector_t iblock,
+				     struct buffer_head *bh, int create)
+{
+	handle_t *handle = ext4_journal_current_handle();
+	int ret;
+
+	/* This function should ever be used only for real buffers */
+	BUG_ON(!bh->b_page);
+
+	ret = ext4_get_blocks(handle, inode, iblock, 1, bh,
+			      create ? EXT4_GET_BLOCKS_CREATE : 0);
+	if (ret > 0) {
+		if (buffer_new(bh)) {
+			struct page *page = bh->b_page;
+
+			/*
+			 * This is a terrible hack to avoid block_prepare_write
+			 * marking our buffer as dirty
+			 */
+			if (PageUptodate(page)) {
+				ret = ext4_journal_get_write_access(handle, bh);
+				if (ret < 0)
+					return ret;
+				unmap_underlying_metadata(bh->b_bdev,
+							  bh->b_blocknr);
+				clear_buffer_new(bh);
+				set_buffer_uptodate(bh);
+				ret = jbd2_journal_dirty_metadata(handle, bh);
+				if (ret < 0)
+					return ret;
+			}
+		}
+		ret = 0;
+	}
+	return ret;
+}
+
 /*
  * `handle' can be NULL if create is zero
  */
@@ -1599,12 +1636,14 @@ retry:
 	if (ext4_should_dioread_nolock(inode))
 		ret = block_write_begin(file, mapping, pos, len, flags, pagep,
 				fsdata, ext4_get_block_write);
-	else
+	else if (!ext4_should_journal_data(inode))
 		ret = block_write_begin(file, mapping, pos, len, flags, pagep,
 				fsdata, ext4_get_block);

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/2] Fix buffer dirtying in data=journal mode
  2010-06-21 10:42 [PATCH 0/2] Fix buffer dirtying in data=journal mode Jan Kara
  2010-06-21 10:42 ` [PATCH 1/2] ext3: " Jan Kara
  2010-06-21 10:42 ` [PATCH 2/2] ext4: " Jan Kara
@ 2010-07-12 17:38 ` Jan Kara
  2 siblings, 0 replies; 7+ messages in thread
From: Jan Kara @ 2010-07-12 17:38 UTC (permalink / raw)
  To: linux-ext4

  Hi,

>   two patches below fix warnings and possible BUGs in ext3/ext4 when running
> in data=journal mode (see https://bugzilla.kernel.org/show_bug.cgi?id=14156).
> Admittedly, the workaround I've chosen is kind of ugly but reimplementing
> block_prepare_write() just for the data=journal mode seems even uglier to me.
> If anybody has better solution, I'd gladly hear it.
  Ted, could you have a look at the ext4 patch and merge it if you're fine
with it? Thanks.

								Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] ext3: Fix buffer dirtying in data=journal mode
  2010-06-21 10:42 ` [PATCH 1/2] ext3: " Jan Kara
@ 2010-07-19 18:02   ` Josef Bacik
  2010-07-19 18:41     ` Josef Bacik
  0 siblings, 1 reply; 7+ messages in thread
From: Josef Bacik @ 2010-07-19 18:02 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4

On Mon, Jun 21, 2010 at 12:42:52PM +0200, Jan Kara wrote:
> block_prepare_write() can dirty freshly created buffer. This is a problem
> for data=journal mode because data buffers shouldn't be dirty unless they
> are undergoing checkpoint. So we have to tweak get_block function for
> data=journal mode to catch the case when block_prepare_write would dirty
> the buffer, do the work instead of block_prepare_write, and properly handle
> dirty buffer as data=journal mode requires it.
> 
> It might be cleaner to avoid using block_prepare_write() for data=journal
> mode writes but that would require us to duplicate most of the function
> which isn't nice either...
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext3/inode.c |   56 +++++++++++++++++++++++++++++++++++++++++++++++-------
>  1 files changed, 48 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
> index ea33bdf..2b61cc4 100644
> --- a/fs/ext3/inode.c
> +++ b/fs/ext3/inode.c
> @@ -993,6 +993,43 @@ out:
>  	return ret;
>  }
>  
> +static int ext3_journalled_get_block(struct inode *inode, sector_t iblock,
> +				     struct buffer_head *bh, int create)
> +{
> +	handle_t *handle = ext3_journal_current_handle();
> +	int ret;
> +
> +	/* This function should ever be used only for real buffers */
> +	BUG_ON(!bh->b_page);
> +
> +	ret = ext3_get_blocks_handle(handle, inode, iblock, 1, bh, create);
> +	if (ret > 0) {
> +		if (buffer_new(bh)) {
> +			struct page *page = bh->b_page;
> +
> +			/*
> +			 * This is a terrible hack to avoid block_prepare_write
> +			 * marking our buffer as dirty
> +			 */
> +			if (PageUptodate(page)) {
> +				ret = ext3_journal_get_write_access(handle, bh);
> +				if (ret < 0)
> +					goto out;
> +				unmap_underlying_metadata(bh->b_bdev,
> +							  bh->b_blocknr);
> +				clear_buffer_new(bh);
> +				set_buffer_uptodate(bh);
> +				ret = ext3_journal_dirty_metadata(handle, bh);
> +				if (ret < 0)
> +					goto out;
> +			}
> +		}

Hey Jan,

It looks like in __block_prepare_write we zero out the end of the page if we're
not writing to the entire block, but you short-circuit this behavior with this
get_block.  So it's possible that if we only write to half of the block, the
last half is going to have whatever stale data was in there before, right?
Thanks,

Josef

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] ext3: Fix buffer dirtying in data=journal mode
  2010-07-19 18:02   ` Josef Bacik
@ 2010-07-19 18:41     ` Josef Bacik
  0 siblings, 0 replies; 7+ messages in thread
From: Josef Bacik @ 2010-07-19 18:41 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Jan Kara, linux-ext4

On Mon, Jul 19, 2010 at 02:02:22PM -0400, Josef Bacik wrote:
> On Mon, Jun 21, 2010 at 12:42:52PM +0200, Jan Kara wrote:
> > block_prepare_write() can dirty freshly created buffer. This is a problem
> > for data=journal mode because data buffers shouldn't be dirty unless they
> > are undergoing checkpoint. So we have to tweak get_block function for
> > data=journal mode to catch the case when block_prepare_write would dirty
> > the buffer, do the work instead of block_prepare_write, and properly handle
> > dirty buffer as data=journal mode requires it.
> > 
> > It might be cleaner to avoid using block_prepare_write() for data=journal
> > mode writes but that would require us to duplicate most of the function
> > which isn't nice either...
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/ext3/inode.c |   56 +++++++++++++++++++++++++++++++++++++++++++++++-------
> >  1 files changed, 48 insertions(+), 8 deletions(-)
> > 
> > diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
> > index ea33bdf..2b61cc4 100644
> > --- a/fs/ext3/inode.c
> > +++ b/fs/ext3/inode.c
> > @@ -993,6 +993,43 @@ out:
> >  	return ret;
> >  }
> >  
> > +static int ext3_journalled_get_block(struct inode *inode, sector_t iblock,
> > +				     struct buffer_head *bh, int create)
> > +{
> > +	handle_t *handle = ext3_journal_current_handle();
> > +	int ret;
> > +
> > +	/* This function should ever be used only for real buffers */
> > +	BUG_ON(!bh->b_page);
> > +
> > +	ret = ext3_get_blocks_handle(handle, inode, iblock, 1, bh, create);
> > +	if (ret > 0) {
> > +		if (buffer_new(bh)) {
> > +			struct page *page = bh->b_page;
> > +
> > +			/*
> > +			 * This is a terrible hack to avoid block_prepare_write
> > +			 * marking our buffer as dirty
> > +			 */
> > +			if (PageUptodate(page)) {
> > +				ret = ext3_journal_get_write_access(handle, bh);
> > +				if (ret < 0)
> > +					goto out;
> > +				unmap_underlying_metadata(bh->b_bdev,
> > +							  bh->b_blocknr);
> > +				clear_buffer_new(bh);
> > +				set_buffer_uptodate(bh);
> > +				ret = ext3_journal_dirty_metadata(handle, bh);
> > +				if (ret < 0)
> > +					goto out;
> > +			}
> > +		}
> 
> Hey Jan,
> 
> It looks like in __block_prepare_write we zero out the end of the page if we're
> not writing to the entire block, but you short-circuit this behavior with this
> get_block.  So it's possible that if we only write to half of the block, the
> last half is going to have whatever stale data was in there before, right?
> Thanks,
> 

Oops, ignore me, nothing changes for the !PageUptodate() case, which is where
the page zero'ing part happens.  Carry on :),

Josef

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] ext4: Fix buffer dirtying in data=journal mode
  2010-06-21 10:42 ` [PATCH 2/2] ext4: " Jan Kara
@ 2010-07-21 14:32   ` Jan Kara
  0 siblings, 0 replies; 7+ messages in thread
From: Jan Kara @ 2010-07-21 14:32 UTC (permalink / raw)
  To: tytso; +Cc: Jan Kara, linux-ext4

> block_prepare_write() can dirty freshly created buffer. This is a problem
> for data=journal mode because data buffers shouldn't be dirty unless they
> are undergoing checkpoint. So we have to tweak get_block function for
> data=journal mode to catch the case when block_prepare_write would dirty
> the buffer, do the work instead of block_prepare_write, and properly handle
> dirty buffer as data=journal mode requires it.
> 
> It might be cleaner to avoid using block_prepare_write() for data=journal
> mode writes but that would require us to duplicate most of the function
> which isn't nice either...
  Ted, how about this patch? Do you agree with it? Thanks.

								Honza

> CC: tytso@mit.edu
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext4/inode.c |   47 +++++++++++++++++++++++++++++++++++++++++++----
>  1 files changed, 43 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 81d6054..526404b 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1388,6 +1388,43 @@ out:
>  	return ret;
>  }
>  
> +static int ext4_journalled_get_block(struct inode *inode, sector_t iblock,
> +				     struct buffer_head *bh, int create)
> +{
> +	handle_t *handle = ext4_journal_current_handle();
> +	int ret;
> +
> +	/* This function should ever be used only for real buffers */
> +	BUG_ON(!bh->b_page);
> +
> +	ret = ext4_get_blocks(handle, inode, iblock, 1, bh,
> +			      create ? EXT4_GET_BLOCKS_CREATE : 0);
> +	if (ret > 0) {
> +		if (buffer_new(bh)) {
> +			struct page *page = bh->b_page;
> +
> +			/*
> +			 * This is a terrible hack to avoid block_prepare_write
> +			 * marking our buffer as dirty
> +			 */
> +			if (PageUptodate(page)) {
> +				ret = ext4_journal_get_write_access(handle, bh);
> +				if (ret < 0)
> +					return ret;
> +				unmap_underlying_metadata(bh->b_bdev,
> +							  bh->b_blocknr);
> +				clear_buffer_new(bh);
> +				set_buffer_uptodate(bh);
> +				ret = jbd2_journal_dirty_metadata(handle, bh);
> +				if (ret < 0)
> +					return ret;
> +			}
> +		}
> +		ret = 0;
> +	}
> +	return ret;
> +}
> +
>  /*
>   * `handle' can be NULL if create is zero
>   */
> @@ -1599,12 +1636,14 @@ retry:
>  	if (ext4_should_dioread_nolock(inode))
>  		ret = block_write_begin(file, mapping, pos, len, flags, pagep,
>  				fsdata, ext4_get_block_write);
> -	else
> +	else if (!ext4_should_journal_data(inode))
>  		ret = block_write_begin(file, mapping, pos, len, flags, pagep,
>  				fsdata, ext4_get_block);
> -
> -	if (!ret && ext4_should_journal_data(inode)) {
> -		ret = walk_page_buffers(handle, page_buffers(page),
> +	else {
> +		ret = block_write_begin(file, mapping, pos, len, flags, pagep,
> +				fsdata, ext4_journalled_get_block);
> +		if (!ret)
> +			ret = walk_page_buffers(handle, page_buffers(page),
>  				from, to, NULL, do_journal_get_write_access);
>  	}
>  
> -- 
> 1.6.4.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-07-21 14:32 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-06-21 10:42 [PATCH 0/2] Fix buffer dirtying in data=journal mode Jan Kara
2010-06-21 10:42 ` [PATCH 1/2] ext3: " Jan Kara
2010-07-19 18:02   ` Josef Bacik
2010-07-19 18:41     ` Josef Bacik
2010-06-21 10:42 ` [PATCH 2/2] ext4: " Jan Kara
2010-07-21 14:32   ` Jan Kara
2010-07-12 17:38 ` [PATCH 0/2] " Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).