linux-ext4.vger.kernel.org archive mirror
* [PATCH] ext4: fix reservation release on invalidatepage for delalloc fs
@ 2015-06-04  8:25 Lukas Czerner
  2015-06-04  9:00 ` Jan Kara
  2015-06-08 15:16 ` Theodore Ts'o
  0 siblings, 2 replies; 6+ messages in thread
From: Lukas Czerner @ 2015-06-04  8:25 UTC (permalink / raw)
  To: linux-ext4; +Cc: Lukas Czerner

On a delalloc-enabled file system, on the invalidatepage operation in
ext4_da_page_release_reservation(), we want to clear the delayed
buffer and remove the extent covering the delayed buffer from the extent
status tree.

However, there is currently a bug: on systems with page size >
block size we always remove extents from the start of the page,
regardless of where the actual delayed buffers are positioned in the
page. This leads to errors like this:

EXT4-fs warning (device loop0): ext4_da_release_space:1225:
ext4_da_release_space: ino 13, to_free 1 with only 0 reserved data
blocks

This can, however, cause data loss at writeback time if the file system
is in an ENOSPC condition, because we're releasing the reservation for
someone else's delayed buffer.

Fix this by only removing extents that correspond to the part of the
page we want to invalidate.
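
As a rough userspace illustration of the difference (not kernel code),
the sketch below compares the range arithmetic before and after the
patch for a page that holds a single run of delayed buffers. The
demo_* names and the 4k page / 1k block geometry are assumptions made
up for this example; a plain bool array stands in for buffer_delay().

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed geometry for illustration: 4k page, 1k blocks, 4 blocks per page. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_BLKBITS    10
#define DEMO_BLKS_PER_PAGE (1u << (DEMO_PAGE_SHIFT - DEMO_BLKBITS))

/* Hypothetical helper type: the range that would be removed. */
struct demo_range {
	unsigned long start;	/* first logical block to remove */
	unsigned int len;	/* number of blocks to remove */
};

/* Pre-patch arithmetic: count delayed buffers, always start at the page start. */
static struct demo_range demo_range_before(unsigned long page_index,
					   const bool *delayed)
{
	struct demo_range r;

	r.start = page_index << (DEMO_PAGE_SHIFT - DEMO_BLKBITS);
	r.len = 0;
	for (unsigned int i = 0; i < DEMO_BLKS_PER_PAGE; i++)
		if (delayed[i])
			r.len++;
	return r;
}

/* Patched arithmetic for the first run: start where the run really begins. */
static struct demo_range demo_range_after(unsigned long page_index,
					  const bool *delayed)
{
	struct demo_range r;
	unsigned int curr_off = 0;	/* byte offset within the page */

	r.len = 0;
	for (unsigned int i = 0; i < DEMO_BLKS_PER_PAGE; i++) {
		if (delayed[i])
			r.len++;
		else if (r.len)
			break;		/* run ended at curr_off */
		curr_off += 1u << DEMO_BLKBITS;
	}
	assert(r.len <= DEMO_BLKS_PER_PAGE);
	r.start = page_index << (DEMO_PAGE_SHIFT - DEMO_BLKBITS);
	r.start += (curr_off >> DEMO_BLKBITS) - r.len;
	return r;
}
```

With only the last block of page index 2 delayed, demo_range_before()
yields block 8 (the first block of the page), while demo_range_after()
yields block 11, which is where the delayed buffer actually sits.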

This problem is reproducible with the following fio recipe (however, I
was only able to reproduce it with fio-2.1 or older):

[global]
bs=8k
iodepth=1024
iodepth_batch=60
randrepeat=1
size=1m
directory=/mnt/test
numjobs=20
[job1]
ioengine=sync
bs=1k
direct=1
rw=randread
filename=file1:file2
[job2]
ioengine=libaio
rw=randwrite
direct=1
filename=file1:file2
[job3]
bs=1k
ioengine=posixaio
rw=randwrite
direct=1
filename=file1:file2
[job5]
bs=1k
ioengine=sync
rw=randread
filename=file1:file2
[job7]
ioengine=libaio
rw=randwrite
filename=file1:file2
[job8]
ioengine=posixaio
rw=randwrite
filename=file1:file2
[job10]
ioengine=mmap
rw=randwrite
bs=1k
filename=file1:file2
[job11]
ioengine=mmap
rw=randwrite
direct=1
filename=file1:file2

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
 fs/ext4/inode.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 0554b0b..46f4a49 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1342,7 +1342,7 @@ static void ext4_da_page_release_reservation(struct page *page,
 					     unsigned int offset,
 					     unsigned int length)
 {
-	int to_release = 0;
+	int to_release = 0, contiguous_blks = 0;
 	struct buffer_head *head, *bh;
 	unsigned int curr_off = 0;
 	struct inode *inode = page->mapping->host;
@@ -1363,14 +1363,23 @@ static void ext4_da_page_release_reservation(struct page *page,
 
 		if ((offset <= curr_off) && (buffer_delay(bh))) {
 			to_release++;
+			contiguous_blks++;
 			clear_buffer_delay(bh);
+		} else if (contiguous_blks) {
+			lblk = page->index <<
+			       (PAGE_CACHE_SHIFT - inode->i_blkbits);
+			lblk += (curr_off >> inode->i_blkbits) -
+				contiguous_blks;
+			ext4_es_remove_extent(inode, lblk, contiguous_blks);
+			contiguous_blks = 0;
 		}
 		curr_off = next_off;
 	} while ((bh = bh->b_this_page) != head);
 
-	if (to_release) {
+	if (contiguous_blks) {
 		lblk = page->index << (PAGE_CACHE_SHIFT - inode->i_blkbits);
-		ext4_es_remove_extent(inode, lblk, to_release);
+		lblk += (curr_off >> inode->i_blkbits) - contiguous_blks;
+		ext4_es_remove_extent(inode, lblk, contiguous_blks);
 	}
 
 	/* If we have released all the blocks belonging to a cluster, then we
-- 
1.8.3.1



* Re: [PATCH] ext4: fix reservation release on invalidatepage for delalloc fs
  2015-06-04  8:25 [PATCH] ext4: fix reservation release on invalidatepage for delalloc fs Lukas Czerner
@ 2015-06-04  9:00 ` Jan Kara
  2015-06-08 15:16 ` Theodore Ts'o
  1 sibling, 0 replies; 6+ messages in thread
From: Jan Kara @ 2015-06-04  9:00 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4

On Thu 04-06-15 10:25:01, Lukas Czerner wrote:
> On delalloc enabled file system on invalidatepage operation
> in ext4_da_page_release_reservation() we want to clear the delayed
> buffer and remove the extent covering the delayed buffer from the extent
> status tree.
> 
> However currently there is a bug where on the systems with page size >
> block size we will always remove extents from the start of the page
> regardless where the actual delayed buffers are positioned in the page.
> This leads to the errors like this:
> 
> EXT4-fs warning (device loop0): ext4_da_release_space:1225:
> ext4_da_release_space: ino 13, to_free 1 with only 0 reserved data
> blocks
> 
> This however can cause data loss on writeback time if the file system is
> in ENOSPC condition because we're releasing reservation for someones
> else delayed buffer.
> 
> Fix this by only removing extents that corresponds to the part of the
> page we want to invalidate.
> 
> This problem is reproducible by the following fio receipt (however I was
> only able to reproduce it with fio-2.1 or older.
> 
> [global]
> bs=8k
> iodepth=1024
> iodepth_batch=60
> randrepeat=1
> size=1m
> directory=/mnt/test
> numjobs=20
> [job1]
> ioengine=sync
> bs=1k
> direct=1
> rw=randread
> filename=file1:file2
> [job2]
> ioengine=libaio
> rw=randwrite
> direct=1
> filename=file1:file2
> [job3]
> bs=1k
> ioengine=posixaio
> rw=randwrite
> direct=1
> filename=file1:file2
> [job5]
> bs=1k
> ioengine=sync
> rw=randread
> filename=file1:file2
> [job7]
> ioengine=libaio
> rw=randwrite
> filename=file1:file2
> [job8]
> ioengine=posixaio
> rw=randwrite
> filename=file1:file2
> [job10]
> ioengine=mmap
> rw=randwrite
> bs=1k
> filename=file1:file2
> [job11]
> ioengine=mmap
> rw=randwrite
> direct=1
> filename=file1:file2
> 
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>

Looks good. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/inode.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 0554b0b..46f4a49 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1342,7 +1342,7 @@ static void ext4_da_page_release_reservation(struct page *page,
>  					     unsigned int offset,
>  					     unsigned int length)
>  {
> -	int to_release = 0;
> +	int to_release = 0, contiguous_blks = 0;
>  	struct buffer_head *head, *bh;
>  	unsigned int curr_off = 0;
>  	struct inode *inode = page->mapping->host;
> @@ -1363,14 +1363,23 @@ static void ext4_da_page_release_reservation(struct page *page,
>  
>  		if ((offset <= curr_off) && (buffer_delay(bh))) {
>  			to_release++;
> +			contiguous_blks++;
>  			clear_buffer_delay(bh);
> +		} else if (contiguous_blks) {
> +			lblk = page->index <<
> +			       (PAGE_CACHE_SHIFT - inode->i_blkbits);
> +			lblk += (curr_off >> inode->i_blkbits) -
> +				contiguous_blks;
> +			ext4_es_remove_extent(inode, lblk, contiguous_blks);
> +			contiguous_blks = 0;
>  		}
>  		curr_off = next_off;
>  	} while ((bh = bh->b_this_page) != head);
>  
> -	if (to_release) {
> +	if (contiguous_blks) {
>  		lblk = page->index << (PAGE_CACHE_SHIFT - inode->i_blkbits);
> -		ext4_es_remove_extent(inode, lblk, to_release);
> +		lblk += (curr_off >> inode->i_blkbits) - contiguous_blks;
> +		ext4_es_remove_extent(inode, lblk, contiguous_blks);
>  	}
>  
>  	/* If we have released all the blocks belonging to a cluster, then we
> -- 
> 1.8.3.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PATCH] ext4: fix reservation release on invalidatepage for delalloc fs
  2015-06-04  8:25 [PATCH] ext4: fix reservation release on invalidatepage for delalloc fs Lukas Czerner
  2015-06-04  9:00 ` Jan Kara
@ 2015-06-08 15:16 ` Theodore Ts'o
  2015-06-08 15:45   ` Lukáš Czerner
  1 sibling, 1 reply; 6+ messages in thread
From: Theodore Ts'o @ 2015-06-08 15:16 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4

On Thu, Jun 04, 2015 at 10:25:01AM +0200, Lukas Czerner wrote:
> On delalloc enabled file system on invalidatepage operation
> in ext4_da_page_release_reservation() we want to clear the delayed
> buffer and remove the extent covering the delayed buffer from the extent
> status tree.
> 
> However currently there is a bug where on the systems with page size >
> block size we will always remove extents from the start of the page
> regardless where the actual delayed buffers are positioned in the page.

Right, because we end up screwing up the accounting.

> @@ -1363,14 +1363,23 @@ static void ext4_da_page_release_reservation(struct page *page,
>  
>  		if ((offset <= curr_off) && (buffer_delay(bh))) {
>  			to_release++;
> +			contiguous_blks++;
>  			clear_buffer_delay(bh);
> +		} else if (contiguous_blks) {
> +			lblk = page->index <<
> +			       (PAGE_CACHE_SHIFT - inode->i_blkbits);
> +			lblk += (curr_off >> inode->i_blkbits) -
> +				contiguous_blks;
> +			ext4_es_remove_extent(inode, lblk, contiguous_blks);
> +			contiguous_blks = 0;
>  		}
>  		curr_off = next_off;
>  	} while ((bh = bh->b_this_page) != head);

Shouldn't we call ext4_es_remove_extent() on the portion of the page
containing the delayed allocation region, before we clear
contiguous_blks and reset lblk?

For example, suppose we had a 4k page with a 1k block size, where
the first, second, and fourth blocks are delayed allocated.  With this
patch we will end up only clearing the extent status tree for the
fourth block, but not the first and second.

       	      	      	  	    - Ted


* Re: [PATCH] ext4: fix reservation release on invalidatepage for delalloc fs
  2015-06-08 15:16 ` Theodore Ts'o
@ 2015-06-08 15:45   ` Lukáš Czerner
  2015-06-19  7:53     ` Lukáš Czerner
  0 siblings, 1 reply; 6+ messages in thread
From: Lukáš Czerner @ 2015-06-08 15:45 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4

On Mon, 8 Jun 2015, Theodore Ts'o wrote:

> Date: Mon, 8 Jun 2015 11:16:51 -0400
> From: Theodore Ts'o <tytso@mit.edu>
> To: Lukas Czerner <lczerner@redhat.com>
> Cc: linux-ext4@vger.kernel.org
> Subject: Re: [PATCH] ext4: fix reservation release on invalidatepage for
>     delalloc fs
> 
> On Thu, Jun 04, 2015 at 10:25:01AM +0200, Lukas Czerner wrote:
> > On delalloc enabled file system on invalidatepage operation
> > in ext4_da_page_release_reservation() we want to clear the delayed
> > buffer and remove the extent covering the delayed buffer from the extent
> > status tree.
> > 
> > However currently there is a bug where on the systems with page size >
> > block size we will always remove extents from the start of the page
> > regardless where the actual delayed buffers are positioned in the page.
> 
> Right, because we end up screwing up the accounting.
> 
> > @@ -1363,14 +1363,23 @@ static void ext4_da_page_release_reservation(struct page *page,
> >  
> >  		if ((offset <= curr_off) && (buffer_delay(bh))) {
> >  			to_release++;
> > +			contiguous_blks++;
> >  			clear_buffer_delay(bh);
> > +		} else if (contiguous_blks) {
> > +			lblk = page->index <<
> > +			       (PAGE_CACHE_SHIFT - inode->i_blkbits);
> > +			lblk += (curr_off >> inode->i_blkbits) -
> > +				contiguous_blks;
> > +			ext4_es_remove_extent(inode, lblk, contiguous_blks);
> > +			contiguous_blks = 0;
> >  		}
> >  		curr_off = next_off;
> >  	} while ((bh = bh->b_this_page) != head);
> 
> Shouldn't we call ext4_es_remove_extent() on the portion of the page
> containing the delayed allocation region, before we clear
> contiguous_blks and resetting lblk?

Hi Ted,

Right, this is the point of the patch.

> 
> For example, suppose we had the 4k page with a 1k block size, where
> the first, second, and fourth blocks are delayed allocated.  With this
> patch we will end up only clearing the extent status tree for the
> fourth block, but not the first and second.

So when the first and second block are delayed, then the

if ((offset <= curr_off) && (buffer_delay(bh)))

will hit twice which means that we'll have contiguous_blks = 2

Now on the third block this condition will no longer be true
(because buffer_delay(bh) will be false) and so we will hit

else if (contiguous_blks) {

then lblk will be: start of the page + ((curr_off >> i_blkbits) -
contiguous_blks). curr_off at this point will point at the third block
(index 2) and contiguous_blks is 2, which means that lblk will point at
the start of the page. That is exactly right, because the first delayed
block is at the start of the page.

So ext4_es_remove_extent() will remove an extent of two blocks starting
from the start of the page, which means it removes the first and second
delayed blocks.

Now when we check the fourth block, the

if ((offset <= curr_off) && (buffer_delay(bh)))

will hit again, leaving contiguous_blks at 1. Then we leave the
while loop and hit this:

if (contiguous_blks)

which removes an extent of one block starting at the fourth block in
the page.
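
The walk-through above can be checked with a small userspace
simulation. This is only a sketch under stated assumptions: the sim_*
names are hypothetical, sim_remove_extent() merely records the range
that ext4_es_remove_extent() would be asked to remove, a bool array
replaces buffer_delay(), and any checks in the real function that are
not visible in the diff are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_MAX_CALLS 8

/* One recorded request: remove len blocks starting at lblk. */
struct sim_call {
	unsigned long lblk;
	unsigned int len;
};

struct sim_log {
	struct sim_call calls[SIM_MAX_CALLS];
	unsigned int ncalls;
};

/* Stand-in for ext4_es_remove_extent(): only records the requested range. */
static void sim_remove_extent(struct sim_log *log, unsigned long lblk,
			      unsigned int len)
{
	assert(log->ncalls < SIM_MAX_CALLS);
	log->calls[log->ncalls].lblk = lblk;
	log->calls[log->ncalls].len = len;
	log->ncalls++;
}

/* Walks the buffers of one page like the patched loop; returns to_release. */
static int sim_release(struct sim_log *log, unsigned long page_index,
		       unsigned int page_shift, unsigned int blkbits,
		       unsigned int offset, const bool *delayed)
{
	unsigned int nblks = 1u << (page_shift - blkbits);
	unsigned int curr_off = 0, contiguous_blks = 0;
	unsigned long lblk;
	int to_release = 0;

	for (unsigned int i = 0; i < nblks; i++) {
		unsigned int next_off = curr_off + (1u << blkbits);

		if (offset <= curr_off && delayed[i]) {
			to_release++;
			contiguous_blks++;
		} else if (contiguous_blks) {
			/* A run just ended at curr_off: remove exactly that run. */
			lblk = page_index << (page_shift - blkbits);
			lblk += (curr_off >> blkbits) - contiguous_blks;
			sim_remove_extent(log, lblk, contiguous_blks);
			contiguous_blks = 0;
		}
		curr_off = next_off;
	}

	if (contiguous_blks) {
		/* A run reached the end of the page; curr_off is the page size. */
		lblk = page_index << (page_shift - blkbits);
		lblk += (curr_off >> blkbits) - contiguous_blks;
		sim_remove_extent(log, lblk, contiguous_blks);
	}
	return to_release;
}
```

For a 4k page with 1k blocks where the first, second, and fourth
blocks are delayed, the simulation records two requests: two blocks
starting at block 0, then one block starting at block 3.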

That's how I wrote the code, but maybe I am missing something? I am
a bit tired today already, so my explanation is not very good, sorry.

Can you put your question in the form of a patch?

Thanks!
-Lukas

> 
>        	      	      	  	    - Ted
> 


* Re: [PATCH] ext4: fix reservation release on invalidatepage for delalloc fs
  2015-06-08 15:45   ` Lukáš Czerner
@ 2015-06-19  7:53     ` Lukáš Czerner
  2015-07-04  1:17       ` Theodore Ts'o
  0 siblings, 1 reply; 6+ messages in thread
From: Lukáš Czerner @ 2015-06-19  7:53 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4


On Mon, 8 Jun 2015, Lukáš Czerner wrote:

> Date: Mon, 8 Jun 2015 17:45:08 +0200 (CEST)
> From: Lukáš Czerner <lczerner@redhat.com>
> To: Theodore Ts'o <tytso@mit.edu>
> Cc: linux-ext4@vger.kernel.org
> Subject: Re: [PATCH] ext4: fix reservation release on invalidatepage for
>     delalloc fs
> 
> On Mon, 8 Jun 2015, Theodore Ts'o wrote:
> 
> > Date: Mon, 8 Jun 2015 11:16:51 -0400
> > From: Theodore Ts'o <tytso@mit.edu>
> > To: Lukas Czerner <lczerner@redhat.com>
> > Cc: linux-ext4@vger.kernel.org
> > Subject: Re: [PATCH] ext4: fix reservation release on invalidatepage for
> >     delalloc fs
> > 
> > On Thu, Jun 04, 2015 at 10:25:01AM +0200, Lukas Czerner wrote:
> > > On delalloc enabled file system on invalidatepage operation
> > > in ext4_da_page_release_reservation() we want to clear the delayed
> > > buffer and remove the extent covering the delayed buffer from the extent
> > > status tree.
> > > 
> > > However currently there is a bug where on the systems with page size >
> > > block size we will always remove extents from the start of the page
> > > regardless where the actual delayed buffers are positioned in the page.
> > 
> > Right, because we end up screwing up the accounting.
> > 
> > > @@ -1363,14 +1363,23 @@ static void ext4_da_page_release_reservation(struct page *page,
> > >  
> > >  		if ((offset <= curr_off) && (buffer_delay(bh))) {
> > >  			to_release++;
> > > +			contiguous_blks++;
> > >  			clear_buffer_delay(bh);
> > > +		} else if (contiguous_blks) {
> > > +			lblk = page->index <<
> > > +			       (PAGE_CACHE_SHIFT - inode->i_blkbits);
> > > +			lblk += (curr_off >> inode->i_blkbits) -
> > > +				contiguous_blks;
> > > +			ext4_es_remove_extent(inode, lblk, contiguous_blks);
> > > +			contiguous_blks = 0;
> > >  		}
> > >  		curr_off = next_off;
> > >  	} while ((bh = bh->b_this_page) != head);
> > 
> > Shouldn't we call ext4_es_remove_extent() on the portion of the page
> > containing the delayed allocation region, before we clear
> > contiguous_blks and resetting lblk?
> 
> Hi Ted,
> 
> right this is the point of the patch.
> 
> > 
> > For example, suppose we had the 4k page with a 1k block size, where
> > the first, second, and fourth blocks are delayed allocated.  With this
> > patch we will end up only clearing the extent status tree for the
> > fourth block, but not the first and second.
> 
> So when the first and second block are delayed, then the
> 
> if ((offset <= curr_off) && (buffer_delay(bh)))
> 
> will hit twice which means that we'll have contiguous_blks = 2
> 
> Now on the third block this condition will no longer be true
> (because buffer_delay(bh) will be false) and so we will hit
> 
> else if (contiguous_blks) {
> 
> then lblk will be: start of the page + (curr_off - contiguous_blks).
> curr_off at this point will point at third block (index 2) and
> contiguous_blks is 2. Which means that lblk will point at the start
> of the page - which is exactly right because the first delayed block
> is at the start of the page.
> 
> So ext4_es_remove_extent() will remove extent of two blocks starting
> from the end of the page - which means it removes first and second
> delayed block.
> 
> Now when we check fourth block the
> 
> if ((offset <= curr_off) && (buffer_delay(bh)))
> 
> will hit again, leaving contiguous_blks with 1, then we leave the
> while cycle and hit this:
> 
> if (contiguous_blks)
> 
> removing the extent starting at fourth block in the page removing
> one block (the fourth block in the page).
> 
> That's how I wrote the code, but maybe I am missing something ? I am
> a bit tired today already so my explanation is not very good, sorry.
> 
> Can you put your question in a form of a patch ?
> 
> Thanks!
> -Lukas

Hi Ted,

Can you take a second look at the patch?

Thanks!
-Lukas

> 
> > 
> >        	      	      	  	    - Ted
> > 
> 


* Re: [PATCH] ext4: fix reservation release on invalidatepage for delalloc fs
  2015-06-19  7:53     ` Lukáš Czerner
@ 2015-07-04  1:17       ` Theodore Ts'o
  0 siblings, 0 replies; 6+ messages in thread
From: Theodore Ts'o @ 2015-07-04  1:17 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: linux-ext4

Sorry for not getting back to this patch earlier.  When I took another
look at it, I realized I misunderstood what was going on.  The patch
looks correct, and I've added it to the ext4.git tree as part of bug
fixes that I'll send out to Linus for 4.2.

Thanks,

						- Ted



Thread overview: 6+ messages
2015-06-04  8:25 [PATCH] ext4: fix reservation release on invalidatepage for delalloc fs Lukas Czerner
2015-06-04  9:00 ` Jan Kara
2015-06-08 15:16 ` Theodore Ts'o
2015-06-08 15:45   ` Lukáš Czerner
2015-06-19  7:53     ` Lukáš Czerner
2015-07-04  1:17       ` Theodore Ts'o
