* [PATCH] Btrfs: fix performance regression of writing to prealloc/nocow file
@ 2016-03-17 22:16 Liu Bo
  2016-03-18 11:59 ` Holger Hoffstätte
  2016-03-18 12:44 ` Roman Mamedov
  0 siblings, 2 replies; 4+ messages in thread
From: Liu Bo @ 2016-03-17 22:16 UTC (permalink / raw)
  To: linux-btrfs
For nocow/prealloc files, we try our best to not allocate space, however,
this ends up a huge performance regression since it's expensive to check
if data is shared.
Let's go back to only go check shared data once we're not able to allocate
space.
The test was made against a tmpfs backed loop device:
xfs_io -f -c "falloc 0 400M" foobar
xfs_io -c "pwrite -W -b 4096 0 400M" foobar
Vanilla:
wrote 419430400/419430400 bytes at offset 0
400 MiB, 102400 ops; 0:00:01.00 (200.015 MiB/sec and 51203.8403 ops/sec)
Patched kernel:
wrote 419430400/419430400 bytes at offset 0
400 MiB, 102400 ops; 0:00:01.00 (346.137 MiB/sec and 88610.9796 ops/sec)
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/file.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 098bb8f..f66c1bc 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1525,16 +1525,12 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 
 		reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
 
-		if (BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
-					     BTRFS_INODE_PREALLOC)) {
+		ret = btrfs_check_data_free_space(inode, pos, write_bytes);
+		if (ret == -ENOSPC &&
+		    BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
+						     BTRFS_INODE_PREALLOC)) {
 			ret = check_can_nocow(inode, pos, &write_bytes);
-			if (ret < 0)
-				break;
 			if (ret > 0) {
-				/*
-				 * For nodata cow case, no need to reserve
-				 * data space.
-				 */
 				only_release_metadata = true;
 				/*
 				 * our prealloc extent may be smaller than
@@ -1543,10 +1539,13 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 				num_pages = DIV_ROUND_UP(write_bytes + offset,
 							 PAGE_CACHE_SIZE);
 				reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
+
+				ret = 0;
 				goto reserve_metadata;
+			} else {
+				ret = -ENOSPC;
 			}
 		}
-		ret = btrfs_check_data_free_space(inode, pos, write_bytes);
 		if (ret < 0)
 			break;
 
-- 
2.5.0
^ permalink raw reply related	[flat|nested] 4+ messages in thread
* Re: [PATCH] Btrfs: fix performance regression of writing to prealloc/nocow file
  2016-03-17 22:16 [PATCH] Btrfs: fix performance regression of writing to prealloc/nocow file Liu Bo
@ 2016-03-18 11:59 ` Holger Hoffstätte
  2016-03-18 12:44 ` Roman Mamedov
  1 sibling, 0 replies; 4+ messages in thread
From: Holger Hoffstätte @ 2016-03-18 11:59 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs
(sorry for any duplicates, vger.org hates gmail)
On Thu, Mar 17, 2016 at 11:16 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
> For nocow/prealloc files, we try our best to not allocate space, however,
> this ends up a huge performance regression since it's expensive to check
> if data is shared.
>
> Let's go back to only go check shared data once we're not able to allocate
> space.
>
> The test was made against a tmpfs backed loop device:
>
> xfs_io -f -c "falloc 0 400M" foobar
> xfs_io -c "pwrite -W -b 4096 0 400M" foobar
>
> Vanilla:
> wrote 419430400/419430400 bytes at offset 0
> 400 MiB, 102400 ops; 0:00:01.00 (200.015 MiB/sec and 51203.8403 ops/sec)
>
> Patched kernel:
> wrote 419430400/419430400 bytes at offset 0
> 400 MiB, 102400 ops; 0:00:01.00 (346.137 MiB/sec and 88610.9796 ops/sec)
(snip)
Doesn't this effectively do the same as
https://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/fs/btrfs?h=integration-4.6&id=4da2e26a2a32b174878744bd0f07db180c875f26
but in a slightly different way?
With a 4.4++ kernel (btrfs ~4.6) and the above patch I don't see this bad
performance (via loop on tmpfs):
$xfs_io -c "pwrite -W -b 4096 0 400M" foobar
wrote 419430400/419430400 bytes at offset 0
400 MiB, 102400 ops; 0.0000 sec (831.038 MiB/sec and 212745.6235 ops/sec)
Rewriting is even faster:
$xfs_io -c "pwrite -W -b 4096 0 400M" foobar
wrote 419430400/419430400 bytes at offset 0
400 MiB, 102400 ops; 0.0000 sec (1.124 GiB/sec and 294713.5752 ops/sec)
Looks decent enough considering the fsync.
-h
^ permalink raw reply	[flat|nested] 4+ messages in thread
* Re: [PATCH] Btrfs: fix performance regression of writing to prealloc/nocow file
  2016-03-17 22:16 [PATCH] Btrfs: fix performance regression of writing to prealloc/nocow file Liu Bo
  2016-03-18 11:59 ` Holger Hoffstätte
@ 2016-03-18 12:44 ` Roman Mamedov
  2016-03-21 19:10   ` Liu Bo
  1 sibling, 1 reply; 4+ messages in thread
From: Roman Mamedov @ 2016-03-18 12:44 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs
[-- Attachment #1: Type: text/plain, Size: 2922 bytes --]
On Thu, 17 Mar 2016 15:16:15 -0700
Liu Bo <bo.li.liu@oracle.com> wrote:
> For nocow/prealloc files, we try our best to not allocate space, however,
> this ends up a huge performance regression since it's expensive to check
> if data is shared.
> 
> Let's go back to only go check shared data once we're not able to allocate
> space.
As mentioned by Holger there's another patch modifying the same function:
https://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/fs/btrfs?h=integration-4.6&id=4da2e26a2a32b174878744bd0f07db180c875f26
and that one fixes a serious regression in the 4.4 series.
I did not try yours, but could you please ensure that you do not hit the
described problem with your version of the patch, or perhaps even better,
consider re-testing performance and then basing any further performance
optimizations upon a state with no known grave functionality bugs (i.e. with
the above patch applied).
Thanks
> The test was made against a tmpfs backed loop device:
> 
> xfs_io -f -c "falloc 0 400M" foobar
> xfs_io -c "pwrite -W -b 4096 0 400M" foobar
> 
> Vanilla:
> wrote 419430400/419430400 bytes at offset 0
> 400 MiB, 102400 ops; 0:00:01.00 (200.015 MiB/sec and 51203.8403 ops/sec)
> 
> Patched kernel:
> wrote 419430400/419430400 bytes at offset 0
> 400 MiB, 102400 ops; 0:00:01.00 (346.137 MiB/sec and 88610.9796 ops/sec)
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  fs/btrfs/file.c | 17 ++++++++---------
>  1 file changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 098bb8f..f66c1bc 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1525,16 +1525,12 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>  
>  		reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
>  
> -		if (BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
> -					     BTRFS_INODE_PREALLOC)) {
> +		ret = btrfs_check_data_free_space(inode, pos, write_bytes);
> +		if (ret == -ENOSPC &&
> +		    BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
> +						     BTRFS_INODE_PREALLOC)) {
>  			ret = check_can_nocow(inode, pos, &write_bytes);
> -			if (ret < 0)
> -				break;
>  			if (ret > 0) {
> -				/*
> -				 * For nodata cow case, no need to reserve
> -				 * data space.
> -				 */
>  				only_release_metadata = true;
>  				/*
>  				 * our prealloc extent may be smaller than
> @@ -1543,10 +1539,13 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>  				num_pages = DIV_ROUND_UP(write_bytes + offset,
>  							 PAGE_CACHE_SIZE);
>  				reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
> +
> +				ret = 0;
>  				goto reserve_metadata;
> +			} else {
> +				ret = -ENOSPC;
>  			}
>  		}
> -		ret = btrfs_check_data_free_space(inode, pos, write_bytes);
>  		if (ret < 0)
>  			break;
>  
-- 
With respect,
Roman
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply	[flat|nested] 4+ messages in thread
* Re: [PATCH] Btrfs: fix performance regression of writing to prealloc/nocow file
  2016-03-18 12:44 ` Roman Mamedov
@ 2016-03-21 19:10   ` Liu Bo
  0 siblings, 0 replies; 4+ messages in thread
From: Liu Bo @ 2016-03-21 19:10 UTC (permalink / raw)
  To: Roman Mamedov, Holger Hoffstätte; +Cc: linux-btrfs
Hi,
On Fri, Mar 18, 2016 at 05:44:03PM +0500, Roman Mamedov wrote:
> On Thu, 17 Mar 2016 15:16:15 -0700
> Liu Bo <bo.li.liu@oracle.com> wrote:
> 
> > For nocow/prealloc files, we try our best to not allocate space, however,
> > this ends up a huge performance regression since it's expensive to check
> > if data is shared.
> > 
> > Let's go back to only go check shared data once we're not able to allocate
> > space.
> 
> As mentioned by Holger there's another patch modifying the same function:
> https://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/fs/btrfs?h=integration-4.6&id=4da2e26a2a32b174878744bd0f07db180c875f26
> and that one fixes a serious regression in the 4.4 series.
> I did not try yours, but could you please ensure that you do not hit the
> described problem with your version of the patch, or perhaps even better,
> consider re-testing performance and then basing any further performance
> optimizations upon a state with no known grave functionality bugs (i.e. with
> the above patch applied).
I've verified with the above test that my patch doesn't introduce a
regression although there're some patch conflicts to deal with.
In fact they're fixing two different problems.
Zhao's patch is fixing the incorrect behavior of parsing error caused by
creating snapshots, while my patch is to go back to reserving space for
nocow/prealloc writes unless we've run out of space.
Thanks,
-liubo
> 
> Thanks
> 
> > The test was made against a tmpfs backed loop device:
> > 
> > xfs_io -f -c "falloc 0 400M" foobar
> > xfs_io -c "pwrite -W -b 4096 0 400M" foobar
> > 
> > Vanilla:
> > wrote 419430400/419430400 bytes at offset 0
> > 400 MiB, 102400 ops; 0:00:01.00 (200.015 MiB/sec and 51203.8403 ops/sec)
> > 
> > Patched kernel:
> > wrote 419430400/419430400 bytes at offset 0
> > 400 MiB, 102400 ops; 0:00:01.00 (346.137 MiB/sec and 88610.9796 ops/sec)
> > 
> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > ---
> >  fs/btrfs/file.c | 17 ++++++++---------
> >  1 file changed, 8 insertions(+), 9 deletions(-)
> > 
> > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> > index 098bb8f..f66c1bc 100644
> > --- a/fs/btrfs/file.c
> > +++ b/fs/btrfs/file.c
> > @@ -1525,16 +1525,12 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
> >  
> >  		reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
> >  
> > -		if (BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
> > -					     BTRFS_INODE_PREALLOC)) {
> > +		ret = btrfs_check_data_free_space(inode, pos, write_bytes);
> > +		if (ret == -ENOSPC &&
> > +		    BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
> > +						     BTRFS_INODE_PREALLOC)) {
> >  			ret = check_can_nocow(inode, pos, &write_bytes);
> > -			if (ret < 0)
> > -				break;
> >  			if (ret > 0) {
> > -				/*
> > -				 * For nodata cow case, no need to reserve
> > -				 * data space.
> > -				 */
> >  				only_release_metadata = true;
> >  				/*
> >  				 * our prealloc extent may be smaller than
> > @@ -1543,10 +1539,13 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
> >  				num_pages = DIV_ROUND_UP(write_bytes + offset,
> >  							 PAGE_CACHE_SIZE);
> >  				reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
> > +
> > +				ret = 0;
> >  				goto reserve_metadata;
> > +			} else {
> > +				ret = -ENOSPC;
> >  			}
> >  		}
> > -		ret = btrfs_check_data_free_space(inode, pos, write_bytes);
> >  		if (ret < 0)
> >  			break;
> >  
> 
> 
> -- 
> With respect,
> Roman
^ permalink raw reply	[flat|nested] 4+ messages in thread
end of thread, other threads:[~2016-03-21 20:41 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-03-17 22:16 [PATCH] Btrfs: fix performance regression of writing to prealloc/nocow file Liu Bo
2016-03-18 11:59 ` Holger Hoffstätte
2016-03-18 12:44 ` Roman Mamedov
2016-03-21 19:10   ` Liu Bo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).