* [PATCH v2 1/6] iomap: correct the range of a partial dirty clear
2024-08-12 12:11 [PATCH v2 0/6] iomap: some minor non-critical fixes and improvements when block size < folio size Zhang Yi
@ 2024-08-12 12:11 ` Zhang Yi
2024-08-12 16:33 ` Darrick J. Wong
2024-08-12 12:11 ` [PATCH v2 2/6] iomap: support invalidating partial folios Zhang Yi
` (5 subsequent siblings)
6 siblings, 1 reply; 42+ messages in thread
From: Zhang Yi @ 2024-08-12 12:11 UTC (permalink / raw)
To: linux-xfs, linux-fsdevel
Cc: linux-kernel, djwong, hch, brauner, david, jack, willy, yi.zhang,
yi.zhang, chengzhihao1, yukuai3
From: Zhang Yi <yi.zhang@huawei.com>
The block range calculation in ifs_clear_range_dirty() is incorrect when
partially clearing a range in a folio: we must not clear the dirty bit
of the first or the last block if the start or end offset is not
blocksize aligned. This has not caused any issue so far, since we always
clear a whole folio in iomap_writepage_map()->iomap_clear_range_dirty().
Fix it by rounding up the first block, rounding down the last block, and
correcting the calculation of nr_blks.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
fs/iomap/buffered-io.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index f420c53d86ac..4da453394aaf 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -138,11 +138,14 @@ static void ifs_clear_range_dirty(struct folio *folio,
{
struct inode *inode = folio->mapping->host;
unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
- unsigned int first_blk = (off >> inode->i_blkbits);
- unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
- unsigned int nr_blks = last_blk - first_blk + 1;
+ unsigned int first_blk = DIV_ROUND_UP(off, i_blocksize(inode));
+ unsigned int last_blk = (off + len) >> inode->i_blkbits;
+ unsigned int nr_blks = last_blk - first_blk;
unsigned long flags;
+ if (!nr_blks)
+ return;
+
spin_lock_irqsave(&ifs->state_lock, flags);
bitmap_clear(ifs->state, first_blk + blks_per_folio, nr_blks);
spin_unlock_irqrestore(&ifs->state_lock, flags);
--
2.39.2
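To illustrate the fix, here is a minimal userspace sketch of the block
range math (the 1024-byte block size and the sample range are
hypothetical, not taken from the patch):

#include <stdio.h>

#define BLKBITS 10			/* hypothetical 1k block size */
#define BLKSIZE (1u << BLKBITS)

int main(void)
{
	unsigned int off = 512, len = 2048;	/* unaligned range [512, 2560) */

	/* old math: counts the partially covered first and last blocks */
	unsigned int old_first = off >> BLKBITS;
	unsigned int old_last = (off + len - 1) >> BLKBITS;
	unsigned int old_nr = old_last - old_first + 1;

	/* fixed math: only blocks fully inside [off, off + len) */
	unsigned int new_first = (off + BLKSIZE - 1) >> BLKBITS;
	unsigned int new_last = (off + len) >> BLKBITS;
	unsigned int new_nr = new_last - new_first;

	printf("old: first %u, nr %u\n", old_first, old_nr);	/* 0, 3 */
	printf("new: first %u, nr %u\n", new_first, new_nr);	/* 1, 1 */
	return 0;
}

With the old math, the partially covered blocks 0 and 2 would have their
dirty bits cleared as well; the fixed math touches only block 1, which is
fully inside the range.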
* Re: [PATCH v2 1/6] iomap: correct the range of a partial dirty clear
2024-08-12 12:11 ` [PATCH v2 1/6] iomap: correct the range of a partial dirty clear Zhang Yi
@ 2024-08-12 16:33 ` Darrick J. Wong
2024-08-13 2:14 ` Zhang Yi
2024-08-14 1:53 ` Dave Chinner
0 siblings, 2 replies; 42+ messages in thread
From: Darrick J. Wong @ 2024-08-12 16:33 UTC (permalink / raw)
To: Zhang Yi
Cc: linux-xfs, linux-fsdevel, linux-kernel, hch, brauner, david, jack,
willy, yi.zhang, chengzhihao1, yukuai3
On Mon, Aug 12, 2024 at 08:11:54PM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
>
> The block range calculation in ifs_clear_range_dirty() is incorrect when
> partially clearing a range in a folio: we must not clear the dirty bit
> of the first or the last block if the start or end offset is not
> blocksize aligned. This has not caused any issue so far, since we always
> clear a whole folio in iomap_writepage_map()->iomap_clear_range_dirty().
> Fix it by rounding up the first block, rounding down the last block, and
> correcting the calculation of nr_blks.
>
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> ---
> fs/iomap/buffered-io.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index f420c53d86ac..4da453394aaf 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -138,11 +138,14 @@ static void ifs_clear_range_dirty(struct folio *folio,
> {
> struct inode *inode = folio->mapping->host;
> unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
> - unsigned int first_blk = (off >> inode->i_blkbits);
> - unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
> - unsigned int nr_blks = last_blk - first_blk + 1;
> + unsigned int first_blk = DIV_ROUND_UP(off, i_blocksize(inode));
Is there a round up macro that doesn't involve integer division?
--D
> + unsigned int last_blk = (off + len) >> inode->i_blkbits;
> + unsigned int nr_blks = last_blk - first_blk;
> unsigned long flags;
>
> + if (!nr_blks)
> + return;
> +
> spin_lock_irqsave(&ifs->state_lock, flags);
> bitmap_clear(ifs->state, first_blk + blks_per_folio, nr_blks);
> spin_unlock_irqrestore(&ifs->state_lock, flags);
> --
> 2.39.2
>
>
* Re: [PATCH v2 1/6] iomap: correct the range of a partial dirty clear
2024-08-12 16:33 ` Darrick J. Wong
@ 2024-08-13 2:14 ` Zhang Yi
2024-08-14 1:53 ` Dave Chinner
1 sibling, 0 replies; 42+ messages in thread
From: Zhang Yi @ 2024-08-13 2:14 UTC (permalink / raw)
To: Darrick J. Wong
Cc: linux-xfs, linux-fsdevel, linux-kernel, hch, brauner, david, jack,
willy, yi.zhang, chengzhihao1, yukuai3
On 2024/8/13 0:33, Darrick J. Wong wrote:
> On Mon, Aug 12, 2024 at 08:11:54PM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> The block range calculation in ifs_clear_range_dirty() is incorrect when
>> partially clearing a range in a folio: we must not clear the dirty bit
>> of the first or the last block if the start or end offset is not
>> blocksize aligned. This has not caused any issue so far, since we always
>> clear a whole folio in iomap_writepage_map()->iomap_clear_range_dirty().
>> Fix it by rounding up the first block, rounding down the last block, and
>> correcting the calculation of nr_blks.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>> ---
>> fs/iomap/buffered-io.c | 9 ++++++---
>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>> index f420c53d86ac..4da453394aaf 100644
>> --- a/fs/iomap/buffered-io.c
>> +++ b/fs/iomap/buffered-io.c
>> @@ -138,11 +138,14 @@ static void ifs_clear_range_dirty(struct folio *folio,
>> {
>> struct inode *inode = folio->mapping->host;
>> unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
>> - unsigned int first_blk = (off >> inode->i_blkbits);
>> - unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
>> - unsigned int nr_blks = last_blk - first_blk + 1;
>> + unsigned int first_blk = DIV_ROUND_UP(off, i_blocksize(inode));
>
> Is there a round up macro that doesn't involve integer division?
>
Sorry, I couldn't find a common macro for this. If we want to avoid
integer division, how about open coding it here?
first_blk = round_up(off, i_blocksize(inode)) >> inode->i_blkbits;
Thanks,
Yi.
>
>> + unsigned int last_blk = (off + len) >> inode->i_blkbits;
>> + unsigned int nr_blks = last_blk - first_blk;
>> unsigned long flags;
>>
>> + if (!nr_blks)
>> + return;
>> +
>> spin_lock_irqsave(&ifs->state_lock, flags);
>> bitmap_clear(ifs->state, first_blk + blks_per_folio, nr_blks);
>> spin_unlock_irqrestore(&ifs->state_lock, flags);
>> --
>> 2.39.2
>>
>>
* Re: [PATCH v2 1/6] iomap: correct the range of a partial dirty clear
2024-08-12 16:33 ` Darrick J. Wong
2024-08-13 2:14 ` Zhang Yi
@ 2024-08-14 1:53 ` Dave Chinner
1 sibling, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2024-08-14 1:53 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Zhang Yi, linux-xfs, linux-fsdevel, linux-kernel, hch, brauner,
jack, willy, yi.zhang, chengzhihao1, yukuai3
On Mon, Aug 12, 2024 at 09:33:39AM -0700, Darrick J. Wong wrote:
> On Mon, Aug 12, 2024 at 08:11:54PM +0800, Zhang Yi wrote:
> > From: Zhang Yi <yi.zhang@huawei.com>
> >
> > The block range calculation in ifs_clear_range_dirty() is incorrect when
> > partially clearing a range in a folio: we must not clear the dirty bit
> > of the first or the last block if the start or end offset is not
> > blocksize aligned. This has not caused any issue so far, since we always
> > clear a whole folio in iomap_writepage_map()->iomap_clear_range_dirty().
> > Fix it by rounding up the first block, rounding down the last block, and
> > correcting the calculation of nr_blks.
> >
> > Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> > ---
> > fs/iomap/buffered-io.c | 9 ++++++---
> > 1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index f420c53d86ac..4da453394aaf 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -138,11 +138,14 @@ static void ifs_clear_range_dirty(struct folio *folio,
> > {
> > struct inode *inode = folio->mapping->host;
> > unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
> > - unsigned int first_blk = (off >> inode->i_blkbits);
> > - unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
> > - unsigned int nr_blks = last_blk - first_blk + 1;
> > + unsigned int first_blk = DIV_ROUND_UP(off, i_blocksize(inode));
>
> Is there a round up macro that doesn't involve integer division?
i_blocksize() is supposed to be a power of 2, so yes:
/**
* round_up - round up to next specified power of 2
* @x: the value to round
* @y: multiple to round up to (must be a power of 2)
*
* Rounds @x up to next multiple of @y (which must be a power of 2).
* To perform arbitrary rounding up, use roundup() below.
*/
#define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)
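A quick userspace check (a sketch assuming a power-of-two block size of
1024 bytes) confirms that round_up() plus a shift computes the same
first block as DIV_ROUND_UP() without the division:

#include <assert.h>
#include <stdio.h>

/* standalone copies of the kernel helpers, for the demo only */
#define __round_mask(x, y)	((__typeof__(x))((y) - 1))
#define round_up(x, y)		((((x) - 1) | __round_mask(x, y)) + 1)
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

int main(void)
{
	unsigned int blkbits = 10, blksize = 1u << blkbits;
	unsigned int off;

	for (off = 0; off < 4 * blksize; off++)
		assert((round_up(off, blksize) >> blkbits) ==
		       DIV_ROUND_UP(off, blksize));
	printf("round_up() and DIV_ROUND_UP() agree\n");
	return 0;
}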
-Dave.
--
Dave Chinner
david@fromorbit.com
* [PATCH v2 2/6] iomap: support invalidating partial folios
2024-08-12 12:11 [PATCH v2 0/6] iomap: some minor non-critical fixes and improvements when block size < folio size Zhang Yi
2024-08-12 12:11 ` [PATCH v2 1/6] iomap: correct the range of a partial dirty clear Zhang Yi
@ 2024-08-12 12:11 ` Zhang Yi
2024-08-12 16:55 ` Darrick J. Wong
2024-08-12 12:11 ` [PATCH v2 3/6] iomap: advance the ifs allocation if we have more than one blocks per folio Zhang Yi
` (4 subsequent siblings)
6 siblings, 1 reply; 42+ messages in thread
From: Zhang Yi @ 2024-08-12 12:11 UTC (permalink / raw)
To: linux-xfs, linux-fsdevel
Cc: linux-kernel, djwong, hch, brauner, david, jack, willy, yi.zhang,
yi.zhang, chengzhihao1, yukuai3
From: Zhang Yi <yi.zhang@huawei.com>
Currently, iomap_invalidate_folio() can only invalidate an entire folio.
If we truncate a partial folio on a filesystem with blocksize < folio
size, it leaves the dirty bits of the truncated/punched blocks set, and
the writeback process will try to map the invalid hole range.
Fortunately, this hasn't triggered any real problems so far, since
->map_blocks() will fix up the length. Fix it by supporting the
invalidation of partial folios.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
fs/iomap/buffered-io.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 4da453394aaf..763deabe8331 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -644,6 +644,8 @@ void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len)
WARN_ON_ONCE(folio_test_writeback(folio));
folio_cancel_dirty(folio);
ifs_free(folio);
+ } else {
+ iomap_clear_range_dirty(folio, offset, len);
}
}
EXPORT_SYMBOL_GPL(iomap_invalidate_folio);
--
2.39.2
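As a toy model of the new behaviour, the following userspace sketch
(hypothetical geometry: 1k blocks in a 4k folio) simulates the per-block
dirty bitmap across a buffered write followed by a partial invalidation:

#include <stdio.h>

#define BLKBITS 10			/* hypothetical 1k block size */
#define BLKSIZE (1u << BLKBITS)

static unsigned int dirty;		/* one bit per block, like ifs->state */

static void set_range_dirty(unsigned int off, unsigned int len)
{
	unsigned int b;

	for (b = off >> BLKBITS; b <= (off + len - 1) >> BLKBITS; b++)
		dirty |= 1u << b;
}

static void clear_range_dirty(unsigned int off, unsigned int len)
{
	/* the corrected math from patch 1: only fully covered blocks */
	unsigned int first = (off + BLKSIZE - 1) >> BLKBITS;
	unsigned int last = (off + len) >> BLKBITS;
	unsigned int b;

	for (b = first; b < last; b++)
		dirty &= ~(1u << b);
}

int main(void)
{
	set_range_dirty(0, 4096);	/* buffered write dirties the folio */
	clear_range_dirty(2048, 2048);	/* truncate invalidates blocks 2-3 */
	printf("dirty bitmap: 0x%x\n", dirty);	/* 0x3: blocks 0-1 remain */
	return 0;
}

Without the new else branch, the second step would never happen and
blocks 2-3 would stay dirty over a hole.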
* Re: [PATCH v2 2/6] iomap: support invalidating partial folios
2024-08-12 12:11 ` [PATCH v2 2/6] iomap: support invalidating partial folios Zhang Yi
@ 2024-08-12 16:55 ` Darrick J. Wong
0 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2024-08-12 16:55 UTC (permalink / raw)
To: Zhang Yi
Cc: linux-xfs, linux-fsdevel, linux-kernel, hch, brauner, david, jack,
willy, yi.zhang, chengzhihao1, yukuai3
On Mon, Aug 12, 2024 at 08:11:55PM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
>
> Currently, iomap_invalidate_folio() can only invalidate an entire folio.
> If we truncate a partial folio on a filesystem with blocksize < folio
> size, it leaves the dirty bits of the truncated/punched blocks set, and
> the writeback process will try to map the invalid hole range.
> Fortunately, this hasn't triggered any real problems so far, since
> ->map_blocks() will fix up the length. Fix it by supporting the
> invalidation of partial folios.
>
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Seems ok to me
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> ---
> fs/iomap/buffered-io.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 4da453394aaf..763deabe8331 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -644,6 +644,8 @@ void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len)
> WARN_ON_ONCE(folio_test_writeback(folio));
> folio_cancel_dirty(folio);
> ifs_free(folio);
> + } else {
> + iomap_clear_range_dirty(folio, offset, len);
> }
> }
> EXPORT_SYMBOL_GPL(iomap_invalidate_folio);
> --
> 2.39.2
>
>
* [PATCH v2 3/6] iomap: advance the ifs allocation if we have more than one blocks per folio
2024-08-12 12:11 [PATCH v2 0/6] iomap: some minor non-critical fixes and improvements when block size < folio size Zhang Yi
2024-08-12 12:11 ` [PATCH v2 1/6] iomap: correct the range of a partial dirty clear Zhang Yi
2024-08-12 12:11 ` [PATCH v2 2/6] iomap: support invalidating partial folios Zhang Yi
@ 2024-08-12 12:11 ` Zhang Yi
2024-08-12 12:47 ` yangerkun
2024-08-14 5:32 ` Christoph Hellwig
2024-08-12 12:11 ` [PATCH v2 4/6] iomap: correct the dirty length in page mkwrite Zhang Yi
` (3 subsequent siblings)
6 siblings, 2 replies; 42+ messages in thread
From: Zhang Yi @ 2024-08-12 12:11 UTC (permalink / raw)
To: linux-xfs, linux-fsdevel
Cc: linux-kernel, djwong, hch, brauner, david, jack, willy, yi.zhang,
yi.zhang, chengzhihao1, yukuai3
From: Zhang Yi <yi.zhang@huawei.com>
Currently we allocate an ifs when writing back dirty folios in
iomap_writepage_map() if i_blocks_per_folio is larger than one, so
after a buffered write to an entire folio we don't attach an ifs until
writeback starts. If we partially truncate that folio in the meantime,
iomap_invalidate_folio() can't clear the corresponding blocks' dirty
bits as expected. Fix this by advancing the ifs allocation to
__iomap_write_begin().
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
fs/iomap/buffered-io.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 763deabe8331..79031b7517e5 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -699,6 +699,12 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
size_t from = offset_in_folio(folio, pos), to = from + len;
size_t poff, plen;
+ if (nr_blocks > 1) {
+ ifs = ifs_alloc(iter->inode, folio, iter->flags);
+ if ((iter->flags & IOMAP_NOWAIT) && !ifs)
+ return -EAGAIN;
+ }
+
/*
* If the write or zeroing completely overlaps the current folio, then
* entire folio will be dirtied so there is no need for
@@ -710,10 +716,6 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
pos + len >= folio_pos(folio) + folio_size(folio))
return 0;
- ifs = ifs_alloc(iter->inode, folio, iter->flags);
- if ((iter->flags & IOMAP_NOWAIT) && !ifs && nr_blocks > 1)
- return -EAGAIN;
-
if (folio_test_uptodate(folio))
return 0;
@@ -1928,7 +1930,12 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
WARN_ON_ONCE(end_pos <= pos);
if (i_blocks_per_folio(inode, folio) > 1) {
- if (!ifs) {
+ /*
+ * This should not happen since we always allocate ifs in
+ * iomap_folio_mkwrite_iter() and there is more than one
+ * blocks per folio in __iomap_write_begin().
+ */
+ if (WARN_ON_ONCE(!ifs)) {
ifs = ifs_alloc(inode, folio, 0);
iomap_set_range_dirty(folio, 0, end_pos - pos);
}
--
2.39.2
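A condensed userspace mock of the control-flow change may help
(write_begin(), the fixed flag and the geometry are illustrative
inventions, not kernel code): with the old placement, a whole-folio
overwrite returns before the ifs is allocated, so a later partial
truncate finds no per-block bitmap to update.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define FOLIO 4096

static unsigned int *ifs;	/* NULL until allocated, like folio->private */

static void write_begin(unsigned int pos, unsigned int len, bool fixed)
{
	if (fixed && !ifs)		/* new: allocate before the early return */
		ifs = calloc(1, sizeof(*ifs));
	if (pos == 0 && len == FOLIO)	/* whole-folio overwrite: early return */
		return;
	if (!fixed && !ifs)		/* old: allocation is never reached above */
		ifs = calloc(1, sizeof(*ifs));
}

int main(void)
{
	write_begin(0, FOLIO, false);
	printf("old: ifs after whole-folio write is %s\n",
	       ifs ? "allocated" : "NULL");

	write_begin(0, FOLIO, true);
	printf("new: ifs after whole-folio write is %s\n",
	       ifs ? "allocated" : "NULL");
	return 0;
}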
* Re: [PATCH v2 3/6] iomap: advance the ifs allocation if we have more than one blocks per folio
2024-08-12 12:11 ` [PATCH v2 3/6] iomap: advance the ifs allocation if we have more than one blocks per folio Zhang Yi
@ 2024-08-12 12:47 ` yangerkun
2024-08-13 2:21 ` Zhang Yi
2024-08-14 5:32 ` Christoph Hellwig
1 sibling, 1 reply; 42+ messages in thread
From: yangerkun @ 2024-08-12 12:47 UTC (permalink / raw)
To: Zhang Yi, linux-xfs, linux-fsdevel
Cc: linux-kernel, djwong, hch, brauner, david, jack, willy, yi.zhang,
chengzhihao1, yukuai3
On 2024/8/12 20:11, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
>
> Currently we allocate an ifs when writing back dirty folios in
> iomap_writepage_map() if i_blocks_per_folio is larger than one, so
> after a buffered write to an entire folio we don't attach an ifs until
> writeback starts. If we partially truncate that folio in the meantime,
> iomap_invalidate_folio() can't clear the corresponding blocks' dirty
> bits as expected. Fix this by advancing the ifs allocation to
> __iomap_write_begin().
>
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> ---
> fs/iomap/buffered-io.c | 17 ++++++++++++-----
> 1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 763deabe8331..79031b7517e5 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -699,6 +699,12 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
> size_t from = offset_in_folio(folio, pos), to = from + len;
> size_t poff, plen;
>
> + if (nr_blocks > 1) {
> + ifs = ifs_alloc(iter->inode, folio, iter->flags);
> + if ((iter->flags & IOMAP_NOWAIT) && !ifs)
> + return -EAGAIN;
> + }
> +
> /*
> * If the write or zeroing completely overlaps the current folio, then
> * entire folio will be dirtied so there is no need for
The comment above needs updating too.
> @@ -710,10 +716,6 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
> pos + len >= folio_pos(folio) + folio_size(folio))
> return 0;
>
> - ifs = ifs_alloc(iter->inode, folio, iter->flags);
> - if ((iter->flags & IOMAP_NOWAIT) && !ifs && nr_blocks > 1)
> - return -EAGAIN;
> -
> if (folio_test_uptodate(folio))
> return 0;
>
> @@ -1928,7 +1930,12 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> WARN_ON_ONCE(end_pos <= pos);
>
> if (i_blocks_per_folio(inode, folio) > 1) {
> - if (!ifs) {
> + /*
> + * This should not happen since we always allocate ifs in
> + * iomap_folio_mkwrite_iter() and there is more than one
> + * blocks per folio in __iomap_write_begin().
> + */
> + if (WARN_ON_ONCE(!ifs)) {
> ifs = ifs_alloc(inode, folio, 0);
> iomap_set_range_dirty(folio, 0, end_pos - pos);
> }
* Re: [PATCH v2 3/6] iomap: advance the ifs allocation if we have more than one blocks per folio
2024-08-12 12:47 ` yangerkun
@ 2024-08-13 2:21 ` Zhang Yi
0 siblings, 0 replies; 42+ messages in thread
From: Zhang Yi @ 2024-08-13 2:21 UTC (permalink / raw)
To: yangerkun, linux-xfs, linux-fsdevel
Cc: linux-kernel, djwong, hch, brauner, david, jack, willy, yi.zhang,
chengzhihao1, yukuai3
On 2024/8/12 20:47, yangerkun wrote:
>
>
> On 2024/8/12 20:11, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> Currently we allocate an ifs when writing back dirty folios in
>> iomap_writepage_map() if i_blocks_per_folio is larger than one, so
>> after a buffered write to an entire folio we don't attach an ifs until
>> writeback starts. If we partially truncate that folio in the meantime,
>> iomap_invalidate_folio() can't clear the corresponding blocks' dirty
>> bits as expected. Fix this by advancing the ifs allocation to
>> __iomap_write_begin().
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>> ---
>> fs/iomap/buffered-io.c | 17 ++++++++++++-----
>> 1 file changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>> index 763deabe8331..79031b7517e5 100644
>> --- a/fs/iomap/buffered-io.c
>> +++ b/fs/iomap/buffered-io.c
>> @@ -699,6 +699,12 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>> size_t from = offset_in_folio(folio, pos), to = from + len;
>> size_t poff, plen;
>> + if (nr_blocks > 1) {
>> + ifs = ifs_alloc(iter->inode, folio, iter->flags);
>> + if ((iter->flags & IOMAP_NOWAIT) && !ifs)
>> + return -EAGAIN;
>> + }
>> +
>> /*
>> * If the write or zeroing completely overlaps the current folio, then
>> * entire folio will be dirtied so there is no need for
>
> The comment above needs updating too.
Will update as well, thanks for pointing this out.
Thanks,
Yi.
>
>
>> @@ -710,10 +716,6 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>> pos + len >= folio_pos(folio) + folio_size(folio))
>> return 0;
>> - ifs = ifs_alloc(iter->inode, folio, iter->flags);
>> - if ((iter->flags & IOMAP_NOWAIT) && !ifs && nr_blocks > 1)
>> - return -EAGAIN;
>> -
>> if (folio_test_uptodate(folio))
>> return 0;
>> @@ -1928,7 +1930,12 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>> WARN_ON_ONCE(end_pos <= pos);
>> if (i_blocks_per_folio(inode, folio) > 1) {
>> - if (!ifs) {
>> + /*
>> + * This should not happen since we always allocate ifs in
>> + * iomap_folio_mkwrite_iter() and there is more than one
>> + * blocks per folio in __iomap_write_begin().
>> + */
>> + if (WARN_ON_ONCE(!ifs)) {
>> ifs = ifs_alloc(inode, folio, 0);
>> iomap_set_range_dirty(folio, 0, end_pos - pos);
>> }
* Re: [PATCH v2 3/6] iomap: advance the ifs allocation if we have more than one blocks per folio
2024-08-12 12:11 ` [PATCH v2 3/6] iomap: advance the ifs allocation if we have more than one blocks per folio Zhang Yi
2024-08-12 12:47 ` yangerkun
@ 2024-08-14 5:32 ` Christoph Hellwig
2024-08-14 7:08 ` Zhang Yi
2024-08-17 4:27 ` Zhang Yi
1 sibling, 2 replies; 42+ messages in thread
From: Christoph Hellwig @ 2024-08-14 5:32 UTC (permalink / raw)
To: Zhang Yi
Cc: linux-xfs, linux-fsdevel, linux-kernel, djwong, hch, brauner,
david, jack, willy, yi.zhang, chengzhihao1, yukuai3
On Mon, Aug 12, 2024 at 08:11:56PM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
>
> Currently we allocate an ifs when writing back dirty folios in
> iomap_writepage_map() if i_blocks_per_folio is larger than one, so
> after a buffered write to an entire folio we don't attach an ifs until
> writeback starts. If we partially truncate that folio in the meantime,
> iomap_invalidate_folio() can't clear the corresponding blocks' dirty
> bits as expected. Fix this by advancing the ifs allocation to
> __iomap_write_begin().
Wouldn't it make more sense to only allocate the ifs in
iomap_invalidate_folio when it actually is needed?
Also do you have a reproducer for this?
* Re: [PATCH v2 3/6] iomap: advance the ifs allocation if we have more than one blocks per folio
2024-08-14 5:32 ` Christoph Hellwig
@ 2024-08-14 7:08 ` Zhang Yi
2024-08-15 6:00 ` Christoph Hellwig
2024-08-17 4:27 ` Zhang Yi
1 sibling, 1 reply; 42+ messages in thread
From: Zhang Yi @ 2024-08-14 7:08 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-xfs, linux-fsdevel, linux-kernel, djwong, brauner, david,
jack, willy, yi.zhang, chengzhihao1, yukuai3
On 2024/8/14 13:32, Christoph Hellwig wrote:
> On Mon, Aug 12, 2024 at 08:11:56PM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> Currently we allocate an ifs when writing back dirty folios in
>> iomap_writepage_map() if i_blocks_per_folio is larger than one, so
>> after a buffered write to an entire folio we don't attach an ifs until
>> writeback starts. If we partially truncate that folio in the meantime,
>> iomap_invalidate_folio() can't clear the corresponding blocks' dirty
>> bits as expected. Fix this by advancing the ifs allocation to
>> __iomap_write_begin().
>
> Wouldn't it make more sense to only allocate the ifs in
> iomap_invalidate_folio when it actually is needed?
>
So you mean the current strategy of allocating the ifs is to delay
the allocation as much as possible? The advantage is that it could
avoid some unnecessary allocations if the whole folio is
invalidated before writeback, right?
> Also do you have a reproducer for this?
>
This mistake doesn't cause any real problem now, because once the folio
has been partially truncated, the corresponding range becomes a hole;
although the ifs dirty bit is not cleared, iomap_writepage_map_blocks()
can deal with it without causing any problem. Hence I don't have a
reproducer for this.
Thanks,
Yi.
* Re: [PATCH v2 3/6] iomap: advance the ifs allocation if we have more than one blocks per folio
2024-08-14 7:08 ` Zhang Yi
@ 2024-08-15 6:00 ` Christoph Hellwig
2024-08-16 1:44 ` Zhang Yi
0 siblings, 1 reply; 42+ messages in thread
From: Christoph Hellwig @ 2024-08-15 6:00 UTC (permalink / raw)
To: Zhang Yi
Cc: Christoph Hellwig, linux-xfs, linux-fsdevel, linux-kernel, djwong,
brauner, david, jack, willy, yi.zhang, chengzhihao1, yukuai3
On Wed, Aug 14, 2024 at 03:08:25PM +0800, Zhang Yi wrote:
> > iomap_invalidate_folio when it actually is needed?
> >
>
> So you mean the current strategy of allocating the ifs is to delay
> the allocation as much as possible?
Yes.
> The advantage is that it could
> avoid some unnecessary allocations if the whole folio is
> invalidated before writeback, right?
Yes. And hopefully we can also get to the point where we don't need
to actually allocate it for writeback. I've been wanting to do that
for a while but never got to it.
* Re: [PATCH v2 3/6] iomap: advance the ifs allocation if we have more than one blocks per folio
2024-08-15 6:00 ` Christoph Hellwig
@ 2024-08-16 1:44 ` Zhang Yi
0 siblings, 0 replies; 42+ messages in thread
From: Zhang Yi @ 2024-08-16 1:44 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-xfs, linux-fsdevel, linux-kernel, djwong, brauner, david,
jack, willy, yi.zhang, chengzhihao1, yukuai3
On 2024/8/15 14:00, Christoph Hellwig wrote:
> On Wed, Aug 14, 2024 at 03:08:25PM +0800, Zhang Yi wrote:
>>> iomap_invalidate_folio when it actually is needed?
>>>
>>
>> So you mean the current strategy of allocating the ifs is to delay
>> the allocation as much as possible?
>
> Yes.
>
>> The advantage is that it could
>> avoid some unnecessary allocations if the whole folio is
>> invalidated before writeback, right?
>
> Yes. And hopefully we can also get to the point where we don't need
> to actually allocate it for writeback. I've been wanting to do that
> for a while but never got to it.
>
Yeah, this sounds like a good idea.
Thanks,
Yi.
* Re: [PATCH v2 3/6] iomap: advance the ifs allocation if we have more than one blocks per folio
2024-08-14 5:32 ` Christoph Hellwig
2024-08-14 7:08 ` Zhang Yi
@ 2024-08-17 4:27 ` Zhang Yi
2024-08-17 4:42 ` Matthew Wilcox
1 sibling, 1 reply; 42+ messages in thread
From: Zhang Yi @ 2024-08-17 4:27 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-xfs, linux-fsdevel, linux-kernel, djwong, brauner, david,
jack, willy, yi.zhang, chengzhihao1, yukuai3
On 2024/8/14 13:32, Christoph Hellwig wrote:
> On Mon, Aug 12, 2024 at 08:11:56PM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> Currently we allocate an ifs when writing back dirty folios in
>> iomap_writepage_map() if i_blocks_per_folio is larger than one, so
>> after a buffered write to an entire folio we don't attach an ifs until
>> writeback starts. If we partially truncate that folio in the meantime,
>> iomap_invalidate_folio() can't clear the corresponding blocks' dirty
>> bits as expected. Fix this by advancing the ifs allocation to
>> __iomap_write_begin().
>
> Wouldn't it make more sense to only allocate the ifs in
> iomap_invalidate_folio when it actually is needed?
>
I forgot to mention that truncate_inode_partial_folio() calls
folio_invalidate()->iomap_invalidate_folio() only when the folio has
private data; if the folio doesn't have an ifs,
iomap_invalidate_folio() would never be called, hence allocating the
ifs in iomap_invalidate_folio() is useless.
In my opinion, one solution is to change truncate_inode_partial_folio()
to always call folio_invalidate(), so all callbacks would have to handle
the no-private case. Another solution is to add a magic value (a fake
ifs) to folio->private and then convert it to a real one in
iomap_invalidate_folio(). Any thoughts?
Thanks,
Yi.
* Re: [PATCH v2 3/6] iomap: advance the ifs allocation if we have more than one blocks per folio
2024-08-17 4:27 ` Zhang Yi
@ 2024-08-17 4:42 ` Matthew Wilcox
2024-08-17 6:16 ` Zhang Yi
0 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox @ 2024-08-17 4:42 UTC (permalink / raw)
To: Zhang Yi
Cc: Christoph Hellwig, linux-xfs, linux-fsdevel, linux-kernel, djwong,
brauner, david, jack, yi.zhang, chengzhihao1, yukuai3
On Sat, Aug 17, 2024 at 12:27:49PM +0800, Zhang Yi wrote:
> On 2024/8/14 13:32, Christoph Hellwig wrote:
> > On Mon, Aug 12, 2024 at 08:11:56PM +0800, Zhang Yi wrote:
> >> From: Zhang Yi <yi.zhang@huawei.com>
> >>
> >> Now we allocate ifs if i_blocks_per_folio is larger than one when
> >> writing back dirty folios in iomap_writepage_map(), so we don't attach
> >> an ifs after buffer write to an entire folio until it starts writing
> >> back, if we partial truncate that folio, iomap_invalidate_folio() can't
> >> clear counterpart block's dirty bit as expected. Fix this by advance the
> >> ifs allocation to __iomap_write_begin().
> >
> > Wouldn't it make more sense to only allocate the ifs in
> > iomap_invalidate_folio when it actually is needed?
> >
>
> I forgot to mention that truncate_inode_partial_folio() calls
> folio_invalidate()->iomap_invalidate_folio() only when the folio has
> private data; if the folio doesn't have an ifs,
> iomap_invalidate_folio() would never be called, hence allocating the
> ifs in iomap_invalidate_folio() is useless.
>
> In my opinion, one solution is to change truncate_inode_partial_folio()
> to always call folio_invalidate(), so all callbacks would have to handle
> the no-private case. Another solution is to add a magic value (a fake
> ifs) to folio->private and then convert it to a real one in
> iomap_invalidate_folio(). Any thoughts?
Why do we need iomap_invalidate_folio() to be called if there is no ifs?
Even today, all it does is call ifs_free() if we're freeing the entire
folio (which is done by truncate_cleanup_folio() and not by
truncate_inode_partial_folio()).
* Re: [PATCH v2 3/6] iomap: advance the ifs allocation if we have more than one blocks per folio
2024-08-17 4:42 ` Matthew Wilcox
@ 2024-08-17 6:16 ` Zhang Yi
0 siblings, 0 replies; 42+ messages in thread
From: Zhang Yi @ 2024-08-17 6:16 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Christoph Hellwig, linux-xfs, linux-fsdevel, linux-kernel, djwong,
brauner, david, jack, yi.zhang, chengzhihao1, yukuai3
On 2024/8/17 12:42, Matthew Wilcox wrote:
> On Sat, Aug 17, 2024 at 12:27:49PM +0800, Zhang Yi wrote:
>> On 2024/8/14 13:32, Christoph Hellwig wrote:
>>> On Mon, Aug 12, 2024 at 08:11:56PM +0800, Zhang Yi wrote:
>>>> From: Zhang Yi <yi.zhang@huawei.com>
>>>>
>>>> Now we allocate ifs if i_blocks_per_folio is larger than one when
>>>> writing back dirty folios in iomap_writepage_map(), so we don't attach
>>>> an ifs after buffer write to an entire folio until it starts writing
>>>> back, if we partial truncate that folio, iomap_invalidate_folio() can't
>>>> clear counterpart block's dirty bit as expected. Fix this by advance the
>>>> ifs allocation to __iomap_write_begin().
>>>
>>> Wouldn't it make more sense to only allocate the ifs in
>>> iomap_invalidate_folio when it actually is needed?
>>>
>>
>> I forgot to mention that truncate_inode_partial_folio() calls
>> folio_invalidate()->iomap_invalidate_folio() only when the folio has
>> private data; if the folio doesn't have an ifs,
>> iomap_invalidate_folio() would never be called, hence allocating the
>> ifs in iomap_invalidate_folio() is useless.
>>
>> In my opinion, one solution is to change truncate_inode_partial_folio()
>> to always call folio_invalidate(), so all callbacks would have to handle
>> the no-private case. Another solution is to add a magic value (a fake
>> ifs) to folio->private and then convert it to a real one in
>> iomap_invalidate_folio(). Any thoughts?
>
> Why do we need iomap_invalidate_folio() to be called if there is no ifs?
> Even today, all it does is call ifs_free() if we're freeing the entire
> folio (which is done by truncate_cleanup_folio() and not by
> truncate_inode_partial_folio()).
>
Please see patch 2: if we truncate a partial folio (through a hole punch
or truncate) on a filesystem with blocksize < folio size, it leaves the
dirty bits of the truncated/punched blocks set, and this leads to a hole
with the dirty bit set but without any block allocated/reserved, which
is not correct.
Hence we also need truncate_inode_partial_folio() to call
iomap_invalidate_folio() and clear the corresponding dirty bits of the
partial folio. But right now we don't allocate an ifs in
__iomap_write_begin() when writing an entire folio, so the folio has no
ifs; yet we can still partially truncate it, in which case we should
allocate an ifs for it and update its dirty bits.
Thanks,
Yi.
* [PATCH v2 4/6] iomap: correct the dirty length in page mkwrite
2024-08-12 12:11 [PATCH v2 0/6] iomap: some minor non-critical fixes and improvements when block size < folio size Zhang Yi
` (2 preceding siblings ...)
2024-08-12 12:11 ` [PATCH v2 3/6] iomap: advance the ifs allocation if we have more than one blocks per folio Zhang Yi
@ 2024-08-12 12:11 ` Zhang Yi
2024-08-12 16:45 ` Darrick J. Wong
` (2 more replies)
2024-08-12 12:11 ` [PATCH v2 5/6] iomap: don't mark blocks uptodate after partial zeroing Zhang Yi
` (2 subsequent siblings)
6 siblings, 3 replies; 42+ messages in thread
From: Zhang Yi @ 2024-08-12 12:11 UTC (permalink / raw)
To: linux-xfs, linux-fsdevel
Cc: linux-kernel, djwong, hch, brauner, david, jack, willy, yi.zhang,
yi.zhang, chengzhihao1, yukuai3
From: Zhang Yi <yi.zhang@huawei.com>
When doing a page mkwrite, iomap_folio_mkwrite_iter() dirties the
entire folio through folio_mark_dirty() even if the map length is
shorter than one folio. However, on a filesystem with more than one
block per folio, we should only set the corresponding blocks' dirty
bits according to iomap_length(), so open code folio_mark_dirty() and
pass the correct length.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
fs/iomap/buffered-io.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 79031b7517e5..ac762de9a27f 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1492,7 +1492,10 @@ static loff_t iomap_folio_mkwrite_iter(struct iomap_iter *iter,
block_commit_write(&folio->page, 0, length);
} else {
WARN_ON_ONCE(!folio_test_uptodate(folio));
- folio_mark_dirty(folio);
+
+ ifs_alloc(iter->inode, folio, 0);
+ iomap_set_range_dirty(folio, 0, length);
+ filemap_dirty_folio(iter->inode->i_mapping, folio);
}
return length;
--
2.39.2
* Re: [PATCH v2 4/6] iomap: correct the dirty length in page mkwrite
2024-08-12 12:11 ` [PATCH v2 4/6] iomap: correct the dirty length in page mkwrite Zhang Yi
@ 2024-08-12 16:45 ` Darrick J. Wong
2024-08-13 2:49 ` Zhang Yi
2024-08-14 5:36 ` Christoph Hellwig
2024-08-17 4:45 ` Matthew Wilcox
2 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2024-08-12 16:45 UTC (permalink / raw)
To: Zhang Yi
Cc: linux-xfs, linux-fsdevel, linux-kernel, hch, brauner, david, jack,
willy, yi.zhang, chengzhihao1, yukuai3
On Mon, Aug 12, 2024 at 08:11:57PM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
>
> When doing a page mkwrite, iomap_folio_mkwrite_iter() dirties the
> entire folio through folio_mark_dirty() even if the map length is
> shorter than one folio. However, on a filesystem with more than one
> block per folio, we should only set the corresponding blocks' dirty
> bits according to iomap_length(), so open code folio_mark_dirty() and
> pass the correct length.
>
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> ---
> fs/iomap/buffered-io.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 79031b7517e5..ac762de9a27f 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1492,7 +1492,10 @@ static loff_t iomap_folio_mkwrite_iter(struct iomap_iter *iter,
> block_commit_write(&folio->page, 0, length);
> } else {
> WARN_ON_ONCE(!folio_test_uptodate(folio));
> - folio_mark_dirty(folio);
> +
> + ifs_alloc(iter->inode, folio, 0);
> + iomap_set_range_dirty(folio, 0, length);
> + filemap_dirty_folio(iter->inode->i_mapping, folio);
Is it correct to be doing a lot more work by changing folio_mark_dirty
to filemap_dirty_folio? Now pagefaults call __mark_inode_dirty which
they did not before. Also, the folio itself must be marked dirty if any
of the ifs bitmap is marked dirty, so I don't understand the change
here.
--D
> }
>
> return length;
> --
> 2.39.2
>
>
* Re: [PATCH v2 4/6] iomap: correct the dirty length in page mkwrite
2024-08-12 16:45 ` Darrick J. Wong
@ 2024-08-13 2:49 ` Zhang Yi
0 siblings, 0 replies; 42+ messages in thread
From: Zhang Yi @ 2024-08-13 2:49 UTC (permalink / raw)
To: Darrick J. Wong
Cc: linux-xfs, linux-fsdevel, linux-kernel, hch, brauner, david, jack,
willy, yi.zhang, chengzhihao1, yukuai3
On 2024/8/13 0:45, Darrick J. Wong wrote:
> On Mon, Aug 12, 2024 at 08:11:57PM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> When doing a page mkwrite, iomap_folio_mkwrite_iter() dirties the
>> entire folio through folio_mark_dirty() even if the map length is
>> shorter than one folio. However, on a filesystem with more than one
>> block per folio, we should only set the corresponding blocks' dirty
>> bits according to iomap_length(), so open code folio_mark_dirty() and
>> pass the correct length.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>> ---
>> fs/iomap/buffered-io.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>> index 79031b7517e5..ac762de9a27f 100644
>> --- a/fs/iomap/buffered-io.c
>> +++ b/fs/iomap/buffered-io.c
>> @@ -1492,7 +1492,10 @@ static loff_t iomap_folio_mkwrite_iter(struct iomap_iter *iter,
>> block_commit_write(&folio->page, 0, length);
>> } else {
>> WARN_ON_ONCE(!folio_test_uptodate(folio));
>> - folio_mark_dirty(folio);
>> +
>> + ifs_alloc(iter->inode, folio, 0);
>> + iomap_set_range_dirty(folio, 0, length);
>> + filemap_dirty_folio(iter->inode->i_mapping, folio);
>
> Is it correct to be doing a lot more work by changing folio_mark_dirty
> to filemap_dirty_folio? Now pagefaults call __mark_inode_dirty which
> they did not before. Also, the folio itself must be marked dirty if any
> of the ifs bitmap is marked dirty, so I don't understand the change
> here.
>
This change just open codes iomap_dirty_folio() and corrects the length
that is passed to iomap_set_range_dirty().
bool iomap_dirty_folio(struct address_space *mapping, struct folio *folio)
{
struct inode *inode = mapping->host;
size_t len = folio_size(folio);
...
ifs_alloc(inode, folio, 0);
iomap_set_range_dirty(folio, 0, len);
return filemap_dirty_folio(mapping, folio);
}
Before this change, the code also called filemap_dirty_folio() (through
folio_mark_dirty()->iomap_dirty_folio()->filemap_dirty_folio()), so it
called __mark_inode_dirty() too. After this change, filemap_dirty_folio()->
folio_test_set_dirty() will still mark the folio dirty. Hence there is no
difference in the two respects you mentioned. Am I missing something?
Thanks,
Yi.
>
>> }
>>
>> return length;
>> --
>> 2.39.2
>>
>>
* Re: [PATCH v2 4/6] iomap: correct the dirty length in page mkwrite
2024-08-12 12:11 ` [PATCH v2 4/6] iomap: correct the dirty length in page mkwrite Zhang Yi
2024-08-12 16:45 ` Darrick J. Wong
@ 2024-08-14 5:36 ` Christoph Hellwig
2024-08-14 7:49 ` Zhang Yi
2024-08-17 4:45 ` Matthew Wilcox
2 siblings, 1 reply; 42+ messages in thread
From: Christoph Hellwig @ 2024-08-14 5:36 UTC (permalink / raw)
To: Zhang Yi
Cc: linux-xfs, linux-fsdevel, linux-kernel, djwong, hch, brauner,
david, jack, willy, yi.zhang, chengzhihao1, yukuai3
On Mon, Aug 12, 2024 at 08:11:57PM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
>
> When doing a page mkwrite, iomap_folio_mkwrite_iter() dirties the
> entire folio through folio_mark_dirty() even if the map length is
> shorter than one folio. However, on a filesystem with more than one
> block per folio, we should only set the corresponding blocks' dirty
> bits according to iomap_length(), so open code folio_mark_dirty() and
> pass the correct length.
What about moving the folio_mark_dirty out of the loop and directly
into iomap_page_mkwrite so that it is called exactly once? The
iterator then does nothing for the !buffer_head case (but we still
need to call it to allocate the blocks).
* Re: [PATCH v2 4/6] iomap: correct the dirty length in page mkwrite
2024-08-14 5:36 ` Christoph Hellwig
@ 2024-08-14 7:49 ` Zhang Yi
2024-08-15 5:59 ` Christoph Hellwig
0 siblings, 1 reply; 42+ messages in thread
From: Zhang Yi @ 2024-08-14 7:49 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-xfs, linux-fsdevel, linux-kernel, djwong, brauner, david,
jack, willy, yi.zhang, chengzhihao1, yukuai3
On 2024/8/14 13:36, Christoph Hellwig wrote:
> On Mon, Aug 12, 2024 at 08:11:57PM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> When doing a page mkwrite, iomap_folio_mkwrite_iter() dirties the
>> entire folio through folio_mark_dirty() even if the map length is
>> shorter than one folio. However, on a filesystem with more than one
>> block per folio, we should only set the corresponding blocks' dirty
>> bits according to iomap_length(), so open code folio_mark_dirty() and
>> pass the correct length.
>
> What about moving the folio_mark_dirty out of the loop and directly
> into iomap_page_mkwrite so that it is called exactly once? The
> iterator then does nothing for the !buffer_head case (but we still
> need to call it to allocate the blocks).
>
Sorry, this confuses me. How could this prevent setting redundant
dirty bits?
Suppose we have a 3K regular file on a filesystem with a 1K block size.
In iomap_page_mkwrite(), iter.len is 3K; if the folio size is 4K,
folio_mark_dirty() will mark all 4 ifs bits dirty. Then, if we expand
the file size to 4K, this will still leave a hole with the dirty bit
set but without any block allocated/reserved. Am I missing something?
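For concreteness, a tiny userspace calculation with the numbers above
(1k blocks, 4k folio, a 3k iomap_length(); the program itself is only
an illustration):

#include <stdio.h>

#define BLKBITS 10	/* 1k blocks */

int main(void)
{
	unsigned int folio_size = 4096;
	unsigned int length = 3072;	/* iomap_length() in the example */

	/* folio_mark_dirty() dirties every block in the folio ... */
	printf("whole folio : %u blocks dirty\n", folio_size >> BLKBITS);

	/* ... but only the mapped length should be dirtied */
	printf("mapped range: %u blocks dirty\n",
	       (length + (1u << BLKBITS) - 1) >> BLKBITS);
	return 0;
}

The fourth bit is the redundant one that turns into a dirty hole once
the file is later extended to 4K.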
Thanks,
Yi.
* Re: [PATCH v2 4/6] iomap: correct the dirty length in page mkwrite
2024-08-14 7:49 ` Zhang Yi
@ 2024-08-15 5:59 ` Christoph Hellwig
2024-08-16 2:19 ` Zhang Yi
0 siblings, 1 reply; 42+ messages in thread
From: Christoph Hellwig @ 2024-08-15 5:59 UTC (permalink / raw)
To: Zhang Yi
Cc: Christoph Hellwig, linux-xfs, linux-fsdevel, linux-kernel, djwong,
brauner, david, jack, willy, yi.zhang, chengzhihao1, yukuai3
On Wed, Aug 14, 2024 at 03:49:41PM +0800, Zhang Yi wrote:
> Sorry, this confuses me. How could this prevent setting redundant
> dirty bits?
>
> Suppose we have a 3K regular file on a filesystem with a 1K block size.
> In iomap_page_mkwrite(), iter.len is 3K; if the folio size is 4K,
> folio_mark_dirty() will mark all 4 ifs bits dirty. Then, if we expand
> the file size to 4K, this will still leave a hole with the dirty bit
> set but without any block allocated/reserved. Am I missing something?
No, we still need the ifs manipulation in the loop indeed. But
the filemap_dirty_folio (and the not uptodate warning) can still
move outside the iterator.
* Re: [PATCH v2 4/6] iomap: correct the dirty length in page mkwrite
2024-08-15 5:59 ` Christoph Hellwig
@ 2024-08-16 2:19 ` Zhang Yi
0 siblings, 0 replies; 42+ messages in thread
From: Zhang Yi @ 2024-08-16 2:19 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-xfs, linux-fsdevel, linux-kernel, djwong, brauner, david,
jack, willy, yi.zhang, chengzhihao1, yukuai3
On 2024/8/15 13:59, Christoph Hellwig wrote:
> On Wed, Aug 14, 2024 at 03:49:41PM +0800, Zhang Yi wrote:
>> Sorry, this confuses me. How could this prevent setting redundant
>> dirty bits?
>>
>> Suppose we have a 3K regular file on a filesystem with a 1K block size.
>> In iomap_page_mkwrite(), iter.len is 3K; if the folio size is 4K,
>> folio_mark_dirty() will mark all 4 ifs bits dirty. Then, if we expand
>> the file size to 4K, this will still leave a hole with the dirty bit
>> set but without any block allocated/reserved. Am I missing something?
>
> No, we still need the ifs manipulation in the loop indeed. But
> the filemap_dirty_folio (and the not uptodate warning) can still
> move outside the iterator.
>
Yes, right, I misunderstood.
Thanks,
Yi.
* Re: [PATCH v2 4/6] iomap: correct the dirty length in page mkwrite
2024-08-12 12:11 ` [PATCH v2 4/6] iomap: correct the dirty length in page mkwrite Zhang Yi
2024-08-12 16:45 ` Darrick J. Wong
2024-08-14 5:36 ` Christoph Hellwig
@ 2024-08-17 4:45 ` Matthew Wilcox
2024-08-17 6:43 ` Zhang Yi
2 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox @ 2024-08-17 4:45 UTC (permalink / raw)
To: Zhang Yi
Cc: linux-xfs, linux-fsdevel, linux-kernel, djwong, hch, brauner,
david, jack, yi.zhang, chengzhihao1, yukuai3
On Mon, Aug 12, 2024 at 08:11:57PM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
>
> When doing a page mkwrite, iomap_folio_mkwrite_iter() dirties the
> entire folio through folio_mark_dirty() even if the map length is
> shorter than one folio. However, on a filesystem with more than one
> block per folio, we should only set the corresponding blocks' dirty
> bits according to iomap_length(), so open code folio_mark_dirty() and
> pass the correct length.
We shouldn't waste any time trying to optimise writing to files through
mmap(). People have written papers about what a terrible programming
model this is. eg https://db.cs.cmu.edu/mmap-cidr2022/
There are some programs that do it, but they are few and far between.
* Re: [PATCH v2 4/6] iomap: correct the dirty length in page mkwrite
2024-08-17 4:45 ` Matthew Wilcox
@ 2024-08-17 6:43 ` Zhang Yi
0 siblings, 0 replies; 42+ messages in thread
From: Zhang Yi @ 2024-08-17 6:43 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-xfs, linux-fsdevel, linux-kernel, djwong, hch, brauner,
david, jack, yi.zhang, chengzhihao1, yukuai3
On 2024/8/17 12:45, Matthew Wilcox wrote:
> On Mon, Aug 12, 2024 at 08:11:57PM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> When doing a page mkwrite, iomap_folio_mkwrite_iter() dirties the
>> entire folio through folio_mark_dirty() even if the map length is
>> shorter than one folio. However, on a filesystem with more than one
>> block per folio, we should only set the corresponding blocks' dirty
>> bits according to iomap_length(), so open code folio_mark_dirty() and
>> pass the correct length.
>
> We shouldn't waste any time trying to optimise writing to files through
> mmap(). People have written papers about what a terrible programming
> model this is. eg https://db.cs.cmu.edu/mmap-cidr2022/
>
> There are some programs that do it, but they are few and far between.
>
I don't think it's an optimization; it's a fix. The issue is the same
as the one the previous two patches address: it leaves a hole with the
ifs dirty bit set but without any valid data and without any block
allocated/reserved. This mmap() path is just one case that can lead to
the issue (as I said in my reply to Christoph: suppose we have a 3K
regular file on a filesystem with a 1K block size; if the folio size is
4K, iomap_page_mkwrite() will mark all 4 ifs bits dirty, and the last
one becomes redundant if we expand the file to 4K later).
Hence, in my opinion, we need to fix this, or it's going to become a
real problem one day.
Thanks,
Yi.
* [PATCH v2 5/6] iomap: don't mark blocks uptodate after partial zeroing
2024-08-12 12:11 [PATCH v2 0/6] iomap: some minor non-critical fixes and improvements when block size < folio size Zhang Yi
` (3 preceding siblings ...)
2024-08-12 12:11 ` [PATCH v2 4/6] iomap: correct the dirty length in page mkwrite Zhang Yi
@ 2024-08-12 12:11 ` Zhang Yi
2024-08-12 16:49 ` Darrick J. Wong
` (2 more replies)
2024-08-12 12:11 ` [PATCH v2 6/6] iomap: reduce unnecessary state_lock when setting ifs uptodate and dirty bits Zhang Yi
2024-08-14 1:49 ` [PATCH v2 0/6] iomap: some minor non-critical fixes and improvements when block size < folio size Dave Chinner
6 siblings, 3 replies; 42+ messages in thread
From: Zhang Yi @ 2024-08-12 12:11 UTC (permalink / raw)
To: linux-xfs, linux-fsdevel
Cc: linux-kernel, djwong, hch, brauner, david, jack, willy, yi.zhang,
yi.zhang, chengzhihao1, yukuai3
From: Zhang Yi <yi.zhang@huawei.com>
In __iomap_write_begin(), if we do an unaligned buffered write into a
hole of a regular file, we only zero out the block-aligned parts around
the range we are about to write, but we mark the whole range uptodate
if block size < folio size. This is wrong since the part that is not
zeroed still contains stale data, which can easily be exposed to a
concurrent buffered read (the filesystem may not hold inode->i_rwsem)
once we mark the range uptodate. Fix this by dropping
iomap_set_range_uptodate() from the zeroing branch.
Fixes: 9dc55f1389f9 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
Reported-by: Matthew Wilcox <willy@infradead.org>
Closes: https://lore.kernel.org/all/ZqsN5ouQTEc1KAzV@casper.infradead.org/
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
fs/iomap/buffered-io.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index ac762de9a27f..96600405dbb5 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -744,8 +744,8 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
poff, plen, srcmap);
if (status)
return status;
+ iomap_set_range_uptodate(folio, poff, plen);
}
- iomap_set_range_uptodate(folio, poff, plen);
} while ((block_start += plen) < block_end);
return 0;
--
2.39.2
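To make the window concrete, here is a userspace sketch of the zeroing
branch (the 1k block, the write range [200, 700) and the 0xAA filler
standing in for stale page contents are all hypothetical):

#include <stdio.h>
#include <string.h>

#define BLKSIZE 1024		/* hypothetical block size */

int main(void)
{
	unsigned char block[BLKSIZE];
	unsigned int from = 200, to = 700;	/* bytes about to be written */
	int uptodate;

	memset(block, 0xAA, BLKSIZE);	/* stale data in the page cache */

	/* folio_zero_segments() zeroes only around the write range ... */
	memset(block, 0, from);
	memset(block + to, 0, BLKSIZE - to);

	/* ... but the old code marked the block uptodate right away */
	uptodate = 1;

	/* a concurrent buffered read before the user copy lands: */
	if (uptodate)
		printf("reader sees stale byte 0x%02x at offset %u\n",
		       block[from], from);
	return 0;
}

Until the user copy fills [from, to), a reader that trusts the uptodate
bit sees the 0xAA bytes; deferring the uptodate update closes that
window.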
* Re: [PATCH v2 5/6] iomap: don't mark blocks uptodate after partial zeroing
2024-08-12 12:11 ` [PATCH v2 5/6] iomap: don't mark blocks uptodate after partial zeroing Zhang Yi
@ 2024-08-12 16:49 ` Darrick J. Wong
2024-08-13 3:01 ` Zhang Yi
2024-08-14 5:39 ` Christoph Hellwig
2024-08-17 4:48 ` Matthew Wilcox
2 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2024-08-12 16:49 UTC (permalink / raw)
To: Zhang Yi
Cc: linux-xfs, linux-fsdevel, linux-kernel, hch, brauner, david, jack,
willy, yi.zhang, chengzhihao1, yukuai3
On Mon, Aug 12, 2024 at 08:11:58PM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
>
> In __iomap_write_begin(), if we do an unaligned buffered write into a
> hole of a regular file, we only zero out the block-aligned parts around
> the range we are about to write, but we mark the whole range uptodate
> if block size < folio size. This is wrong since the part that is not
> zeroed still contains stale data, which can easily be exposed to a
> concurrent buffered read (the filesystem may not hold inode->i_rwsem)
> once we mark the range uptodate. Fix this by dropping
> iomap_set_range_uptodate() from the zeroing branch.
>
> Fixes: 9dc55f1389f9 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
> Reported-by: Matthew Wilcox <willy@infradead.org>
> Closes: https://lore.kernel.org/all/ZqsN5ouQTEc1KAzV@casper.infradead.org/
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> ---
> fs/iomap/buffered-io.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index ac762de9a27f..96600405dbb5 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -744,8 +744,8 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
> poff, plen, srcmap);
> if (status)
> return status;
> + iomap_set_range_uptodate(folio, poff, plen);
> }
> - iomap_set_range_uptodate(folio, poff, plen);
Don't we need to iomap_set_range_uptodate for the bytes that we zeroed
with folio_zero_segments?
--D
> } while ((block_start += plen) < block_end);
>
> return 0;
> --
> 2.39.2
>
>
* Re: [PATCH v2 5/6] iomap: don't mark blocks uptodate after partial zeroing
2024-08-12 16:49 ` Darrick J. Wong
@ 2024-08-13 3:01 ` Zhang Yi
0 siblings, 0 replies; 42+ messages in thread
From: Zhang Yi @ 2024-08-13 3:01 UTC (permalink / raw)
To: Darrick J. Wong
Cc: linux-xfs, linux-fsdevel, linux-kernel, hch, brauner, david, jack,
willy, yi.zhang, chengzhihao1, yukuai3
On 2024/8/13 0:49, Darrick J. Wong wrote:
> On Mon, Aug 12, 2024 at 08:11:58PM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> In __iomap_write_begin(), if we do an unaligned buffered write into a
>> hole of a regular file, we only zero out the block-aligned parts around
>> the range we are about to write, but we mark the whole range uptodate
>> if block size < folio size. This is wrong since the part that is not
>> zeroed still contains stale data, which can easily be exposed to a
>> concurrent buffered read (the filesystem may not hold inode->i_rwsem)
>> once we mark the range uptodate. Fix this by dropping
>> iomap_set_range_uptodate() from the zeroing branch.
>>
>> Fixes: 9dc55f1389f9 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
>> Reported-by: Matthew Wilcox <willy@infradead.org>
>> Closes: https://lore.kernel.org/all/ZqsN5ouQTEc1KAzV@casper.infradead.org/
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>> ---
>> fs/iomap/buffered-io.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>> index ac762de9a27f..96600405dbb5 100644
>> --- a/fs/iomap/buffered-io.c
>> +++ b/fs/iomap/buffered-io.c
>> @@ -744,8 +744,8 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>> poff, plen, srcmap);
>> if (status)
>> return status;
>> + iomap_set_range_uptodate(folio, poff, plen);
>> }
>> - iomap_set_range_uptodate(folio, poff, plen);
>
> Don't we need to iomap_set_range_uptodate for the bytes that we zeroed
> with folio_zero_segments?
>
We only zero part of the block here (the rest will be filled by the
user copy), hence we can't set the uptodate bit.
Thanks,
Yi.
> --D
>
>> } while ((block_start += plen) < block_end);
>>
>> return 0;
>> --
>> 2.39.2
>>
>>
* Re: [PATCH v2 5/6] iomap: don't mark blocks uptodate after partial zeroing
2024-08-12 12:11 ` [PATCH v2 5/6] iomap: don't mark blocks uptodate after partial zeroing Zhang Yi
2024-08-12 16:49 ` Darrick J. Wong
@ 2024-08-14 5:39 ` Christoph Hellwig
2024-08-17 4:48 ` Matthew Wilcox
2 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2024-08-14 5:39 UTC (permalink / raw)
To: Zhang Yi
Cc: linux-xfs, linux-fsdevel, linux-kernel, djwong, hch, brauner,
david, jack, willy, yi.zhang, chengzhihao1, yukuai3
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH v2 5/6] iomap: don't mark blocks uptodate after partial zeroing
2024-08-12 12:11 ` [PATCH v2 5/6] iomap: don't mark blocks uptodate after partial zeroing Zhang Yi
2024-08-12 16:49 ` Darrick J. Wong
2024-08-14 5:39 ` Christoph Hellwig
@ 2024-08-17 4:48 ` Matthew Wilcox
2024-08-17 7:16 ` Zhang Yi
2 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox @ 2024-08-17 4:48 UTC (permalink / raw)
To: Zhang Yi
Cc: linux-xfs, linux-fsdevel, linux-kernel, djwong, hch, brauner,
david, jack, yi.zhang, chengzhihao1, yukuai3
On Mon, Aug 12, 2024 at 08:11:58PM +0800, Zhang Yi wrote:
> +++ b/fs/iomap/buffered-io.c
> @@ -744,8 +744,8 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
> poff, plen, srcmap);
> if (status)
> return status;
> + iomap_set_range_uptodate(folio, poff, plen);
> }
> - iomap_set_range_uptodate(folio, poff, plen);
> } while ((block_start += plen) < block_end);
Um, what I meant was to just delete the iomap_set_range_uptodate()
call in __iomap_write_begin() altogether. We'll call it soon enough in
__iomap_write_end().
* Re: [PATCH v2 5/6] iomap: don't mark blocks uptodate after partial zeroing
2024-08-17 4:48 ` Matthew Wilcox
@ 2024-08-17 7:16 ` Zhang Yi
0 siblings, 0 replies; 42+ messages in thread
From: Zhang Yi @ 2024-08-17 7:16 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-xfs, linux-fsdevel, linux-kernel, djwong, hch, brauner,
david, jack, yi.zhang, chengzhihao1, yukuai3
On 2024/8/17 12:48, Matthew Wilcox wrote:
> On Mon, Aug 12, 2024 at 08:11:58PM +0800, Zhang Yi wrote:
>> +++ b/fs/iomap/buffered-io.c
>> @@ -744,8 +744,8 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>> poff, plen, srcmap);
>> if (status)
>> return status;
>> + iomap_set_range_uptodate(folio, poff, plen);
>> }
>> - iomap_set_range_uptodate(folio, poff, plen);
>> } while ((block_start += plen) < block_end);
>
> Um, what I meant was to just delete the iomap_set_range_uptodate()
> call in __iomap_write_begin() altogether. We'll call it soon enough in
> __iomap_write_end().
>
Yeah! Looks reasonable to me.
Thanks,
Yi.
* [PATCH v2 6/6] iomap: reduce unnecessary state_lock when setting ifs uptodate and dirty bits
2024-08-12 12:11 [PATCH v2 0/6] iomap: some minor non-critical fixes and improvements when block size < folio size Zhang Yi
` (4 preceding siblings ...)
2024-08-12 12:11 ` [PATCH v2 5/6] iomap: don't mark blocks uptodate after partial zeroing Zhang Yi
@ 2024-08-12 12:11 ` Zhang Yi
2024-08-12 16:54 ` Darrick J. Wong
2024-08-12 17:00 ` Matthew Wilcox
2024-08-14 1:49 ` [PATCH v2 0/6] iomap: some minor non-critical fixes and improvements when block size < folio size Dave Chinner
6 siblings, 2 replies; 42+ messages in thread
From: Zhang Yi @ 2024-08-12 12:11 UTC (permalink / raw)
To: linux-xfs, linux-fsdevel
Cc: linux-kernel, djwong, hch, brauner, david, jack, willy, yi.zhang,
yi.zhang, chengzhihao1, yukuai3
From: Zhang Yi <yi.zhang@huawei.com>
When doing a buffered write, we set the uptodate and dirty bits of the
written range separately, which takes the ifs->state_lock twice when
blocksize < folio size. This is redundant. Now that large folios are
supported, the spinlock can hurt performance more, so merging the two
updates reduces unnecessary locking overhead and gives some performance
gain.
Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
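Note: ifs->state is a single bitmap where, for a folio with N =
i_blocks_per_folio() blocks, bits [0, N) track per-block uptodate state
and bits [N, 2N) track per-block dirty state (hence the
"first_blk + blks_per_folio" offset in the hunk below). That layout is
what lets a single state_lock critical section cover both updates,
sketched roughly as:

	spin_lock_irqsave(&ifs->state_lock, flags);
	bitmap_set(ifs->state, first_blk, nr_blks);		/* uptodate */
	bitmap_set(ifs->state, first_blk + blks_per_folio,	/* dirty */
		   nr_blks);
	spin_unlock_irqrestore(&ifs->state_lock, flags);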
fs/iomap/buffered-io.c | 38 +++++++++++++++++++++++++++++++++++---
1 file changed, 35 insertions(+), 3 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 96600405dbb5..67d7c1c22c98 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -182,6 +182,37 @@ static void iomap_set_range_dirty(struct folio *folio, size_t off, size_t len)
ifs_set_range_dirty(folio, ifs, off, len);
}
+static void ifs_set_range_dirty_uptodate(struct folio *folio,
+ struct iomap_folio_state *ifs, size_t off, size_t len)
+{
+ struct inode *inode = folio->mapping->host;
+ unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
+ unsigned int first_blk = (off >> inode->i_blkbits);
+ unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
+ unsigned int nr_blks = last_blk - first_blk + 1;
+ unsigned long flags;
+
+ spin_lock_irqsave(&ifs->state_lock, flags);
+ bitmap_set(ifs->state, first_blk, nr_blks);
+ if (ifs_is_fully_uptodate(folio, ifs))
+ folio_mark_uptodate(folio);
+ bitmap_set(ifs->state, first_blk + blks_per_folio, nr_blks);
+ spin_unlock_irqrestore(&ifs->state_lock, flags);
+}
+
+static void iomap_set_range_dirty_uptodate(struct folio *folio,
+ size_t off, size_t len)
+{
+ struct iomap_folio_state *ifs = folio->private;
+
+ if (ifs)
+ ifs_set_range_dirty_uptodate(folio, ifs, off, len);
+ else
+ folio_mark_uptodate(folio);
+
+ filemap_dirty_folio(folio->mapping, folio);
+}
+
static struct iomap_folio_state *ifs_alloc(struct inode *inode,
struct folio *folio, unsigned int flags)
{
@@ -851,6 +882,8 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
static bool __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
size_t copied, struct folio *folio)
{
+ size_t from = offset_in_folio(folio, pos);
+
flush_dcache_folio(folio);
/*
@@ -866,9 +899,8 @@ static bool __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
*/
if (unlikely(copied < len && !folio_test_uptodate(folio)))
return false;
- iomap_set_range_uptodate(folio, offset_in_folio(folio, pos), len);
- iomap_set_range_dirty(folio, offset_in_folio(folio, pos), copied);
- filemap_dirty_folio(inode->i_mapping, folio);
+
+ iomap_set_range_dirty_uptodate(folio, from, copied);
return true;
}
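To make the index arithmetic in ifs_set_range_dirty_uptodate()
concrete, a worked example (block size and offsets chosen by me, purely
illustrative):

	/*
	 * 1K blocks (i_blkbits == 10), write at off = 1536, len = 1024
	 * within the folio:
	 *
	 *   first_blk = 1536 >> 10              = 1
	 *   last_blk  = (1536 + 1024 - 1) >> 10 = 2
	 *   nr_blks   = 2 - 1 + 1               = 2
	 *
	 * Blocks 1 and 2 get their uptodate bit (index blk) and their
	 * dirty bit (index blk + blks_per_folio) set under a single
	 * state_lock acquisition instead of two.
	 */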
--
2.39.2
* Re: [PATCH v2 6/6] iomap: reduce unnecessary state_lock when setting ifs uptodate and dirty bits
2024-08-12 12:11 ` [PATCH v2 6/6] iomap: reduce unnecessary state_lock when setting ifs uptodate and dirty bits Zhang Yi
@ 2024-08-12 16:54 ` Darrick J. Wong
2024-08-12 17:00 ` Matthew Wilcox
1 sibling, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2024-08-12 16:54 UTC (permalink / raw)
To: Zhang Yi
Cc: linux-xfs, linux-fsdevel, linux-kernel, hch, brauner, david, jack,
willy, yi.zhang, chengzhihao1, yukuai3
On Mon, Aug 12, 2024 at 08:11:59PM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
>
> When doing a buffered write, we set the uptodate and dirty bits of the
> written range separately, which takes the ifs->state_lock twice when
> blocksize < folio size. This is redundant. Now that large folios are
> supported, the spinlock can hurt performance more, so merging the two
> updates reduces unnecessary locking overhead and gives some
> performance gain.
>
> Suggested-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Seems reasonable to me
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> ---
> fs/iomap/buffered-io.c | 38 +++++++++++++++++++++++++++++++++++---
> 1 file changed, 35 insertions(+), 3 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 96600405dbb5..67d7c1c22c98 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -182,6 +182,37 @@ static void iomap_set_range_dirty(struct folio *folio, size_t off, size_t len)
> ifs_set_range_dirty(folio, ifs, off, len);
> }
>
> +static void ifs_set_range_dirty_uptodate(struct folio *folio,
> + struct iomap_folio_state *ifs, size_t off, size_t len)
> +{
> + struct inode *inode = folio->mapping->host;
> + unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
> + unsigned int first_blk = (off >> inode->i_blkbits);
> + unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
> + unsigned int nr_blks = last_blk - first_blk + 1;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&ifs->state_lock, flags);
> + bitmap_set(ifs->state, first_blk, nr_blks);
> + if (ifs_is_fully_uptodate(folio, ifs))
> + folio_mark_uptodate(folio);
> + bitmap_set(ifs->state, first_blk + blks_per_folio, nr_blks);
> + spin_unlock_irqrestore(&ifs->state_lock, flags);
> +}
> +
> +static void iomap_set_range_dirty_uptodate(struct folio *folio,
> + size_t off, size_t len)
> +{
> + struct iomap_folio_state *ifs = folio->private;
> +
> + if (ifs)
> + ifs_set_range_dirty_uptodate(folio, ifs, off, len);
> + else
> + folio_mark_uptodate(folio);
> +
> + filemap_dirty_folio(folio->mapping, folio);
> +}
> +
> static struct iomap_folio_state *ifs_alloc(struct inode *inode,
> struct folio *folio, unsigned int flags)
> {
> @@ -851,6 +882,8 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
> static bool __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
> size_t copied, struct folio *folio)
> {
> + size_t from = offset_in_folio(folio, pos);
> +
> flush_dcache_folio(folio);
>
> /*
> @@ -866,9 +899,8 @@ static bool __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
> */
> if (unlikely(copied < len && !folio_test_uptodate(folio)))
> return false;
> - iomap_set_range_uptodate(folio, offset_in_folio(folio, pos), len);
> - iomap_set_range_dirty(folio, offset_in_folio(folio, pos), copied);
> - filemap_dirty_folio(inode->i_mapping, folio);
> +
> + iomap_set_range_dirty_uptodate(folio, from, copied);
> return true;
> }
>
> --
> 2.39.2
>
>
* Re: [PATCH v2 6/6] iomap: reduce unnecessary state_lock when setting ifs uptodate and dirty bits
2024-08-12 12:11 ` [PATCH v2 6/6] iomap: reduce unnecessary state_lock when setting ifs uptodate and dirty bits Zhang Yi
2024-08-12 16:54 ` Darrick J. Wong
@ 2024-08-12 17:00 ` Matthew Wilcox
2024-08-13 8:15 ` Zhang Yi
1 sibling, 1 reply; 42+ messages in thread
From: Matthew Wilcox @ 2024-08-12 17:00 UTC (permalink / raw)
To: Zhang Yi
Cc: linux-xfs, linux-fsdevel, linux-kernel, djwong, hch, brauner,
david, jack, yi.zhang, chengzhihao1, yukuai3
On Mon, Aug 12, 2024 at 08:11:59PM +0800, Zhang Yi wrote:
> @@ -866,9 +899,8 @@ static bool __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
> */
> if (unlikely(copied < len && !folio_test_uptodate(folio)))
> return false;
> - iomap_set_range_uptodate(folio, offset_in_folio(folio, pos), len);
> - iomap_set_range_dirty(folio, offset_in_folio(folio, pos), copied);
> - filemap_dirty_folio(inode->i_mapping, folio);
> +
> + iomap_set_range_dirty_uptodate(folio, from, copied);
> return true;
I wonder how often we overwrite a completely uptodate folio rather than
writing new data to a fresh folio? iow, would this be a measurable
optimisation?
if (folio_test_uptodate(folio))
iomap_set_range_dirty(folio, from, copied);
else
iomap_set_range_dirty_uptodate(folio, from, copied);
* Re: [PATCH v2 6/6] iomap: reduce unnecessary state_lock when setting ifs uptodate and dirty bits
2024-08-12 17:00 ` Matthew Wilcox
@ 2024-08-13 8:15 ` Zhang Yi
0 siblings, 0 replies; 42+ messages in thread
From: Zhang Yi @ 2024-08-13 8:15 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-xfs, linux-fsdevel, linux-kernel, djwong, hch, brauner,
david, jack, yi.zhang, chengzhihao1, yukuai3
On 2024/8/13 1:00, Matthew Wilcox wrote:
> On Mon, Aug 12, 2024 at 08:11:59PM +0800, Zhang Yi wrote:
>> @@ -866,9 +899,8 @@ static bool __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
>> */
>> if (unlikely(copied < len && !folio_test_uptodate(folio)))
>> return false;
>> - iomap_set_range_uptodate(folio, offset_in_folio(folio, pos), len);
>> - iomap_set_range_dirty(folio, offset_in_folio(folio, pos), copied);
>> - filemap_dirty_folio(inode->i_mapping, folio);
>> +
>> + iomap_set_range_dirty_uptodate(folio, from, copied);
>> return true;
>
> I wonder how often we overwrite a completely uptodate folio rather than
> writing new data to a fresh folio? iow, would this be a measurable
> optimisation?
>
> if (folio_test_uptodate(folio))
> iomap_set_range_dirty(folio, from, copied);
> else
> iomap_set_range_dirty_uptodate(folio, from, copied);
>
Thanks for the suggestion. I'm not sure how often that happens either,
but I suppose we could do this optimisation: I've tested it and found it
is harmless in the case of writing new data to a fresh folio, while it
further improves overwrite performance. The UnixBench results show the
performance gain grows to about ~15% on my machine with a 50GB ramdisk
and an xfs filesystem.
UnixBench test cmd:
./Run -i 1 -c 1 fstime-w
Base:
x86 File Write 1024 bufsize 2000 maxblocks 524708.0 KBps
arm64 File Write 1024 bufsize 2000 maxblocks 801965.0 KBps
After this series:
x86 File Write 1024 bufsize 2000 maxblocks 569218.0 KBps
arm64 File Write 1024 bufsize 2000 maxblocks 871605.0 KBps
After this measurable optimisation:
x86 File Write 1024 bufsize 2000 maxblocks 609620.0 KBps
arm64 File Write 1024 bufsize 2000 maxblocks 910534.0 KBps
Thanks,
Yi.
* Re: [PATCH v2 0/6] iomap: some minor non-critical fixes and improvements when block size < folio size
2024-08-12 12:11 [PATCH v2 0/6] iomap: some minor non-critical fixes and improvements when block size < folio size Zhang Yi
` (5 preceding siblings ...)
2024-08-12 12:11 ` [PATCH v2 6/6] iomap: reduce unnecessary state_lock when setting ifs uptodate and dirty bits Zhang Yi
@ 2024-08-14 1:49 ` Dave Chinner
2024-08-14 2:14 ` Zhang Yi
6 siblings, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2024-08-14 1:49 UTC (permalink / raw)
To: Zhang Yi
Cc: linux-xfs, linux-fsdevel, linux-kernel, djwong, hch, brauner,
jack, willy, yi.zhang, chengzhihao1, yukuai3
On Mon, Aug 12, 2024 at 08:11:53PM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
>
> Changes since v1:
> - Patch 5 fixes a stale data exposure problem pointed out by Willy:
> drop the setting of uptodate bits after zeroing out an unaligned
> range.
> - As Dave suggested, in order to avoid increasing the complexity of
> maintaining the state_lock, don't just drop all the state_lock cycles
> in the buffered write path; patch 6 introduces a new helper to set
> the uptodate bit and dirty bits together under the state_lock,
> reducing the locking to once per write. The performance benefits do
> not change much.
It's helpful to provide a lore link to the previous version so that
reviewers don't have to go looking for it themselves to remind them
of what was discussed last time.
https://lore.kernel.org/linux-xfs/20240731091305.2896873-1-yi.zhang@huaweicloud.com/T/
> This series contains some minor non-critical fixes and performance
> improvements for filesystems with block size < folio size.
>
> The first 4 patches fix the handling of setting and clearing folio ifs
> dirty bits when marking the folio dirty and when invalidating the
> folio. Although none of these mistakes causes a real problem today,
> they still deserve a fix to correct the behavior.
>
> The last 2 patches drop unnecessary state_lock cycles in ifs when
> setting and clearing dirty/uptodate bits in the buffered write path,
> which improves buffered write performance by ~8% on my machine. I
> tested it with UnixBench on my x86_64 (Xeon Gold 6151) and arm64
> (Kunpeng-920) virtual machines with a 50GB ramdisk and an xfs
> filesystem; the results are shown below.
>
> UnixBench test cmd:
> ./Run -i 1 -c 1 fstime-w
>
> Before:
> x86 File Write 1024 bufsize 2000 maxblocks 524708.0 KBps
> arm64 File Write 1024 bufsize 2000 maxblocks 801965.0 KBps
>
> After:
> x86 File Write 1024 bufsize 2000 maxblocks 569218.0 KBps
> arm64 File Write 1024 bufsize 2000 maxblocks 871605.0 KBps
Those are the same performance numbers as you posted for the
previous version of the patch. How does this new version perform
given that it's a complete rework of the optimisation? It's
important to know if the changes made actually provided the benefit
we expected them to make....
i.e. this is the sort of table of results I'd like to see provided:
platform base v1 v2
x86 524708.0 569218.0 ????
arm64 801965.0 871605.0 ????
-Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH v2 0/6] iomap: some minor non-critical fixes and improvements when block size < folio size
2024-08-14 1:49 ` [PATCH v2 0/6] iomap: some minor non-critical fixes and improvements when block size < folio size Dave Chinner
@ 2024-08-14 2:14 ` Zhang Yi
2024-08-14 2:47 ` Dave Chinner
0 siblings, 1 reply; 42+ messages in thread
From: Zhang Yi @ 2024-08-14 2:14 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-xfs, linux-fsdevel, linux-kernel, djwong, hch, brauner,
jack, willy, yi.zhang, chengzhihao1, yukuai3
On 2024/8/14 9:49, Dave Chinner wrote:
> On Mon, Aug 12, 2024 at 08:11:53PM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> Changes since v1:
>> - Patch 5 fixes a stale data exposure problem pointed out by Willy:
>> drop the setting of uptodate bits after zeroing out an unaligned
>> range.
>> - As Dave suggested, in order to avoid increasing the complexity of
>> maintaining the state_lock, don't just drop all the state_lock cycles
>> in the buffered write path; patch 6 introduces a new helper to set
>> the uptodate bit and dirty bits together under the state_lock,
>> reducing the locking to once per write. The performance benefits do
>> not change much.
>
> It's helpful to provide a lore link to the previous version so that
> reviewers don't have to go looking for it themselves to remind them
> of what was discussed last time.
>
> https://lore.kernel.org/linux-xfs/20240731091305.2896873-1-yi.zhang@huaweicloud.com/T/
Sure, I will add one in later iterations.
>
>> This series contains some minor non-critical fixes and performance
>> improvements for filesystems with block size < folio size.
>>
>> The first 4 patches fix the handling of setting and clearing folio ifs
>> dirty bits when marking the folio dirty and when invalidating the
>> folio. Although none of these mistakes causes a real problem today,
>> they still deserve a fix to correct the behavior.
>>
>> The last 2 patches drop unnecessary state_lock cycles in ifs when
>> setting and clearing dirty/uptodate bits in the buffered write path,
>> which improves buffered write performance by ~8% on my machine. I
>> tested it with UnixBench on my x86_64 (Xeon Gold 6151) and arm64
>> (Kunpeng-920) virtual machines with a 50GB ramdisk and an xfs
>> filesystem; the results are shown below.
>>
>> UnixBench test cmd:
>> ./Run -i 1 -c 1 fstime-w
>>
>> Before:
>> x86 File Write 1024 bufsize 2000 maxblocks 524708.0 KBps
>> arm64 File Write 1024 bufsize 2000 maxblocks 801965.0 KBps
>>
>> After:
>> x86 File Write 1024 bufsize 2000 maxblocks 569218.0 KBps
>> arm64 File Write 1024 bufsize 2000 maxblocks 871605.0 KBps
>
> Those are the same performance numbers as you posted for the
> previous version of the patch. How does this new version perform
> given that it's a complete rework of the optimisation? It's
It's not exactly the same, but the difference is small; the updated
performance numbers are below.
> important to know if the changes made actually provided the benefit
> we expected them to make....
>
> i.e. this is the sort of table of results I'd like to see provided:
>
> platform base v1 v2
> x86 524708.0 569218.0 ????
> arm64 801965.0 871605.0 ????
>
platform base v1 v2
x86 524708.0 571315.0 569218.0
arm64 801965.0 876077.0 871605.0
Thanks,
Yi.
* Re: [PATCH v2 0/6] iomap: some minor non-critical fixes and improvements when block size < folio size
2024-08-14 2:14 ` Zhang Yi
@ 2024-08-14 2:47 ` Dave Chinner
2024-08-14 3:57 ` Zhang Yi
0 siblings, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2024-08-14 2:47 UTC (permalink / raw)
To: Zhang Yi
Cc: linux-xfs, linux-fsdevel, linux-kernel, djwong, hch, brauner,
jack, willy, yi.zhang, chengzhihao1, yukuai3
On Wed, Aug 14, 2024 at 10:14:01AM +0800, Zhang Yi wrote:
> On 2024/8/14 9:49, Dave Chinner wrote:
> > important to know if the changes made actually provided the benefit
> > we expected them to make....
> >
> > i.e. this is the sort of table of results I'd like to see provided:
> >
> > platform base v1 v2
> > x86 524708.0 569218.0 ????
> > arm64 801965.0 871605.0 ????
> >
>
> platform base v1 v2
> x86 524708.0 571315.0 569218.0
> arm64 801965.0 876077.0 871605.0
So avoiding the lock cycle in iomap_write_begin() (in patch 5) in
this partial block write workload made no difference to performance
at all, and removing a lock cycle in iomap_write_end provided all
that gain?
Is this an overwrite workload or a file extending workload? The
result implies that iomap_block_needs_zeroing() is returning false,
hence it's an overwrite workload and it's reading partial blocks
from disk. i.e. it is doing synchronous RMW cycles from the ramdisk
and so still calling the uptodate bitmap update function rather than
hitting the zeroing case and skipping it.
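For reference, the check in question looks roughly like this (condensed
from fs/iomap/buffered-io.c, so treat the details as approximate):

	static bool iomap_block_needs_zeroing(const struct iomap_iter *iter,
			loff_t pos)
	{
		const struct iomap *srcmap = iomap_iter_srcmap(iter);

		/*
		 * Holes, unwritten extents and ranges past EOF are
		 * zeroed in memory; anything else must be read in.
		 */
		return srcmap->type == IOMAP_HOLE ||
		       srcmap->type == IOMAP_UNWRITTEN ||
		       pos >= i_size_read(iter->inode);
	}

An in-place overwrite of mapped blocks within i_size returns false
here, which is what forces the synchronous RMW described above.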
Hence I'm just trying to understand what the test is doing because
that tells me what the result should be...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH v2 0/6] iomap: some minor non-critical fixes and improvements when block size < folio size
2024-08-14 2:47 ` Dave Chinner
@ 2024-08-14 3:57 ` Zhang Yi
2024-08-14 5:16 ` Dave Chinner
0 siblings, 1 reply; 42+ messages in thread
From: Zhang Yi @ 2024-08-14 3:57 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-xfs, linux-fsdevel, linux-kernel, djwong, hch, brauner,
jack, willy, yi.zhang, chengzhihao1, yukuai3
On 2024/8/14 10:47, Dave Chinner wrote:
> On Wed, Aug 14, 2024 at 10:14:01AM +0800, Zhang Yi wrote:
>> On 2024/8/14 9:49, Dave Chinner wrote:
>>> important to know if the changes made actually provided the benefit
>>> we expected them to make....
>>>
>>> i.e. this is the sort of table of results I'd like to see provided:
>>>
>>> platform base v1 v2
>>> x86 524708.0 569218.0 ????
>>> arm64 801965.0 871605.0 ????
>>>
>>
>> platform base v1 v2
>> x86 524708.0 571315.0 569218.0
>> arm64 801965.0 876077.0 871605.0
>
> So avoiding the lock cycle in iomap_write_begin() (in patch 5) in
> this partial block write workload made no difference to performance
> at all, and removing a lock cycle in iomap_write_end provided all
> that gain?
Yes.
>
> Is this an overwrite workload or a file extending workload? The
> result implies that iomap_block_needs_zeroing() is returning false,
> hence it's an overwrite workload and it's reading partial blocks
> from disk. i.e. it is doing synchronous RMW cycles from the ramdisk
> and so still calling the uptodate bitmap update function rather than
> hitting the zeroing case and skipping it.
>
> Hence I'm just trying to understand what the test is doing because
> that tells me what the result should be...
>
I forgot to mention that I tested this on xfs with a 1K block size;
this is a simple block size < folio size case that I can drive directly
with UnixBench.
This test first does a buffered append write with bs=1K,count=2000 in
the first round, and then repeatedly overwrites from the start position
with the same parameters for 30 seconds. All the write operations are
block size aligned, so iomap_write_begin() just continues after
iomap_adjust_read_range() and never calls iomap_set_range_uptodate() in
the first place, hence there is no difference with or without patch 5
in this test case.
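For anyone reproducing this without UnixBench, a roughly equivalent
fio job would be something like the following (my approximation of the
behaviour above, not the exact benchmark; the file path is made up):

fio --name=overwrite --filename=/mnt/test/file --ioengine=psync \
    --rw=write --bs=1k --size=2000k --time_based --runtime=30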
Thanks,
Yi.
* Re: [PATCH v2 0/6] iomap: some minor non-critical fixes and improvements when block size < folio size
2024-08-14 3:57 ` Zhang Yi
@ 2024-08-14 5:16 ` Dave Chinner
2024-08-14 6:32 ` Zhang Yi
0 siblings, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2024-08-14 5:16 UTC (permalink / raw)
To: Zhang Yi
Cc: linux-xfs, linux-fsdevel, linux-kernel, djwong, hch, brauner,
jack, willy, yi.zhang, chengzhihao1, yukuai3
On Wed, Aug 14, 2024 at 11:57:03AM +0800, Zhang Yi wrote:
> On 2024/8/14 10:47, Dave Chinner wrote:
> > On Wed, Aug 14, 2024 at 10:14:01AM +0800, Zhang Yi wrote:
> >> On 2024/8/14 9:49, Dave Chinner wrote:
> >>> important to know if the changes made actually provided the benefit
> >>> we expected them to make....
> >>>
> >>> i.e. this is the sort of table of results I'd like to see provided:
> >>>
> >>> platform base v1 v2
> >>> x86 524708.0 569218.0 ????
> >>> arm64 801965.0 871605.0 ????
> >>>
> >>
> >> platform base v1 v2
> >> x86 524708.0 571315.0 569218.0
> >> arm64 801965.0 876077.0 871605.0
> >
> > So avoiding the lock cycle in iomap_write_begin() (in patch 5) in
> > this partial block write workload made no difference to performance
> > at all, and removing a lock cycle in iomap_write_end provided all
> > that gain?
>
> Yes.
>
> >
> > Is this an overwrite workload or a file extending workload? The
> > result implies that iomap_block_needs_zeroing() is returning false,
> > hence it's an overwrite workload and it's reading partial blocks
> > from disk. i.e. it is doing synchronous RMW cycles from the ramdisk
> > and so still calling the uptodate bitmap update function rather than
> > hitting the zeroing case and skipping it.
> >
> > Hence I'm just trying to understand what the test is doing because
> > that tells me what the result should be...
> >
>
> I forgot to mention that I tested this on xfs with a 1K block size;
> this is a simple block size < folio size case that I can drive
> directly with UnixBench.
OK. So it's an even more highly contrived microbenchmark than I
thought. :/
What is the impact on a 4kB block size filesystem running that same
1kB write test? That's going to be a far more common situation on
production machines for such small IO; let's make sure that we
haven't regressed that case in optimising for this one.
> This test first does a buffered append write with bs=1K,count=2000 in
> the first round, and then repeatedly overwrites from the start
> position with the same parameters for 30 seconds. All the write
> operations are block size aligned, so iomap_write_begin() just
> continues after iomap_adjust_read_range() and never calls
> iomap_set_range_uptodate() in the first place, hence there is no
> difference with or without patch 5 in this test case.
Ok, so you really need to come up with an equivalent test that
exercises the paths that patch 5 modifies, because right now we have
no real idea of what the impact of that change will be...
-Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH v2 0/6] iomap: some minor non-critical fixes and improvements when block size < folio size
2024-08-14 5:16 ` Dave Chinner
@ 2024-08-14 6:32 ` Zhang Yi
0 siblings, 0 replies; 42+ messages in thread
From: Zhang Yi @ 2024-08-14 6:32 UTC (permalink / raw)
To: Dave Chinner
Cc: linux-xfs, linux-fsdevel, linux-kernel, djwong, hch, brauner,
jack, willy, yi.zhang, chengzhihao1, yukuai3
On 2024/8/14 13:16, Dave Chinner wrote:
> On Wed, Aug 14, 2024 at 11:57:03AM +0800, Zhang Yi wrote:
>> On 2024/8/14 10:47, Dave Chinner wrote:
>>> On Wed, Aug 14, 2024 at 10:14:01AM +0800, Zhang Yi wrote:
>>>> On 2024/8/14 9:49, Dave Chinner wrote:
>>>>> important to know if the changes made actually provided the benefit
>>>>> we expected them to make....
>>>>>
>>>>> i.e. this is the sort of table of results I'd like to see provided:
>>>>>
>>>>> platform base v1 v2
>>>>> x86 524708.0 569218.0 ????
>>>>> arm64 801965.0 871605.0 ????
>>>>>
>>>>
>>>> platform base v1 v2
>>>> x86 524708.0 571315.0 569218.0
>>>> arm64 801965.0 876077.0 871605.0
>>>
>>> So avoiding the lock cycle in iomap_write_begin() (in patch 5) in
>>> this partial block write workload made no difference to performance
>>> at all, and removing a lock cycle in iomap_write_end provided all
>>> that gain?
>>
>> Yes.
>>
>>>
>>> Is this an overwrite workload or a file extending workload? The
>>> result implies that iomap_block_needs_zeroing() is returning false,
>>> hence it's an overwrite workload and it's reading partial blocks
>>> from disk. i.e. it is doing synchronous RMW cycles from the ramdisk
>>> and so still calling the uptodate bitmap update function rather than
>>> hitting the zeroing case and skipping it.
>>>
>>> Hence I'm just trying to understand what the test is doing because
>>> that tells me what the result should be...
>>>
>>
>> I forgot to mention that I tested this on xfs with a 1K block size;
>> this is a simple block size < folio size case that I can drive
>> directly with UnixBench.
>
> OK. So it's an even more highly contrived microbenchmark than I
> thought. :/
>
> What is the impact on a 4kB block size filesystem running that same
> 1kB write test? That's going to be a far more common situation on
> production machines for such small IO;
Yeah, I agree with you. The case I originally wanted to test is a
buffered overwrite with bs=4K on a 4KB-block filesystem that already
contains larger (> 4KB) folios, which is a common block size <
folio size case once large folios are enabled. But I couldn't find a
benchmark tool that can do this test easily, so I used the above test
parameters to simulate it.
> let's make sure that we
> haven't regressed that case in optimising for this one.
Sure, I will test this case as well.
>
>> This test first does a buffered append write with bs=1K,count=2000 in
>> the first round, and then repeatedly overwrites from the start
>> position with the same parameters for 30 seconds. All the write
>> operations are block size aligned, so iomap_write_begin() just
>> continues after iomap_adjust_read_range() and never calls
>> iomap_set_range_uptodate() in the first place, hence there is no
>> difference with or without patch 5 in this test case.
>
> Ok, so you really need to come up with an equivalent test that
> exercises the paths that patch 5 modifies, because right now we have
> no real idea of what the impact of that change will be...
>
Sure.
Thanks,
Yi.