linux-btrfs.vger.kernel.org archive mirror
* [PATCH v2 0/2] btrfs: small cleanups to buffered write path
@ 2024-09-30 22:47 Qu Wenruo
  2024-09-30 22:47 ` [PATCH v2 1/2] btrfs: remove the dirty_page local variable Qu Wenruo
  2024-09-30 22:47 ` [PATCH v2 2/2] btrfs: simplify the page uptodate preparation for prepare_pages() Qu Wenruo
  0 siblings, 2 replies; 6+ messages in thread
From: Qu Wenruo @ 2024-09-30 22:47 UTC (permalink / raw)
  To: linux-btrfs

[CHANGELOG]
v2:
- Utilize folio APIs instead of the old page APIs for the 2nd patch

These two are small cleanups to the buffered write path, inspired by the
btrfs test failure of generic/563 with 4K sector size and 64K page size.

The root cause of that failure is that btrfs cannot avoid a full page read
when the dirty range covers some sectors of a page, but not the full page.

This is only the preparation part; we cannot yet skip the full page
read, or it would still lead to incorrect data.

The full fix needs extra co-operation between the subpage read and
write paths; until then, prepare_uptodate_page() still needs to read the
full page.

Qu Wenruo (2):
  btrfs: remove the dirty_page local variable
  btrfs: simplify the page uptodate preparation for prepare_pages()

 fs/btrfs/file.c             | 87 +++++++++++++++++++------------------
 fs/btrfs/file.h             |  2 +-
 fs/btrfs/free-space-cache.c |  2 +-
 3 files changed, 46 insertions(+), 45 deletions(-)

-- 
2.46.2



* [PATCH v2 1/2] btrfs: remove the dirty_page local variable
  2024-09-30 22:47 [PATCH v2 0/2] btrfs: small cleanups to buffered write path Qu Wenruo
@ 2024-09-30 22:47 ` Qu Wenruo
  2024-09-30 22:47 ` [PATCH v2 2/2] btrfs: simplify the page uptodate preparation for prepare_pages() Qu Wenruo
  1 sibling, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2024-09-30 22:47 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Josef Bacik

Inside btrfs_buffered_write(), we have a local variable @dirty_pages,
recording the number of pages we dirtied in the current iteration.

However, we do not really need that variable, since it can be calculated
from @pos and @copied.

In fact, there is already a problem in the short copy path, where we
use @dirty_pages to calculate the range we need to release.
That usage assumes sectorsize == PAGE_SIZE, which is no longer true.

Instead of keeping @dirty_pages and causing incorrect usage, just
calculate the number of dirtied pages inside btrfs_dirty_pages().

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file.c             | 19 +++++++------------
 fs/btrfs/file.h             |  2 +-
 fs/btrfs/free-space-cache.c |  2 +-
 3 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 4fb521d91b06..9555a3485670 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -124,12 +124,14 @@ static void btrfs_drop_pages(struct btrfs_fs_info *fs_info,
  * - Update inode size for past EOF write
  */
 int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages,
-		      size_t num_pages, loff_t pos, size_t write_bytes,
+		      loff_t pos, size_t write_bytes,
 		      struct extent_state **cached, bool noreserve)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	int ret = 0;
 	int i;
+	const int num_pages = (round_up(pos + write_bytes, PAGE_SIZE) -
+			       round_down(pos, PAGE_SIZE)) >> PAGE_SHIFT;
 	u64 num_bytes;
 	u64 start_pos;
 	u64 end_of_last_block;
@@ -1242,7 +1244,6 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
 					 offset);
 		size_t num_pages;
 		size_t reserve_bytes;
-		size_t dirty_pages;
 		size_t copied;
 		size_t dirty_sectors;
 		size_t num_sectors;
@@ -1361,11 +1362,8 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
 		if (copied == 0) {
 			force_page_uptodate = true;
 			dirty_sectors = 0;
-			dirty_pages = 0;
 		} else {
 			force_page_uptodate = false;
-			dirty_pages = DIV_ROUND_UP(copied + offset,
-						   PAGE_SIZE);
 		}
 
 		if (num_sectors > dirty_sectors) {
@@ -1375,13 +1373,10 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
 				btrfs_delalloc_release_metadata(BTRFS_I(inode),
 							release_bytes, true);
 			} else {
-				u64 __pos;
-
-				__pos = round_down(pos,
-						   fs_info->sectorsize) +
-					(dirty_pages << PAGE_SHIFT);
+				u64 release_start = round_up(pos + copied,
+							     fs_info->sectorsize);
 				btrfs_delalloc_release_space(BTRFS_I(inode),
-						data_reserved, __pos,
+						data_reserved, release_start,
 						release_bytes, true);
 			}
 		}
@@ -1390,7 +1385,7 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i)
 					fs_info->sectorsize);
 
 		ret = btrfs_dirty_pages(BTRFS_I(inode), pages,
-					dirty_pages, pos, copied,
+					pos, copied,
 					&cached_state, only_release_metadata);
 
 		/*
diff --git a/fs/btrfs/file.h b/fs/btrfs/file.h
index 912254e653cf..c23d0bf42598 100644
--- a/fs/btrfs/file.h
+++ b/fs/btrfs/file.h
@@ -35,7 +35,7 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
 			    const struct btrfs_ioctl_encoded_io_args *encoded);
 int btrfs_release_file(struct inode *inode, struct file *file);
 int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages,
-		      size_t num_pages, loff_t pos, size_t write_bytes,
+		      loff_t pos, size_t write_bytes,
 		      struct extent_state **cached, bool noreserve);
 int btrfs_fdatawrite_range(struct btrfs_inode *inode, loff_t start, loff_t end);
 int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index f4bcb2530660..4a3a6db91878 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1458,7 +1458,7 @@ static int __btrfs_write_out_cache(struct inode *inode,
 
 	/* Everything is written out, now we dirty the pages in the file. */
 	ret = btrfs_dirty_pages(BTRFS_I(inode), io_ctl->pages,
-				io_ctl->num_pages, 0, i_size_read(inode),
+				0, i_size_read(inode),
 				&cached_state, false);
 	if (ret)
 		goto out_nospc;
-- 
2.46.2



* [PATCH v2 2/2] btrfs: simplify the page uptodate preparation for prepare_pages()
  2024-09-30 22:47 [PATCH v2 0/2] btrfs: small cleanups to buffered write path Qu Wenruo
  2024-09-30 22:47 ` [PATCH v2 1/2] btrfs: remove the dirty_page local variable Qu Wenruo
@ 2024-09-30 22:47 ` Qu Wenruo
  2024-10-01 15:30   ` David Sterba
  1 sibling, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2024-09-30 22:47 UTC (permalink / raw)
  To: linux-btrfs

Currently inside prepare_pages(), we handle the leading and trailing
pages differently, and skip the middle pages (if any).

This is to avoid reading pages which are fully covered by the dirty
range.

Refactor the code by moving all the checks (alignment check, range
check, force read check) into prepare_uptodate_page(), so that
prepare_pages() only needs to iterate over all the pages
unconditionally.

And since we're here, also update prepare_uptodate_page() to use the
folio API rather than the old page API.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file.c | 68 +++++++++++++++++++++++++++----------------------
 1 file changed, 37 insertions(+), 31 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 9555a3485670..c5d52fcee0be 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -858,36 +858,46 @@ int btrfs_mark_extent_written(struct btrfs_trans_handle *trans,
  */
 static int prepare_uptodate_page(struct inode *inode,
 				 struct page *page, u64 pos,
-				 bool force_uptodate)
+				 u64 len, bool force_uptodate)
 {
 	struct folio *folio = page_folio(page);
+	u64 clamp_start = max_t(u64, pos, folio_pos(folio));
+	u64 clamp_end = min_t(u64, pos + len, folio_pos(folio) + folio_size(folio));
 	int ret = 0;
 
-	if (((pos & (PAGE_SIZE - 1)) || force_uptodate) &&
-	    !PageUptodate(page)) {
-		ret = btrfs_read_folio(NULL, folio);
-		if (ret)
-			return ret;
-		lock_page(page);
-		if (!PageUptodate(page)) {
-			unlock_page(page);
-			return -EIO;
-		}
+	if (folio_test_uptodate(folio))
+		return 0;
 
-		/*
-		 * Since btrfs_read_folio() will unlock the folio before it
-		 * returns, there is a window where btrfs_release_folio() can be
-		 * called to release the page.  Here we check both inode
-		 * mapping and PagePrivate() to make sure the page was not
-		 * released.
-		 *
-		 * The private flag check is essential for subpage as we need
-		 * to store extra bitmap using folio private.
-		 */
-		if (page->mapping != inode->i_mapping || !folio_test_private(folio)) {
-			unlock_page(page);
-			return -EAGAIN;
-		}
+	if (force_uptodate)
+		goto read;
+
+	/* The dirty range fully cover the page, no need to read it out. */
+	if (IS_ALIGNED(clamp_start, PAGE_SIZE) &&
+	    IS_ALIGNED(clamp_end, PAGE_SIZE))
+		return 0;
+read:
+	ret = btrfs_read_folio(NULL, folio);
+	if (ret)
+		return ret;
+	folio_lock(folio);
+	if (!folio_test_uptodate(folio)) {
+		folio_unlock(folio);
+		return -EIO;
+	}
+
+	/*
+	 * Since btrfs_read_folio() will unlock the folio before it
+	 * returns, there is a window where btrfs_release_folio() can be
+	 * called to release the page.  Here we check both inode
+	 * mapping and PagePrivate() to make sure the page was not
+	 * released.
+	 *
+	 * The private flag check is essential for subpage as we need
+	 * to store extra bitmap using folio private.
+	 */
+	if (page->mapping != inode->i_mapping || !folio_test_private(folio)) {
+		folio_unlock(folio);
+		return -EAGAIN;
 	}
 	return 0;
 }
@@ -949,12 +959,8 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
 			goto fail;
 		}
 
-		if (i == 0)
-			ret = prepare_uptodate_page(inode, pages[i], pos,
-						    force_uptodate);
-		if (!ret && i == num_pages - 1)
-			ret = prepare_uptodate_page(inode, pages[i],
-						    pos + write_bytes, false);
+		ret = prepare_uptodate_page(inode, pages[i], pos, write_bytes,
+					    force_uptodate);
 		if (ret) {
 			put_page(pages[i]);
 			if (!nowait && ret == -EAGAIN) {
-- 
2.46.2



* Re: [PATCH v2 2/2] btrfs: simplify the page uptodate preparation for prepare_pages()
  2024-09-30 22:47 ` [PATCH v2 2/2] btrfs: simplify the page uptodate preparation for prepare_pages() Qu Wenruo
@ 2024-10-01 15:30   ` David Sterba
  2024-10-01 21:11     ` Qu Wenruo
  0 siblings, 1 reply; 6+ messages in thread
From: David Sterba @ 2024-10-01 15:30 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Oct 01, 2024 at 08:17:11AM +0930, Qu Wenruo wrote:
> +	if (force_uptodate)
> +		goto read;
> +
> +	/* The dirty range fully cover the page, no need to read it out. */
> +	if (IS_ALIGNED(clamp_start, PAGE_SIZE) &&
> +	    IS_ALIGNED(clamp_end, PAGE_SIZE))
> +		return 0;
> +read:

I've commented under v1 and will repeat it here so it does not get lost:
this if and goto are not necessary here and can be a normal if () { ... }.


* Re: [PATCH v2 2/2] btrfs: simplify the page uptodate preparation for prepare_pages()
  2024-10-01 15:30   ` David Sterba
@ 2024-10-01 21:11     ` Qu Wenruo
  2024-10-02 13:04       ` David Sterba
  0 siblings, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2024-10-01 21:11 UTC (permalink / raw)
  To: dsterba, Qu Wenruo; +Cc: linux-btrfs



On 2024/10/2 01:00, David Sterba wrote:
> On Tue, Oct 01, 2024 at 08:17:11AM +0930, Qu Wenruo wrote:
>> +	if (force_uptodate)
>> +		goto read;
>> +
>> +	/* The dirty range fully cover the page, no need to read it out. */
>> +	if (IS_ALIGNED(clamp_start, PAGE_SIZE) &&
>> +	    IS_ALIGNED(clamp_end, PAGE_SIZE))
>> +		return 0;
>> +read:
>
> I've commented under v1 and will repeat it here so it does not get lost,
> this if and goto is not necessary here and can be a normal if () { ...  }
>

This is not that simple.

If @force_uptodate is set, we need to read the folio no matter what;
meanwhile, if the range is not aligned, we also need to read the folio.

And the @force_uptodate check should go before the alignment check, thus
I'm wondering if there is any better alternative.

Thanks,
Qu


* Re: [PATCH v2 2/2] btrfs: simplify the page uptodate preparation for prepare_pages()
  2024-10-01 21:11     ` Qu Wenruo
@ 2024-10-02 13:04       ` David Sterba
  0 siblings, 0 replies; 6+ messages in thread
From: David Sterba @ 2024-10-02 13:04 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs

On Wed, Oct 02, 2024 at 06:41:58AM +0930, Qu Wenruo wrote:
> 
> 
> On 2024/10/2 01:00, David Sterba wrote:
> > On Tue, Oct 01, 2024 at 08:17:11AM +0930, Qu Wenruo wrote:
> >> +	if (force_uptodate)
> >> +		goto read;
> >> +
> >> +	/* The dirty range fully cover the page, no need to read it out. */
> >> +	if (IS_ALIGNED(clamp_start, PAGE_SIZE) &&
> >> +	    IS_ALIGNED(clamp_end, PAGE_SIZE))
> >> +		return 0;
> >> +read:
> >
> > I've commented under v1 and will repeat it here so it does not get lost,
> > this if and goto is not necessary here and can be a normal if () { ...  }
> >
> 
> This is not that simple.
> 
> If @force_uptodate we need to read the folio no matter what, meanwhile
> if the range is not aligned, we also need to read the folio.
> 
> And the @force_uptodate check should go before the alignment check, thus
> I'm wondering if there is any better alternative.

if (!force_uptodate) {
	/* The dirty range fully cover the page, no need to read it out. */
	if (IS_ALIGNED(clamp_start, PAGE_SIZE) &&
	    IS_ALIGNED(clamp_end, PAGE_SIZE))
		return 0;
}



