public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] btrfs: refactor how we prepare pages for btrfs_buffered_write()
@ 2020-08-07  7:27 Qu Wenruo
  2020-08-13 11:16 ` Qu Wenruo
  0 siblings, 1 reply; 3+ messages in thread
From: Qu Wenruo @ 2020-08-07  7:27 UTC (permalink / raw)
  To: linux-btrfs

When we prepare pages for buffered write, we need to ensure that if
we're partially dirting a page, then it must be uptodate to make sure it
has the old content read from disk.

While for pages that is completely inside the write range, we don't need
to make it uptodate at all, thus we can skip one page read.

The old method uses @force_page_uptodate flag, but in fact it doesn't
make any sense at all.
In parepare_uptodate_page() it has the check to ensure we always force
a paritially written page to be uptodate.

So this patch will refactor the mess, to make it more easier to read by:
- Remove the @force_page_uptodate bit
  It makes no difference at all

- Remove the @pos and @force_uptodate for prepare_uptodate_page()
  Now the caller, parepare_pages() will determine if it's necessary.

- Add more comment for parepare_pages()
  This will explain why we don't need some pages to be uptodate.

- Use page_offset() to be more user-friendly
  After find_or_create_page(), page_offset() will return the offset
  inside the address space, thus can be used directly to be more
  reader-friendly.

The new code uses the following method to do such check:

	full_dirty_page_start = round_up(pos, fs_info->sectorsize);
	full_dirty_page_end = round_down(pos + write_bytes,
					 fs_info->sectorsize) - 1;
	for (i = 0; i < num_pages; i++) {
		...
		if (!err && !(page_offset >= full_dirty_page_start &&
			      page_offset <= full_dirty_page_end))
			err = prepare_uptodate_page(inode, pages[i]);
	}

Which can handle all the possible cases like:
- Dirty range in side a page
  || |///| ||
  Then @full_dirty_page_start > @full_dirty_page_end, and it result will
  always be (!err && !(false))

- Dirty range across one page boundary
  ||   |///||//|     ||
  Then @full_dirty_page_start == @full_dirty_page_end + 1, the same
  as above case.

- Dirty range covers a full page
  ||   |///||////////||//|     ||
  Then only for the 2nd page meet the condition, and skip the
  prepare_uptodate_page() call.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file.c | 40 +++++++++++++++++++++++-----------------
 1 file changed, 23 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 841c516079a9..705ebe709e8d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1378,13 +1378,11 @@ int btrfs_mark_extent_written(struct btrfs_trans_handle *trans,
  * on success we return a locked page and 0
  */
 static int prepare_uptodate_page(struct inode *inode,
-				 struct page *page, u64 pos,
-				 bool force_uptodate)
+				 struct page *page)
 {
 	int ret = 0;
 
-	if (((pos & (PAGE_SIZE - 1)) || force_uptodate) &&
-	    !PageUptodate(page)) {
+	if (!PageUptodate(page)) {
 		ret = btrfs_readpage(NULL, page);
 		if (ret)
 			return ret;
@@ -1406,14 +1404,29 @@ static int prepare_uptodate_page(struct inode *inode,
  */
 static noinline int prepare_pages(struct inode *inode, struct page **pages,
 				  size_t num_pages, loff_t pos,
-				  size_t write_bytes, bool force_uptodate)
+				  size_t write_bytes)
 {
-	int i;
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
 	unsigned long index = pos >> PAGE_SHIFT;
 	gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
+	u64 full_dirty_page_start;
+	u64 full_dirty_page_end;
 	int err = 0;
 	int faili;
+	int i;
 
+	/*
+	 * || = page boundary
+	 *
+	 *     @pos                   @pos + write_bytes
+	 * ||  |///||//////||/////||//|   ||
+	 *	    \		  /
+	 *           In this range, we don't need to the page to
+	 *           be uptodate at all.
+	 */
+	full_dirty_page_start = round_up(pos, fs_info->sectorsize);
+	full_dirty_page_end = round_down(pos + write_bytes,
+					 fs_info->sectorsize) - 1;
 	for (i = 0; i < num_pages; i++) {
 again:
 		pages[i] = find_or_create_page(inode->i_mapping, index + i,
@@ -1424,12 +1437,9 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
 			goto fail;
 		}
 
-		if (i == 0)
-			err = prepare_uptodate_page(inode, pages[i], pos,
-						    force_uptodate);
-		if (!err && i == num_pages - 1)
-			err = prepare_uptodate_page(inode, pages[i],
-						    pos + write_bytes, false);
+		if (!err && !(page_offset(pages[i]) >= full_dirty_page_start &&
+			      page_offset(pages[i]) <= full_dirty_page_end))
+			err = prepare_uptodate_page(inode, pages[i]);
 		if (err) {
 			put_page(pages[i]);
 			if (err == -EAGAIN) {
@@ -1638,7 +1648,6 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 	int nrptrs;
 	int ret = 0;
 	bool only_release_metadata = false;
-	bool force_page_uptodate = false;
 
 	nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE),
 			PAGE_SIZE / (sizeof(struct page *)));
@@ -1727,8 +1736,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		 * contents of pages from loop to loop
 		 */
 		ret = prepare_pages(inode, pages, num_pages,
-				    pos, write_bytes,
-				    force_page_uptodate);
+				    pos, write_bytes);
 		if (ret) {
 			btrfs_delalloc_release_extents(BTRFS_I(inode),
 						       reserve_bytes);
@@ -1763,11 +1771,9 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 			nrptrs = 1;
 
 		if (copied == 0) {
-			force_page_uptodate = true;
 			dirty_sectors = 0;
 			dirty_pages = 0;
 		} else {
-			force_page_uptodate = false;
 			dirty_pages = DIV_ROUND_UP(copied + offset,
 						   PAGE_SIZE);
 		}
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-08-13 11:23 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-08-07  7:27 [PATCH] btrfs: refactor how we prepare pages for btrfs_buffered_write() Qu Wenruo
2020-08-13 11:16 ` Qu Wenruo
2020-08-13 11:22   ` David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox