All of lore.kernel.org
 help / color / mirror / Atom feed
From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH RFC 4/4] btrfs: inode: make btrfs_invalidatepage() to be subpage compatible
Date: Thu, 17 Dec 2020 12:57:37 +0800	[thread overview]
Message-ID: <20201217045737.48100-5-wqu@suse.com> (raw)
In-Reply-To: <20201217045737.48100-1-wqu@suse.com>

[BUG]
With current subpage RW patchset, the following script can lead to
filesystem hang:
  # mkfs.btrfs -f -s 4k $dev
  # mount $dev -o nospace_cache $mnt
  # fsstress -w -n 100 -p 1 -s 1608140256 -v -d $mnt

The file system will hang at wait_event() of
btrfs_start_ordered_extent().

[CAUSE]
The root cause is, btrfs_invalidatepage() is freeing page::private which
still has subpage dirty bit set.

The offending situation happens like this:
btrfs_fllocate()
|- btrfs_zero_range()
   |- btrfs_punch_hole_lock_range()
      |- truncate_pagecache_range()
         |- btrfs_invalidatepage()

The involved range looks like:

0	32K	64K	96K	128K
	|///////||//////|
	| Range to drop |

For the [32K, 64K) range, since the offset is 32K, the page won't be
invalidated.

But for the [64K, 96K) range, the offset is 0, current
btrfs_invalidatepage() will call clear_page_extent_mapped() which will
detach page::private, making the subpage dirty bitmap being cleared.

This prevents later __extent_writepage_io() to locate any range to
write, thus no way to wake up the ordered extents.

[FIX]
To fix the problem this patch will:
- Only clear page status and detach page private when the full page
  is invalidated

- Change how we handle unfinished ordered extent
  If there is any ordered extent unfinished in the page range, we can't
  call clear_extent_bit() with delete == true.

[REASON FOR RFC]
There is still uncertainty around the btrfs_releasepage() call.

1. Why we need btrfs_releasepage() call for non-full-page condition?
   Other fs (aka. xfs) just exit without doing special handling if
   invalidatepage() is called with part of the page.

   Thus I didn't completely understand why btrfs_releasepage() here is
   needed for non-full page call.

2. Why "if (offset)" is not causing problem for current code?
   This existing if (offset) call can be skipped for cases like
   offset == 0 length == 2K.
   As MM layer can call invalidatepage() with unaligned offset/length,
   for cases like truncate_inode_pages_range().
   This will make btrfs_invalidatepage() to truncate the whole page when
   we only need to zero part of the page.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index eb493fbb65f9..872c5309b4ca 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8180,7 +8180,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 	int inode_evicting = inode->vfs_inode.i_state & I_FREEING;
 	bool cleared_private2;
 	bool found_ordered = false;
-	bool completed_ordered = false;
+	bool incompleted_ordered = false;
 
 	/*
 	 * we have the page locked, so new writeback can't start,
@@ -8191,7 +8191,13 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 	 */
 	wait_on_page_writeback(page);
 
-	if (offset) {
+	/*
+	 * The range doesn't cover the full page, just let btrfs_releasepage() to
+	 * check if we can release the extent mapping.
+	 * Any locked/pinned/logged extent map would prevent us freeing the
+	 * extent mapping.
+	 */
+	if (!(offset == 0 && length == PAGE_SIZE)) {
 		btrfs_releasepage(page, GFP_NOFS);
 		return;
 	}
@@ -8208,9 +8214,10 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 		end = min(page_end,
 			  ordered->file_offset + ordered->num_bytes - 1);
 		/*
-		 * IO on this page will never be started, so we need to account
-		 * for any ordered extents now. Don't clear EXTENT_DELALLOC_NEW
-		 * here, must leave that up for the ordered extent completion.
+		 * IO on this ordered extent will never be started, so we need
+		 * to account for any ordered extents now. Don't clear
+		 * EXTENT_DELALLOC_NEW here, must leave that up for the
+		 * ordered extent completion.
 		 */
 		if (!inode_evicting)
 			clear_extent_bit(tree, start, end,
@@ -8234,7 +8241,8 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 							   start,
 							   end - start + 1, 1)) {
 				btrfs_finish_ordered_io(ordered);
-				completed_ordered = true;
+			} else {
+				incompleted_ordered = true;
 			}
 		}
 
@@ -8276,7 +8284,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 		 * is cleared if we don't delete, otherwise it can lead to
 		 * corruptions if the i_size is extented later.
 		 */
-		if (found_ordered && !completed_ordered)
+		if (found_ordered && incompleted_ordered)
 			delete = false;
 		clear_extent_bit(tree, page_start, page_end, EXTENT_LOCKED |
 				 EXTENT_DELALLOC | EXTENT_UPTODATE |
@@ -8286,6 +8294,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 		__btrfs_releasepage(page, GFP_NOFS);
 	}
 
+	ClearPagePrivate2(page);
 	ClearPageChecked(page);
 	clear_page_extent_mapped(page);
 }
-- 
2.29.2


  parent reply	other threads:[~2020-12-17  4:59 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-17  4:57 [PATCH 0/4] btrfs: inode: btrfs_invalidatepage() related refactor and fix for subpage Qu Wenruo
2020-12-17  4:57 ` [PATCH 1/4] btrfs: inode: use min() to replace open-code in btrfs_invalidatepage() Qu Wenruo
2020-12-17  4:57 ` [PATCH 2/4] btrfs: inode: remove variable shadowing " Qu Wenruo
2020-12-17  5:38   ` Su Yue
2020-12-17  5:42     ` Qu Wenruo
2020-12-17  6:08       ` Su Yue
2020-12-17  5:55   ` Nikolay Borisov
2020-12-17  5:59     ` Nikolay Borisov
2020-12-17  6:13       ` Qu Wenruo
2020-12-17 12:29         ` David Sterba
2020-12-17  4:57 ` [PATCH 3/4] btrfs: inode: move the timing of TestClearPagePrivate() " Qu Wenruo
2020-12-17  4:57 ` Qu Wenruo [this message]
2020-12-17 11:20   ` [PATCH RFC 4/4] btrfs: inode: make btrfs_invalidatepage() to be subpage compatible Filipe Manana
2020-12-22  4:38     ` Qu Wenruo
2020-12-17 14:51   ` Josef Bacik
2020-12-18  0:42     ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201217045737.48100-5-wqu@suse.com \
    --to=wqu@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.