From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH RFC 4/4] btrfs: inode: make btrfs_invalidatepage() to be subpage compatible
Date: Thu, 17 Dec 2020 12:57:37 +0800 [thread overview]
Message-ID: <20201217045737.48100-5-wqu@suse.com> (raw)
In-Reply-To: <20201217045737.48100-1-wqu@suse.com>
[BUG]
With current subpage RW patchset, the following script can lead to
filesystem hang:
# mkfs.btrfs -f -s 4k $dev
# mount $dev -o nospace_cache $mnt
# fsstress -w -n 100 -p 1 -s 1608140256 -v -d $mnt
The file system will hang at wait_event() of
btrfs_start_ordered_extent().
[CAUSE]
The root cause is, btrfs_invalidatepage() is freeing page::private which
still has subpage dirty bit set.
The offending situation happens like this:
btrfs_fllocate()
|- btrfs_zero_range()
|- btrfs_punch_hole_lock_range()
|- truncate_pagecache_range()
|- btrfs_invalidatepage()
The involved range looks like:
0 32K 64K 96K 128K
|///////||//////|
| Range to drop |
For the [32K, 64K) range, since the offset is 32K, the page won't be
invalidated.
But for the [64K, 96K) range, the offset is 0, current
btrfs_invalidatepage() will call clear_page_extent_mapped() which will
detach page::private, making the subpage dirty bitmap being cleared.
This prevents later __extent_writepage_io() to locate any range to
write, thus no way to wake up the ordered extents.
[FIX]
To fix the problem this patch will:
- Only clear page status and detach page private when the full page
is invalidated
- Change how we handle unfinished ordered extent
If there is any ordered extent unfinished in the page range, we can't
call clear_extent_bit() with delete == true.
[REASON FOR RFC]
There is still uncertainty around the btrfs_releasepage() call.
1. Why we need btrfs_releasepage() call for non-full-page condition?
Other fs (aka. xfs) just exit without doing special handling if
invalidatepage() is called with part of the page.
Thus I didn't completely understand why btrfs_releasepage() here is
needed for non-full page call.
2. Why "if (offset)" is not causing problem for current code?
This existing if (offset) call can be skipped for cases like
offset == 0 length == 2K.
As MM layer can call invalidatepage() with unaligned offset/length,
for cases like truncate_inode_pages_range().
This will make btrfs_invalidatepage() to truncate the whole page when
we only need to zero part of the page.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/inode.c | 23 ++++++++++++++++-------
1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index eb493fbb65f9..872c5309b4ca 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8180,7 +8180,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
int inode_evicting = inode->vfs_inode.i_state & I_FREEING;
bool cleared_private2;
bool found_ordered = false;
- bool completed_ordered = false;
+ bool incompleted_ordered = false;
/*
* we have the page locked, so new writeback can't start,
@@ -8191,7 +8191,13 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
*/
wait_on_page_writeback(page);
- if (offset) {
+ /*
+ * The range doesn't cover the full page, just let btrfs_releasepage() to
+ * check if we can release the extent mapping.
+ * Any locked/pinned/logged extent map would prevent us freeing the
+ * extent mapping.
+ */
+ if (!(offset == 0 && length == PAGE_SIZE)) {
btrfs_releasepage(page, GFP_NOFS);
return;
}
@@ -8208,9 +8214,10 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
end = min(page_end,
ordered->file_offset + ordered->num_bytes - 1);
/*
- * IO on this page will never be started, so we need to account
- * for any ordered extents now. Don't clear EXTENT_DELALLOC_NEW
- * here, must leave that up for the ordered extent completion.
+ * IO on this ordered extent will never be started, so we need
+ * to account for any ordered extents now. Don't clear
+ * EXTENT_DELALLOC_NEW here, must leave that up for the
+ * ordered extent completion.
*/
if (!inode_evicting)
clear_extent_bit(tree, start, end,
@@ -8234,7 +8241,8 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
start,
end - start + 1, 1)) {
btrfs_finish_ordered_io(ordered);
- completed_ordered = true;
+ } else {
+ incompleted_ordered = true;
}
}
@@ -8276,7 +8284,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
* is cleared if we don't delete, otherwise it can lead to
* corruptions if the i_size is extented later.
*/
- if (found_ordered && !completed_ordered)
+ if (found_ordered && incompleted_ordered)
delete = false;
clear_extent_bit(tree, page_start, page_end, EXTENT_LOCKED |
EXTENT_DELALLOC | EXTENT_UPTODATE |
@@ -8286,6 +8294,7 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
__btrfs_releasepage(page, GFP_NOFS);
}
+ ClearPagePrivate2(page);
ClearPageChecked(page);
clear_page_extent_mapped(page);
}
--
2.29.2
next prev parent reply other threads:[~2020-12-17 4:59 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-17 4:57 [PATCH 0/4] btrfs: inode: btrfs_invalidatepage() related refactor and fix for subpage Qu Wenruo
2020-12-17 4:57 ` [PATCH 1/4] btrfs: inode: use min() to replace open-code in btrfs_invalidatepage() Qu Wenruo
2020-12-17 4:57 ` [PATCH 2/4] btrfs: inode: remove variable shadowing " Qu Wenruo
2020-12-17 5:38 ` Su Yue
2020-12-17 5:42 ` Qu Wenruo
2020-12-17 6:08 ` Su Yue
2020-12-17 5:55 ` Nikolay Borisov
2020-12-17 5:59 ` Nikolay Borisov
2020-12-17 6:13 ` Qu Wenruo
2020-12-17 12:29 ` David Sterba
2020-12-17 4:57 ` [PATCH 3/4] btrfs: inode: move the timing of TestClearPagePrivate() " Qu Wenruo
2020-12-17 4:57 ` Qu Wenruo [this message]
2020-12-17 11:20 ` [PATCH RFC 4/4] btrfs: inode: make btrfs_invalidatepage() to be subpage compatible Filipe Manana
2020-12-22 4:38 ` Qu Wenruo
2020-12-17 14:51 ` Josef Bacik
2020-12-18 0:42 ` Qu Wenruo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201217045737.48100-5-wqu@suse.com \
--to=wqu@suse.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.