From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 27/42] btrfs: make __extent_writepage_io() only submit dirty range for subpage
Date: Thu, 15 Apr 2021 13:04:33 +0800 [thread overview]
Message-ID: <20210415050448.267306-28-wqu@suse.com> (raw)
In-Reply-To: <20210415050448.267306-1-wqu@suse.com>
__extent_writepage_io() function originally just iterate through all the
extent maps of a page, and submit any regular extents.
This is fine for sectorsize == PAGE_SIZE case, as if a page is dirty, we
need to submit the only sector contained in the page.
But for subpage case, one dirty page can contain several clean sectors
with at least one dirty sector.
If __extent_writepage_io() still submit all regular extent maps, it can
submit data which is already written to disk.
And since such already written data won't have corresponding ordered
extents, it will trigger a BUG_ON() in btrfs_csum_one_bio().
Change the behavior of __extent_writepage_io() by finding the first
dirty byte in the page, and only submit the dirty range other than the
full extent.
Since we're also here, also modify the following calls to be subpage
compatible:
- SetPageError()
- end_page_writeback()
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 100 ++++++++++++++++++++++++++++++++++++++++---
1 file changed, 95 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c593071fa8c1..be825b73ee43 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3728,6 +3728,74 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
return 0;
}
+/*
+ * To find the first byte we need to write.
+ *
+ * For subpage, one page can contain several sectors, and
+ * __extent_writepage_io() will just grab all extent maps in the page
+ * range and try to submit all non-inline/non-compressed extents.
+ *
+ * This is a big problem for subpage, we shouldn't re-submit already written
+ * data at all.
+ * This function will lookup subpage dirty bit to find which range we really
+ * need to submit.
+ *
+ * Return the next dirty range in [@start, @end).
+ * If no dirty range is found, @start will be page_offset(page) + PAGE_SIZE.
+ */
+static void find_next_dirty_byte(struct btrfs_fs_info *fs_info,
+ struct page *page, u64 *start, u64 *end)
+{
+ struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
+ u64 orig_start = *start;
+ u16 dirty_bitmap;
+ unsigned long flags;
+ int nbits = (orig_start - page_offset(page)) >> fs_info->sectorsize;
+ int first_bit_set;
+ int first_bit_zero;
+
+ /*
+ * For regular sector size == page size case, since one page only
+ * contains one sector, we return the page offset directly.
+ */
+ if (fs_info->sectorsize == PAGE_SIZE) {
+ *start = page_offset(page);
+ *end = page_offset(page) + PAGE_SIZE;
+ return;
+ }
+
+ /* We should have the page locked, but just in case */
+ spin_lock_irqsave(&subpage->lock, flags);
+ dirty_bitmap = subpage->dirty_bitmap;
+ spin_unlock_irqrestore(&subpage->lock, flags);
+
+ /* Set bits lower than @nbits with 0 */
+ dirty_bitmap &= ~((1 << nbits) - 1);
+
+ first_bit_set = ffs(dirty_bitmap);
+ /* No dirty range found */
+ if (first_bit_set == 0) {
+ *start = page_offset(page) + PAGE_SIZE;
+ return;
+ }
+
+ ASSERT(first_bit_set > 0 && first_bit_set <= BTRFS_SUBPAGE_BITMAP_SIZE);
+ *start = page_offset(page) + (first_bit_set - 1) * fs_info->sectorsize;
+
+ /* Set all bits lower than @nbits to 1 for ffz() */
+ dirty_bitmap |= ((1 << nbits) - 1);
+
+ first_bit_zero = ffz(dirty_bitmap);
+ if (first_bit_zero == 0 || first_bit_zero > BTRFS_SUBPAGE_BITMAP_SIZE) {
+ *end = page_offset(page) + PAGE_SIZE;
+ return;
+ }
+ ASSERT(first_bit_zero > 0 &&
+ first_bit_zero <= BTRFS_SUBPAGE_BITMAP_SIZE);
+ *end = page_offset(page) + first_bit_zero * fs_info->sectorsize;
+ ASSERT(*end > *start);
+}
+
/*
* helper for __extent_writepage. This calls the writepage start hooks,
* and does the loop to map the page into extents and bios.
@@ -3775,6 +3843,8 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
while (cur <= end) {
u64 disk_bytenr;
u64 em_end;
+ u64 dirty_range_start = cur;
+ u64 dirty_range_end;
u32 iosize;
if (cur >= i_size) {
@@ -3782,9 +3852,17 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
end, 1);
break;
}
+
+ find_next_dirty_byte(fs_info, page, &dirty_range_start,
+ &dirty_range_end);
+ if (cur < dirty_range_start) {
+ cur = dirty_range_start;
+ continue;
+ }
+
em = btrfs_get_extent(inode, NULL, 0, cur, end - cur + 1);
if (IS_ERR_OR_NULL(em)) {
- SetPageError(page);
+ btrfs_page_set_error(fs_info, page, cur, end - cur + 1);
ret = PTR_ERR_OR_ZERO(em);
break;
}
@@ -3799,8 +3877,11 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
compressed = test_bit(EXTENT_FLAG_COMPRESSED, &em->flags);
disk_bytenr = em->block_start + extent_offset;
- /* Note that em_end from extent_map_end() is exclusive */
- iosize = min(em_end, end + 1) - cur;
+ /*
+ * Note that em_end from extent_map_end() and dirty_range_end from
+ * find_next_dirty_byte() are all exclusive
+ */
+ iosize = min(min(em_end, end + 1), dirty_range_end) - cur;
if (btrfs_use_zone_append(inode, em))
opf = REQ_OP_ZONE_APPEND;
@@ -3830,15 +3911,24 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
page->index, cur, end);
}
+ /*
+ * Although the PageDirty bit is cleared before entering this
+ * function, subpage dirty bit is not cleared.
+ * So clear subpage dirty bit here so next time we won't
+ * submit page for range already written to disk.
+ */
+ btrfs_page_clear_dirty(fs_info, page, cur, iosize);
+
ret = submit_extent_page(opf | write_flags, wbc, page,
disk_bytenr, iosize,
cur - page_offset(page), &epd->bio,
end_bio_extent_writepage,
0, 0, 0, false);
if (ret) {
- SetPageError(page);
+ btrfs_page_set_error(fs_info, page, cur, iosize);
if (PageWriteback(page))
- end_page_writeback(page);
+ btrfs_page_clear_writeback(fs_info, page, cur,
+ iosize);
}
cur += iosize;
--
2.31.1
next prev parent reply other threads:[~2021-04-15 5:05 UTC|newest]
Thread overview: 76+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-04-15 5:04 [PATCH 00/42] btrfs: add full read-write support for subpage Qu Wenruo
2021-04-15 5:04 ` [PATCH 01/42] btrfs: introduce end_bio_subpage_eb_writepage() function Qu Wenruo
2021-04-15 18:50 ` Josef Bacik
2021-04-15 23:21 ` Qu Wenruo
2021-04-15 5:04 ` [PATCH 02/42] btrfs: introduce write_one_subpage_eb() function Qu Wenruo
2021-04-15 19:03 ` Josef Bacik
2021-04-15 23:25 ` Qu Wenruo
2021-04-16 13:26 ` Josef Bacik
2021-04-18 19:45 ` Thiago Jung Bauermann
2021-04-15 5:04 ` [PATCH 03/42] btrfs: make lock_extent_buffer_for_io() to be subpage compatible Qu Wenruo
2021-04-15 19:04 ` Josef Bacik
2021-04-15 5:04 ` [PATCH 04/42] btrfs: introduce submit_eb_subpage() to submit a subpage metadata page Qu Wenruo
2021-04-15 19:27 ` Josef Bacik
2021-04-15 23:28 ` Qu Wenruo
2021-04-16 13:25 ` Josef Bacik
2021-04-15 5:04 ` [PATCH 05/42] btrfs: remove the unused parameter @len for btrfs_bio_fits_in_stripe() Qu Wenruo
2021-04-16 13:46 ` Josef Bacik
2021-04-15 5:04 ` [PATCH 06/42] btrfs: allow btrfs_bio_fits_in_stripe() to accept bio without any page Qu Wenruo
2021-04-16 13:50 ` Josef Bacik
2021-04-15 5:04 ` [PATCH 07/42] btrfs: use u32 for length related members of btrfs_ordered_extent Qu Wenruo
2021-04-16 13:54 ` Josef Bacik
2021-04-16 23:59 ` Qu Wenruo
2021-04-15 5:04 ` [PATCH 08/42] btrfs: pass btrfs_inode into btrfs_writepage_endio_finish_ordered() Qu Wenruo
2021-04-16 13:58 ` Josef Bacik
2021-04-17 0:02 ` Qu Wenruo
2021-04-15 5:04 ` [PATCH 09/42] btrfs: refactor how we finish ordered extent io for endio functions Qu Wenruo
2021-04-16 14:09 ` Josef Bacik
2021-04-17 0:06 ` Qu Wenruo
2021-04-15 5:04 ` [PATCH 10/42] btrfs: update the comments in btrfs_invalidatepage() Qu Wenruo
2021-04-16 14:32 ` Josef Bacik
2021-04-15 5:04 ` [PATCH 11/42] btrfs: refactor btrfs_invalidatepage() Qu Wenruo
2021-04-16 14:42 ` Josef Bacik
2021-04-17 0:13 ` Qu Wenruo
2021-04-15 5:04 ` [PATCH 12/42] btrfs: make Private2 lifespan more consistent Qu Wenruo
2021-04-16 14:43 ` Josef Bacik
2021-04-15 5:04 ` [PATCH 13/42] btrfs: rename PagePrivate2 to PageOrdered inside btrfs Qu Wenruo
2021-04-16 14:49 ` Josef Bacik
2021-04-15 5:04 ` [PATCH 14/42] btrfs: pass bytenr directly to __process_pages_contig() Qu Wenruo
2021-04-16 14:58 ` Josef Bacik
2021-04-17 0:15 ` Qu Wenruo
2021-04-15 5:04 ` [PATCH 15/42] btrfs: refactor the page status update into process_one_page() Qu Wenruo
2021-04-16 15:06 ` Josef Bacik
2021-04-15 5:04 ` [PATCH 16/42] btrfs: provide btrfs_page_clamp_*() helpers Qu Wenruo
2021-04-16 15:09 ` Josef Bacik
2021-04-15 5:04 ` [PATCH 17/42] btrfs: only require sector size alignment for end_bio_extent_writepage() Qu Wenruo
2021-04-16 15:13 ` Josef Bacik
2021-04-17 0:16 ` Qu Wenruo
2021-04-15 5:04 ` [PATCH 18/42] btrfs: make btrfs_dirty_pages() to be subpage compatible Qu Wenruo
2021-04-16 15:14 ` Josef Bacik
2021-04-15 5:04 ` [PATCH 19/42] btrfs: make __process_pages_contig() to handle subpage dirty/error/writeback status Qu Wenruo
2021-04-16 15:20 ` Josef Bacik
2021-04-15 5:04 ` [PATCH 20/42] btrfs: make end_bio_extent_writepage() to be subpage compatible Qu Wenruo
2021-04-16 15:21 ` Josef Bacik
2021-04-15 5:04 ` [PATCH 21/42] btrfs: make process_one_page() to handle subpage locking Qu Wenruo
2021-04-16 15:36 ` Josef Bacik
2021-04-15 5:04 ` [PATCH 22/42] btrfs: introduce helpers for subpage ordered status Qu Wenruo
2021-04-15 5:04 ` [PATCH 23/42] btrfs: make page Ordered bit to be subpage compatible Qu Wenruo
2021-04-15 5:04 ` [PATCH 24/42] btrfs: update locked page dirty/writeback/error bits in __process_pages_contig Qu Wenruo
2021-04-15 5:04 ` [PATCH 25/42] btrfs: prevent extent_clear_unlock_delalloc() to unlock page not locked by __process_pages_contig() Qu Wenruo
2021-04-15 5:04 ` [PATCH 26/42] btrfs: make btrfs_set_range_writeback() subpage compatible Qu Wenruo
2021-04-15 5:04 ` Qu Wenruo [this message]
2021-04-15 5:04 ` [PATCH 28/42] btrfs: add extra assert for submit_extent_page() Qu Wenruo
2021-04-15 5:04 ` [PATCH 29/42] btrfs: make btrfs_truncate_block() to be subpage compatible Qu Wenruo
2021-04-15 5:04 ` [PATCH 30/42] btrfs: make btrfs_page_mkwrite() " Qu Wenruo
2021-04-15 5:04 ` [PATCH 31/42] btrfs: reflink: make copy_inline_to_page() " Qu Wenruo
2021-04-15 5:04 ` [PATCH 32/42] btrfs: fix the filemap_range_has_page() call in btrfs_punch_hole_lock_range() Qu Wenruo
2021-04-15 5:04 ` [PATCH 33/42] btrfs: don't clear page extent mapped if we're not invalidating the full page Qu Wenruo
2021-04-15 5:04 ` [PATCH 34/42] btrfs: extract relocation page read and dirty part into its own function Qu Wenruo
2021-04-15 5:04 ` [PATCH 35/42] btrfs: make relocate_one_page() to handle subpage case Qu Wenruo
2021-04-15 5:04 ` [PATCH 36/42] btrfs: fix wild subpage writeback which does not have ordered extent Qu Wenruo
2021-04-15 5:04 ` [PATCH 37/42] btrfs: disable inline extent creation for subpage Qu Wenruo
2021-04-15 5:04 ` [PATCH 38/42] btrfs: skip validation for subpage read repair Qu Wenruo
2021-04-15 5:04 ` [PATCH 39/42] btrfs: make free space cache size consistent across different PAGE_SIZE Qu Wenruo
2021-04-15 5:04 ` [PATCH 40/42] btrfs: refactor submit_extent_page() to make bio and its flag tracing easier Qu Wenruo
2021-04-15 5:04 ` [PATCH 41/42] btrfs: allow submit_extent_page() to do bio split for subpage Qu Wenruo
2021-04-15 5:04 ` [PATCH 42/42] btrfs: allow read-write for 4K sectorsize on 64K page size systems Qu Wenruo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210415050448.267306-28-wqu@suse.com \
--to=wqu@suse.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).