From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v3 05/26] btrfs: make add_ra_bio_pages() to be subpage compatible
Date: Mon, 27 Sep 2021 15:21:47 +0800 [thread overview]
Message-ID: <20210927072208.21634-6-wqu@suse.com> (raw)
In-Reply-To: <20210927072208.21634-1-wqu@suse.com>
[BUG]
If we remove the subpage limitation in add_ra_bio_pages(), then read a
compressed extent which has part of its range in next page, like the
following inode layout:
0 32K 64K 96K 128K
|<--------------|-------------->|
Btrfs will trigger ASSERT() in endio function:
assertion failed: atomic_read(&subpage->readers) >= nbits
------------[ cut here ]------------
kernel BUG at fs/btrfs/ctree.h:3431!
Internal error: Oops - BUG: 0 [#1] SMP
Workqueue: btrfs-endio btrfs_work_helper [btrfs]
Call trace:
assertfail.constprop.0+0x28/0x2c [btrfs]
btrfs_subpage_end_reader+0x148/0x14c [btrfs]
end_page_read+0x8c/0x100 [btrfs]
end_bio_extent_readpage+0x320/0x6b0 [btrfs]
bio_endio+0x15c/0x1dc
end_workqueue_fn+0x44/0x64 [btrfs]
btrfs_work_helper+0x74/0x250 [btrfs]
process_one_work+0x1d4/0x47c
worker_thread+0x180/0x400
kthread+0x11c/0x120
ret_from_fork+0x10/0x30
---[ end trace c8b7b552d3bb408c ]---
[CAUSE]
When we read the page range [0, 64K), we find it's a compressed extent,
and we will try to add extra pages in add_ra_bio_pages() to avoid
reading the same compressed extent.
But when we add such page into the read bio, it doesn't follow the
behavior of btrfs_do_readpage() to properly set subpage::readers.
This means, for page [64K, 128K), its subpage::readers is still 0.
And when endio is executed on both pages, since page [64K, 128K) has 0
subpage::readers, it triggers above ASSERT()
[FIX]
Function add_ra_bio_pages() is far from subpage compatible, it always
assume PAGE_SIZE == sectorsize, thus when it skip to next range it
always just skip PAGE_SIZE.
Make it subpage compatible by:
- Skip to next page properly when needed
If we find there is already a page cache, we need to skip to next
page.
For that case, we shouldn't just skip PAGE_SIZE bytes, but use
@pg_index to calculate the next bytenr and continue.
- Only add the page range covered by current extent map
We need to calculate which range is covered by current extent map and
only add that part into the read bio.
- Update subpage::readers before submitting the bio
- Use proper cursor other than confusing @last_offset
- Calculate the missed threshold based on sector size
It's no longer using missed pages, as for 64K page size, we have at
most 3 pages to skip. (If aligned only 2 pages)
- Add ASSERT() to make sure our bytenr is always aligned
- Add comment for the function
Add an especial note for subpage case, as the function won't really
work well for subpage cases.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/compression.c | 89 +++++++++++++++++++++++++++---------------
fs/btrfs/extent_io.c | 1 +
2 files changed, 59 insertions(+), 31 deletions(-)
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index b50927740d27..7a5d56eaceb7 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -541,13 +541,24 @@ static u64 bio_end_offset(struct bio *bio)
return page_offset(last->bv_page) + last->bv_len + last->bv_offset;
}
+/*
+ * Add extra pages in the same compressed file extent so that we don't
+ * need to re-read the same extent again and again.
+ *
+ * NOTE: this won't work well for subpage, as for subpage read, we lock the
+ * full page then submit bio for each compressed/regular extents.
+ *
+ * This means, if we have several sectors in the same page points to the same
+ * on-disk compressed data, we will re-read the same extent many times and
+ * this function can only help for the next page.
+ */
static noinline int add_ra_bio_pages(struct inode *inode,
u64 compressed_end,
struct compressed_bio *cb)
{
+ struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
unsigned long end_index;
- unsigned long pg_index;
- u64 last_offset;
+ u64 cur = bio_end_offset(cb->orig_bio);
u64 isize = i_size_read(inode);
int ret;
struct page *page;
@@ -555,10 +566,8 @@ static noinline int add_ra_bio_pages(struct inode *inode,
struct address_space *mapping = inode->i_mapping;
struct extent_map_tree *em_tree;
struct extent_io_tree *tree;
- u64 end;
- int misses = 0;
+ int sectors_missed = 0;
- last_offset = bio_end_offset(cb->orig_bio);
em_tree = &BTRFS_I(inode)->extent_tree;
tree = &BTRFS_I(inode)->io_tree;
@@ -577,18 +586,29 @@ static noinline int add_ra_bio_pages(struct inode *inode,
end_index = (i_size_read(inode) - 1) >> PAGE_SHIFT;
- while (last_offset < compressed_end) {
- pg_index = last_offset >> PAGE_SHIFT;
+ while (cur < compressed_end) {
+ u64 page_end;
+ u64 pg_index = cur >> PAGE_SHIFT;
+ u32 add_size;
if (pg_index > end_index)
break;
page = xa_load(&mapping->i_pages, pg_index);
if (page && !xa_is_value(page)) {
- misses++;
- if (misses > 4)
+ sectors_missed += (PAGE_SIZE - offset_in_page(cur)) >>
+ fs_info->sectorsize_bits;
+
+ /* Beyond threshold, no need to continue */
+ if (sectors_missed > 4)
break;
- goto next;
+
+ /*
+ * Jump to next page start as we already have page for
+ * current offset.
+ */
+ cur = (pg_index << PAGE_SHIFT) + PAGE_SIZE;
+ continue;
}
page = __page_cache_alloc(mapping_gfp_constraint(mapping,
@@ -598,14 +618,11 @@ static noinline int add_ra_bio_pages(struct inode *inode,
if (add_to_page_cache_lru(page, mapping, pg_index, GFP_NOFS)) {
put_page(page);
- goto next;
+ /* There is already a page, skip to page end */
+ cur = (pg_index << PAGE_SHIFT) + PAGE_SIZE;
+ continue;
}
- /*
- * at this point, we have a locked page in the page cache
- * for these bytes in the file. But, we have to make
- * sure they map to this compressed extent on disk.
- */
ret = set_page_extent_mapped(page);
if (ret < 0) {
unlock_page(page);
@@ -613,18 +630,22 @@ static noinline int add_ra_bio_pages(struct inode *inode,
break;
}
- end = last_offset + PAGE_SIZE - 1;
- lock_extent(tree, last_offset, end);
+ page_end = (pg_index << PAGE_SHIFT) + PAGE_SIZE - 1;
+ lock_extent(tree, cur, page_end);
read_lock(&em_tree->lock);
- em = lookup_extent_mapping(em_tree, last_offset,
- PAGE_SIZE);
+ em = lookup_extent_mapping(em_tree, cur, page_end + 1 - cur);
read_unlock(&em_tree->lock);
- if (!em || last_offset < em->start ||
- (last_offset + PAGE_SIZE > extent_map_end(em)) ||
+ /*
+ * At this point, we have a locked page in the page cache
+ * for these bytes in the file. But, we have to make
+ * sure they map to this compressed extent on disk.
+ */
+ if (!em || cur < em->start ||
+ (cur + fs_info->sectorsize > extent_map_end(em)) ||
(em->block_start >> 9) != cb->orig_bio->bi_iter.bi_sector) {
free_extent_map(em);
- unlock_extent(tree, last_offset, end);
+ unlock_extent(tree, cur, page_end);
unlock_page(page);
put_page(page);
break;
@@ -642,19 +663,25 @@ static noinline int add_ra_bio_pages(struct inode *inode,
}
}
+ add_size = min(em->start + em->len, page_end + 1) - cur;
ret = bio_add_page(cb->orig_bio, page,
- PAGE_SIZE, 0);
-
- if (ret == PAGE_SIZE) {
- put_page(page);
- } else {
- unlock_extent(tree, last_offset, end);
+ add_size, offset_in_page(cur));
+ if (ret != add_size) {
+ unlock_extent(tree, cur, page_end);
unlock_page(page);
put_page(page);
break;
}
-next:
- last_offset += PAGE_SIZE;
+ /*
+ * If it's subpage, we also need to increase its
+ * subpage::readers numbre, as at endio we will
+ * decrease subpage::readers and to unlock the page.
+ */
+ if (fs_info->sectorsize < PAGE_SIZE)
+ btrfs_subpage_start_reader(fs_info, page,
+ cur, add_size);
+ put_page(page);
+ cur += add_size;
}
return 0;
}
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ba258b8329c9..781f6ea3c92d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3590,6 +3590,7 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
bool force_bio_submit = false;
u64 disk_bytenr;
+ ASSERT(IS_ALIGNED(cur, fs_info->sectorsize));
if (cur >= last_byte) {
struct extent_state *cached = NULL;
--
2.33.0
next prev parent reply other threads:[~2021-09-27 7:22 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-27 7:21 [PATCH v3 00/26] Qu Wenruo
2021-09-27 7:21 ` [PATCH v3 01/26] btrfs: remove unused parameter @nr_pages in add_ra_bio_pages() Qu Wenruo
2021-09-27 7:21 ` [PATCH v3 02/26] btrfs: remove unnecessary parameter @delalloc_start for writepage_delalloc() Qu Wenruo
2021-09-27 7:21 ` [PATCH v3 03/26] btrfs: use async_chunk::async_cow to replace the confusing pending pointer Qu Wenruo
2021-09-27 7:21 ` [PATCH v3 04/26] btrfs: don't pass compressed pages to btrfs_writepage_endio_finish_ordered() Qu Wenruo
2021-09-27 7:21 ` Qu Wenruo [this message]
2021-10-04 17:45 ` [PATCH v3 05/26] btrfs: make add_ra_bio_pages() to be subpage compatible David Sterba
2021-10-04 22:39 ` Qu Wenruo
2021-09-27 7:21 ` [PATCH v3 06/26] btrfs: introduce compressed_bio::pending_sectors to trace compressed bio more elegantly Qu Wenruo
2021-09-27 7:21 ` [PATCH v3 07/26] btrfs: add subpage checked bitmap to make PageChecked flag to be subpage compatible Qu Wenruo
2021-09-27 7:21 ` [PATCH v3 08/26] btrfs: handle errors properly inside btrfs_submit_compressed_read() Qu Wenruo
2021-09-27 7:21 ` [PATCH v3 09/26] btrfs: handle errors properly inside btrfs_submit_compressed_write() Qu Wenruo
2021-09-27 7:21 ` [PATCH v3 10/26] btrfs: introduce submit_compressed_bio() for compression Qu Wenruo
2021-10-04 19:42 ` David Sterba
2021-10-04 22:41 ` Qu Wenruo
2021-09-27 7:21 ` [PATCH v3 11/26] btrfs: introduce alloc_compressed_bio() " Qu Wenruo
2021-09-27 7:21 ` [PATCH v3 12/26] btrfs: make btrfs_submit_compressed_read() to determine stripe boundary at bio allocation time Qu Wenruo
2021-10-04 20:10 ` David Sterba
2021-10-05 6:24 ` Qu Wenruo
2021-09-27 7:21 ` [PATCH v3 13/26] btrfs: make btrfs_submit_compressed_write() " Qu Wenruo
2021-09-27 7:21 ` [PATCH v3 14/26] btrfs: remove unused function btrfs_bio_fits_in_stripe() Qu Wenruo
2021-09-27 7:21 ` [PATCH v3 15/26] btrfs: refactor submit_compressed_extents() Qu Wenruo
2021-10-05 16:33 ` David Sterba
2021-09-27 7:21 ` [PATCH v3 16/26] btrfs: cleanup for extent_write_locked_range() Qu Wenruo
2021-09-27 7:21 ` [PATCH v3 17/26] btrfs: make compress_file_range() to be subpage compatible Qu Wenruo
2021-09-27 7:22 ` [PATCH v3 18/26] btrfs: make btrfs_submit_compressed_write() " Qu Wenruo
2021-09-27 7:22 ` [PATCH v3 19/26] btrfs: make end_compressed_bio_writeback() to be subpage compatble Qu Wenruo
2021-09-27 7:22 ` [PATCH v3 20/26] btrfs: make extent_write_locked_range() to be subpage compatible Qu Wenruo
2021-10-13 14:50 ` Josef Bacik
2021-10-14 0:08 ` Qu Wenruo
2021-09-27 7:22 ` [PATCH v3 21/26] btrfs: extract uncompressed async extent submission code into a new helper Qu Wenruo
2021-09-27 7:22 ` [PATCH v3 22/26] btrfs: rework lzo_compress_pages() to make it subpage compatible Qu Wenruo
2021-09-27 7:22 ` [PATCH v3 23/26] btrfs: teach __extent_writepage() to handle locked page differently Qu Wenruo
2021-09-27 7:22 ` [PATCH v3 24/26] btrfs: allow page to be unlocked by btrfs_page_end_writer_lock() even if it's locked by plain page_lock() Qu Wenruo
2021-09-27 7:22 ` [PATCH v3 25/26] btrfs: don't run delalloc range which is beyond the locked_page to prevent deadlock for subpage compression Qu Wenruo
2021-09-27 7:22 ` [PATCH v3 26/26] btrfs: only allow subpage compression if the range is fully page aligned Qu Wenruo
2021-10-05 17:21 ` [PATCH v3 00/26] David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210927072208.21634-6-wqu@suse.com \
--to=wqu@suse.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox