public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] btrfs: migrate extent_buffer::pages[] to folio and more cleanups
@ 2023-12-01  6:06 Qu Wenruo
  2023-12-01  6:06 ` [PATCH v2 1/2] btrfs: migrate extent_buffer::pages[] to folio Qu Wenruo
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Qu Wenruo @ 2023-12-01  6:06 UTC (permalink / raw)
  To: linux-btrfs

[CHANGELOG]
v2:
- Add a new patch to do more cleanups for metadata page pointer usage

This patchset would migrate extent_buffer::pages[] to folios[], then
clean up the existing metadata page pointer usage to use proper folio
helpers.

This cleanup would help future higher order folios usage for metadata in
the following aspects:

- No more need to iterate through the remaining pages for page flags
  We just call the folio_set/mark/start_*() helpers for the single folio.
  The helper would only set the flag once (mostly on the leading page).

- Single bio_add_folio() call for the whole eb

- Better folio helper naming
  E.g. folio_test_uptodate() is clearer than PageUptodate().

The first patch does a very simple conversion, then the second patch
does the preparation for the higher order folio situation.

There are two locations which won't be converted to folios yet:

- Subpage code
  There is no point in supporting higher order folios for subpage.
  The two conditions just conflict with each other.

- Data page pointers
  That would be more useful in the future, before we go to support
  multi-page sectorsize.

However the 2nd one would also add a new corner case:

- Order mismatch in filemap and eb folios
  Unfortunately I don't have a better plan other than re-allocating the
  folios to the same order.
  Maybe in the future we will have better ways to handle it, like
  migrating the pages to a higher order folio?



Qu Wenruo (2):
  btrfs: migrate extent_buffer::pages[] to folio
  btrfs: cleanup metadata page pointer usage

 fs/btrfs/accessors.c             |  20 +-
 fs/btrfs/accessors.h             |   4 +-
 fs/btrfs/ctree.c                 |   2 +-
 fs/btrfs/disk-io.c               |  25 +-
 fs/btrfs/extent_io.c             | 402 ++++++++++++++++++-------------
 fs/btrfs/extent_io.h             |  21 +-
 fs/btrfs/inode.c                 |   2 +-
 fs/btrfs/subpage.c               |  60 ++---
 fs/btrfs/subpage.h               |  11 +-
 fs/btrfs/tests/extent-io-tests.c |   4 +-
 10 files changed, 322 insertions(+), 229 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/2] btrfs: migrate extent_buffer::pages[] to folio
  2023-12-01  6:06 [PATCH v2 0/2] btrfs: migrate extent_buffer::pages[] to folio and more cleanups Qu Wenruo
@ 2023-12-01  6:06 ` Qu Wenruo
  2023-12-01  6:06 ` [PATCH v2 2/2] btrfs: cleanup metadata page pointer usage Qu Wenruo
  2023-12-04 16:26 ` [PATCH v2 0/2] btrfs: migrate extent_buffer::pages[] to folio and more cleanups David Sterba
  2 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2023-12-01  6:06 UTC (permalink / raw)
  To: linux-btrfs

For now extent_buffer::pages[] still only accepts single page
pointers, thus we can migrate to folios pretty easily.

For a single page, page and folio are 1:1 mapped, including their page
flags.

This patch would just do the conversion from struct page to struct
folio, providing the first step to higher order folio in the future.

This conversion is pretty simple:

- extent_buffer::pages[] -> extent_buffer::folios[]

- page_address(eb->pages[i]) -> folio_address(eb->folios[i])

- eb->pages[i] -> folio_page(eb->folios[i], 0)

There would be more specific cleanups preparing for the incoming higher
order folio support.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/accessors.c             |  20 ++---
 fs/btrfs/accessors.h             |   4 +-
 fs/btrfs/ctree.c                 |   2 +-
 fs/btrfs/disk-io.c               |  19 ++---
 fs/btrfs/extent_io.c             | 123 ++++++++++++++++++-------------
 fs/btrfs/extent_io.h             |   7 +-
 fs/btrfs/tests/extent-io-tests.c |   4 +-
 7 files changed, 103 insertions(+), 76 deletions(-)

diff --git a/fs/btrfs/accessors.c b/fs/btrfs/accessors.c
index 206cf1612c1d..8f7cbb7154d4 100644
--- a/fs/btrfs/accessors.c
+++ b/fs/btrfs/accessors.c
@@ -27,7 +27,7 @@ static bool check_setget_bounds(const struct extent_buffer *eb,
 void btrfs_init_map_token(struct btrfs_map_token *token, struct extent_buffer *eb)
 {
 	token->eb = eb;
-	token->kaddr = page_address(eb->pages[0]);
+	token->kaddr = folio_address(eb->folios[0]);
 	token->offset = 0;
 }
 
@@ -50,7 +50,7 @@ void btrfs_init_map_token(struct btrfs_map_token *token, struct extent_buffer *e
  * an offset into the extent buffer page array, cast to a specific type.  This
  * gives us all the type checking.
  *
- * The extent buffer pages stored in the array pages do not form a contiguous
+ * The extent buffer pages stored in the array folios may not form a contiguous
  * phyusical range, but the API functions assume the linear offset to the range
  * from 0 to metadata node size.
  */
@@ -74,13 +74,13 @@ u##bits btrfs_get_token_##bits(struct btrfs_map_token *token,		\
 	    member_offset + size <= token->offset + PAGE_SIZE) {	\
 		return get_unaligned_le##bits(token->kaddr + oip);	\
 	}								\
-	token->kaddr = page_address(token->eb->pages[idx]);		\
+	token->kaddr = folio_address(token->eb->folios[idx]);		\
 	token->offset = idx << PAGE_SHIFT;				\
 	if (INLINE_EXTENT_BUFFER_PAGES == 1 || oip + size <= PAGE_SIZE ) \
 		return get_unaligned_le##bits(token->kaddr + oip);	\
 									\
 	memcpy(lebytes, token->kaddr + oip, part);			\
-	token->kaddr = page_address(token->eb->pages[idx + 1]);		\
+	token->kaddr = folio_address(token->eb->folios[idx + 1]);	\
 	token->offset = (idx + 1) << PAGE_SHIFT;			\
 	memcpy(lebytes + part, token->kaddr, size - part);		\
 	return get_unaligned_le##bits(lebytes);				\
@@ -91,7 +91,7 @@ u##bits btrfs_get_##bits(const struct extent_buffer *eb,		\
 	const unsigned long member_offset = (unsigned long)ptr + off;	\
 	const unsigned long oip = get_eb_offset_in_page(eb, member_offset); \
 	const unsigned long idx = get_eb_page_index(member_offset);	\
-	char *kaddr = page_address(eb->pages[idx]);			\
+	char *kaddr = folio_address(eb->folios[idx]);			\
 	const int size = sizeof(u##bits);				\
 	const int part = PAGE_SIZE - oip;				\
 	u8 lebytes[sizeof(u##bits)];					\
@@ -101,7 +101,7 @@ u##bits btrfs_get_##bits(const struct extent_buffer *eb,		\
 		return get_unaligned_le##bits(kaddr + oip);		\
 									\
 	memcpy(lebytes, kaddr + oip, part);				\
-	kaddr = page_address(eb->pages[idx + 1]);			\
+	kaddr = folio_address(eb->folios[idx + 1]);			\
 	memcpy(lebytes + part, kaddr, size - part);			\
 	return get_unaligned_le##bits(lebytes);				\
 }									\
@@ -125,7 +125,7 @@ void btrfs_set_token_##bits(struct btrfs_map_token *token,		\
 		put_unaligned_le##bits(val, token->kaddr + oip);	\
 		return;							\
 	}								\
-	token->kaddr = page_address(token->eb->pages[idx]);		\
+	token->kaddr = folio_address(token->eb->folios[idx]);		\
 	token->offset = idx << PAGE_SHIFT;				\
 	if (INLINE_EXTENT_BUFFER_PAGES == 1 || oip + size <= PAGE_SIZE) { \
 		put_unaligned_le##bits(val, token->kaddr + oip);	\
@@ -133,7 +133,7 @@ void btrfs_set_token_##bits(struct btrfs_map_token *token,		\
 	}								\
 	put_unaligned_le##bits(val, lebytes);				\
 	memcpy(token->kaddr + oip, lebytes, part);			\
-	token->kaddr = page_address(token->eb->pages[idx + 1]);		\
+	token->kaddr = folio_address(token->eb->folios[idx + 1]);	\
 	token->offset = (idx + 1) << PAGE_SHIFT;			\
 	memcpy(token->kaddr, lebytes + part, size - part);		\
 }									\
@@ -143,7 +143,7 @@ void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr,	\
 	const unsigned long member_offset = (unsigned long)ptr + off;	\
 	const unsigned long oip = get_eb_offset_in_page(eb, member_offset); \
 	const unsigned long idx = get_eb_page_index(member_offset);	\
-	char *kaddr = page_address(eb->pages[idx]);			\
+	char *kaddr = folio_address(eb->folios[idx]);			\
 	const int size = sizeof(u##bits);				\
 	const int part = PAGE_SIZE - oip;				\
 	u8 lebytes[sizeof(u##bits)];					\
@@ -156,7 +156,7 @@ void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr,	\
 									\
 	put_unaligned_le##bits(val, lebytes);				\
 	memcpy(kaddr + oip, lebytes, part);				\
-	kaddr = page_address(eb->pages[idx + 1]);			\
+	kaddr = folio_address(eb->folios[idx + 1]);			\
 	memcpy(kaddr, lebytes + part, size - part);			\
 }
 
diff --git a/fs/btrfs/accessors.h b/fs/btrfs/accessors.h
index aa0844535644..ed7aa32972ad 100644
--- a/fs/btrfs/accessors.h
+++ b/fs/btrfs/accessors.h
@@ -90,14 +90,14 @@ static inline void btrfs_set_token_##name(struct btrfs_map_token *token,\
 #define BTRFS_SETGET_HEADER_FUNCS(name, type, member, bits)		\
 static inline u##bits btrfs_##name(const struct extent_buffer *eb)	\
 {									\
-	const type *p = page_address(eb->pages[0]) +			\
+	const type *p = folio_address(eb->folios[0]) +			\
 			offset_in_page(eb->start);			\
 	return get_unaligned_le##bits(&p->member);			\
 }									\
 static inline void btrfs_set_##name(const struct extent_buffer *eb,	\
 				    u##bits val)			\
 {									\
-	type *p = page_address(eb->pages[0]) + offset_in_page(eb->start); \
+	type *p = folio_address(eb->folios[0]) + offset_in_page(eb->start); \
 	put_unaligned_le##bits(val, &p->member);			\
 }
 
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 137c4eb24c28..e6c535cf3749 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -832,7 +832,7 @@ int btrfs_bin_search(struct extent_buffer *eb, int first_slot,
 
 		if (oip + key_size <= PAGE_SIZE) {
 			const unsigned long idx = get_eb_page_index(offset);
-			char *kaddr = page_address(eb->pages[idx]);
+			char *kaddr = folio_address(eb->folios[idx]);
 
 			oip = get_eb_offset_in_page(eb, offset);
 			tmp = (struct btrfs_disk_key *)(kaddr + oip);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9317606017e2..78bb85f775f6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -89,7 +89,7 @@ static void csum_tree_block(struct extent_buffer *buf, u8 *result)
 		first_page_part = fs_info->nodesize;
 		num_pages = 1;
 	} else {
-		kaddr = page_address(buf->pages[0]);
+		kaddr = folio_address(buf->folios[0]);
 		first_page_part = min_t(u32, PAGE_SIZE, fs_info->nodesize);
 		num_pages = num_extent_pages(buf);
 	}
@@ -98,7 +98,7 @@ static void csum_tree_block(struct extent_buffer *buf, u8 *result)
 			    first_page_part - BTRFS_CSUM_SIZE);
 
 	for (i = 1; i < num_pages && INLINE_EXTENT_BUFFER_PAGES > 1; i++) {
-		kaddr = page_address(buf->pages[i]);
+		kaddr = folio_address(buf->folios[i]);
 		crypto_shash_update(shash, kaddr, PAGE_SIZE);
 	}
 	memset(result, 0, BTRFS_CSUM_SIZE);
@@ -184,13 +184,14 @@ static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb,
 		return -EROFS;
 
 	for (i = 0; i < num_pages; i++) {
-		struct page *p = eb->pages[i];
-		u64 start = max_t(u64, eb->start, page_offset(p));
-		u64 end = min_t(u64, eb->start + eb->len, page_offset(p) + PAGE_SIZE);
+		u64 start = max_t(u64, eb->start, folio_pos(eb->folios[i]));
+		u64 end = min_t(u64, eb->start + eb->len,
+				folio_pos(eb->folios[i]) + PAGE_SIZE);
 		u32 len = end - start;
 
 		ret = btrfs_repair_io_failure(fs_info, 0, start, len,
-				start, p, offset_in_page(start), mirror_num);
+				start, folio_page(eb->folios[i], 0),
+				offset_in_page(start), mirror_num);
 		if (ret)
 			break;
 	}
@@ -277,8 +278,8 @@ blk_status_t btree_csum_one_bio(struct btrfs_bio *bbio)
 
 	if (WARN_ON_ONCE(found_start != eb->start))
 		return BLK_STS_IOERR;
-	if (WARN_ON(!btrfs_page_test_uptodate(fs_info, eb->pages[0], eb->start,
-					      eb->len)))
+	if (WARN_ON(!btrfs_page_test_uptodate(fs_info, folio_page(eb->folios[0], 0),
+					      eb->start, eb->len)))
 		return BLK_STS_IOERR;
 
 	ASSERT(memcmp_extent_buffer(eb, fs_info->fs_devices->metadata_uuid,
@@ -387,7 +388,7 @@ int btrfs_validate_extent_buffer(struct extent_buffer *eb,
 	}
 
 	csum_tree_block(eb, result);
-	header_csum = page_address(eb->pages[0]) +
+	header_csum = folio_address(eb->folios[0]) +
 		get_eb_offset_in_page(eb, offsetof(struct btrfs_header, csum));
 
 	if (memcmp(result, header_csum, csum_size) != 0) {
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 734016eac82f..e93f6a8d1f20 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -712,6 +712,26 @@ int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array,
 	return 0;
 }
 
+/*
+ * Populate needed folios for the extent buffer.
+ *
+ * For now, the folios populated are always in order 0 (aka, single page).
+ */
+static int alloc_eb_folio_array(struct extent_buffer *eb, gfp_t extra_gfp)
+{
+	struct page *page_array[INLINE_EXTENT_BUFFER_PAGES];
+	int num_pages = num_extent_pages(eb);
+	int ret;
+
+	ret = btrfs_alloc_page_array(num_pages, page_array, extra_gfp);
+	if (ret < 0)
+		return ret;
+
+	for (int i = 0; i < num_pages; i++)
+		eb->folios[i] = page_folio(page_array[i]);
+	return 0;
+}
+
 static bool btrfs_bio_is_contig(struct btrfs_bio_ctrl *bio_ctrl,
 				struct page *page, u64 disk_bytenr,
 				unsigned int pg_offset)
@@ -1689,7 +1709,7 @@ static noinline_for_stack void write_one_eb(struct extent_buffer *eb,
 	bbio->inode = BTRFS_I(eb->fs_info->btree_inode);
 	bbio->file_offset = eb->start;
 	if (fs_info->nodesize < PAGE_SIZE) {
-		struct page *p = eb->pages[0];
+		struct page *p = folio_page(eb->folios[0], 0);
 
 		lock_page(p);
 		btrfs_subpage_set_writeback(fs_info, p, eb->start, eb->len);
@@ -1703,7 +1723,7 @@ static noinline_for_stack void write_one_eb(struct extent_buffer *eb,
 		unlock_page(p);
 	} else {
 		for (int i = 0; i < num_extent_pages(eb); i++) {
-			struct page *p = eb->pages[i];
+			struct page *p = folio_page(eb->folios[i], 0);
 
 			lock_page(p);
 			clear_page_dirty_for_io(p);
@@ -3160,7 +3180,7 @@ static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
 
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
-		struct page *page = eb->pages[i];
+		struct page *page = folio_page(eb->folios[i], 0);
 
 		if (!page)
 			continue;
@@ -3222,7 +3242,7 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
 	 */
 	set_bit(EXTENT_BUFFER_UNMAPPED, &new->bflags);
 
-	ret = btrfs_alloc_page_array(num_pages, new->pages, 0);
+	ret = alloc_eb_folio_array(new, 0);
 	if (ret) {
 		btrfs_release_extent_buffer(new);
 		return NULL;
@@ -3230,7 +3250,7 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
 
 	for (i = 0; i < num_pages; i++) {
 		int ret;
-		struct page *p = new->pages[i];
+		struct page *p = folio_page(new->folios[i], 0);
 
 		ret = attach_extent_buffer_page(new, p, NULL);
 		if (ret < 0) {
@@ -3258,12 +3278,12 @@ struct extent_buffer *__alloc_dummy_extent_buffer(struct btrfs_fs_info *fs_info,
 		return NULL;
 
 	num_pages = num_extent_pages(eb);
-	ret = btrfs_alloc_page_array(num_pages, eb->pages, 0);
+	ret = alloc_eb_folio_array(eb, 0);
 	if (ret)
 		goto err;
 
 	for (i = 0; i < num_pages; i++) {
-		struct page *p = eb->pages[i];
+		struct page *p = folio_page(eb->folios[i], 0);
 
 		ret = attach_extent_buffer_page(eb, p, NULL);
 		if (ret < 0)
@@ -3277,9 +3297,9 @@ struct extent_buffer *__alloc_dummy_extent_buffer(struct btrfs_fs_info *fs_info,
 	return eb;
 err:
 	for (i = 0; i < num_pages; i++) {
-		if (eb->pages[i]) {
-			detach_extent_buffer_page(eb, eb->pages[i]);
-			__free_page(eb->pages[i]);
+		if (eb->folios[i]) {
+			detach_extent_buffer_page(eb, folio_page(eb->folios[i], 0));
+			__free_page(folio_page(eb->folios[i], 0));
 		}
 	}
 	__free_extent_buffer(eb);
@@ -3337,7 +3357,7 @@ static void mark_extent_buffer_accessed(struct extent_buffer *eb,
 
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
-		struct page *p = eb->pages[i];
+		struct page *p = folio_page(eb->folios[i], 0);
 
 		if (p != accessed)
 			mark_page_accessed(p);
@@ -3480,7 +3500,7 @@ static int check_eb_alignment(struct btrfs_fs_info *fs_info, u64 start)
 
 
 /*
- * Return 0 if eb->pages[i] is attached to btree inode successfully.
+ * Return 0 if eb->folios[i] is attached to btree inode successfully.
  * Return >0 if there is already annother extent buffer for the range,
  * and @found_eb_ret would be updated.
  */
@@ -3496,11 +3516,11 @@ static int attach_eb_page_to_filemap(struct extent_buffer *eb, int i,
 
 	ASSERT(found_eb_ret);
 
-	/* Caller should ensure the page exists. */
-	ASSERT(eb->pages[i]);
+	/* Caller should ensure the folio exists. */
+	ASSERT(eb->folios[i]);
 
 retry:
-	ret = filemap_add_folio(mapping, page_folio(eb->pages[i]), index + i,
+	ret = filemap_add_folio(mapping, eb->folios[i], index + i,
 			GFP_NOFS | __GFP_NOFAIL);
 	if (!ret)
 		return 0;
@@ -3521,8 +3541,8 @@ static int attach_eb_page_to_filemap(struct extent_buffer *eb, int i,
 		 * We're going to reuse the existing page, can
 		 * drop our page and subpage structure now.
 		 */
-		__free_page(eb->pages[i]);
-		eb->pages[i] = folio_page(existing_folio, 0);
+		__free_page(folio_page(eb->folios[i], 0));
+		eb->folios[i] = existing_folio;
 	} else {
 		struct extent_buffer *existing_eb;
 
@@ -3539,8 +3559,8 @@ static int attach_eb_page_to_filemap(struct extent_buffer *eb, int i,
 			return 1;
 		}
 		/* The extent buffer no longer exists, we can reuse the folio. */
-		__free_page(eb->pages[i]);
-		eb->pages[i] = folio_page(existing_folio, 0);
+		__free_page(folio_page(eb->folios[i], 0));
+		eb->folios[i] = existing_folio;
 	}
 	return 0;
 }
@@ -3609,7 +3629,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 	}
 
 	/* Allocate all pages first. */
-	ret = btrfs_alloc_page_array(num_pages, eb->pages, __GFP_NOFAIL);
+	ret = alloc_eb_folio_array(eb, __GFP_NOFAIL);
 	if (ret < 0) {
 		btrfs_free_subpage(prealloc);
 		goto out;
@@ -3627,11 +3647,11 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		attached++;
 
 		/*
-		 * Only after attach_eb_page_to_filemap(), eb->pages[] is
+		 * Only after attach_eb_page_to_filemap(), eb->folios[] is
 		 * reliable, as we may choose to reuse the existing page cache
 		 * and free the allocated page.
 		 */
-		p = eb->pages[i];
+		p = folio_page(eb->folios[i], 0);
 		spin_lock(&mapping->private_lock);
 		/* Should not fail, as we have preallocated the memory */
 		ret = attach_extent_buffer_page(eb, p, prealloc);
@@ -3654,7 +3674,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		 * Check if the current page is physically contiguous with previous eb
 		 * page.
 		 */
-		if (i && eb->pages[i - 1] + 1 != p)
+		if (i && folio_page(eb->folios[i - 1], 0) + 1 != p)
 			page_contig = false;
 
 		if (!btrfs_page_test_uptodate(fs_info, p, eb->start, eb->len))
@@ -3672,7 +3692,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
 	/* All pages are physically contiguous, can skip cross page handling. */
 	if (page_contig)
-		eb->addr = page_address(eb->pages[0]) + offset_in_page(eb->start);
+		eb->addr = folio_address(eb->folios[0]) + offset_in_page(eb->start);
 again:
 	ret = radix_tree_preload(GFP_NOFS);
 	if (ret)
@@ -3700,15 +3720,15 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 	 * live buffer and won't free them prematurely.
 	 */
 	for (int i = 0; i < num_pages; i++)
-		unlock_page(eb->pages[i]);
+		unlock_page(folio_page(eb->folios[i], 0));
 	return eb;
 
 out:
 	WARN_ON(!atomic_dec_and_test(&eb->refs));
 	for (int i = 0; i < attached; i++) {
-		ASSERT(eb->pages[i]);
-		detach_extent_buffer_page(eb, eb->pages[i]);
-		unlock_page(eb->pages[i]);
+		ASSERT(eb->folios[i]);
+		detach_extent_buffer_page(eb, folio_page(eb->folios[i], 0));
+		unlock_page(folio_page(eb->folios[i], 0));
 	}
 	/*
 	 * Now all pages of that extent buffer is unmapped, set UNMAPPED flag,
@@ -3827,7 +3847,7 @@ static void btree_clear_page_dirty(struct page *page)
 static void clear_subpage_extent_buffer_dirty(const struct extent_buffer *eb)
 {
 	struct btrfs_fs_info *fs_info = eb->fs_info;
-	struct page *page = eb->pages[0];
+	struct page *page = folio_page(eb->folios[0], 0);
 	bool last;
 
 	/* btree_clear_page_dirty() needs page locked */
@@ -3879,7 +3899,7 @@ void btrfs_clear_buffer_dirty(struct btrfs_trans_handle *trans,
 	num_pages = num_extent_pages(eb);
 
 	for (i = 0; i < num_pages; i++) {
-		page = eb->pages[i];
+		page = folio_page(eb->folios[i], 0);
 		if (!PageDirty(page))
 			continue;
 		lock_page(page);
@@ -3918,19 +3938,19 @@ void set_extent_buffer_dirty(struct extent_buffer *eb)
 		 * the above race.
 		 */
 		if (subpage)
-			lock_page(eb->pages[0]);
+			lock_page(folio_page(eb->folios[0], 0));
 		for (i = 0; i < num_pages; i++)
-			btrfs_page_set_dirty(eb->fs_info, eb->pages[i],
+			btrfs_page_set_dirty(eb->fs_info, folio_page(eb->folios[i], 0),
 					     eb->start, eb->len);
 		if (subpage)
-			unlock_page(eb->pages[0]);
+			unlock_page(folio_page(eb->folios[0], 0));
 		percpu_counter_add_batch(&eb->fs_info->dirty_metadata_bytes,
 					 eb->len,
 					 eb->fs_info->dirty_metadata_batch);
 	}
 #ifdef CONFIG_BTRFS_DEBUG
 	for (i = 0; i < num_pages; i++)
-		ASSERT(PageDirty(eb->pages[i]));
+		ASSERT(PageDirty(folio_page(eb->folios[i], 0)));
 #endif
 }
 
@@ -3944,7 +3964,7 @@ void clear_extent_buffer_uptodate(struct extent_buffer *eb)
 	clear_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
-		page = eb->pages[i];
+		page = folio_page(eb->folios[i], 0);
 		if (!page)
 			continue;
 
@@ -3970,7 +3990,7 @@ void set_extent_buffer_uptodate(struct extent_buffer *eb)
 	set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
-		page = eb->pages[i];
+		page = folio_page(eb->folios[i], 0);
 
 		/*
 		 * This is special handling for metadata subpage, as regular
@@ -4061,11 +4081,12 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
 	bbio->file_offset = eb->start;
 	memcpy(&bbio->parent_check, check, sizeof(*check));
 	if (eb->fs_info->nodesize < PAGE_SIZE) {
-		__bio_add_page(&bbio->bio, eb->pages[0], eb->len,
-			       eb->start - page_offset(eb->pages[0]));
+		__bio_add_page(&bbio->bio, folio_page(eb->folios[0], 0), eb->len,
+			       eb->start - folio_pos(eb->folios[0]));
 	} else {
 		for (i = 0; i < num_pages; i++)
-			__bio_add_page(&bbio->bio, eb->pages[i], PAGE_SIZE, 0);
+			__bio_add_page(&bbio->bio, folio_page(eb->folios[i], 0),
+				       PAGE_SIZE, 0);
 	}
 	btrfs_submit_bio(bbio, mirror_num);
 
@@ -4136,7 +4157,7 @@ void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
 	offset = get_eb_offset_in_page(eb, start);
 
 	while (len > 0) {
-		page = eb->pages[i];
+		page = folio_page(eb->folios[i], 0);
 
 		cur = min(len, (PAGE_SIZE - offset));
 		kaddr = page_address(page);
@@ -4173,7 +4194,7 @@ int read_extent_buffer_to_user_nofault(const struct extent_buffer *eb,
 	offset = get_eb_offset_in_page(eb, start);
 
 	while (len > 0) {
-		page = eb->pages[i];
+		page = folio_page(eb->folios[i], 0);
 
 		cur = min(len, (PAGE_SIZE - offset));
 		kaddr = page_address(page);
@@ -4211,7 +4232,7 @@ int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
 	offset = get_eb_offset_in_page(eb, start);
 
 	while (len > 0) {
-		page = eb->pages[i];
+		page = folio_page(eb->folios[i], 0);
 
 		cur = min(len, (PAGE_SIZE - offset));
 
@@ -4286,7 +4307,7 @@ static void __write_extent_buffer(const struct extent_buffer *eb,
 	offset = get_eb_offset_in_page(eb, start);
 
 	while (len > 0) {
-		page = eb->pages[i];
+		page = folio_page(eb->folios[i], 0);
 		if (check_uptodate)
 			assert_eb_page_uptodate(eb, page);
 
@@ -4324,7 +4345,7 @@ static void memset_extent_buffer(const struct extent_buffer *eb, int c,
 		unsigned long index = get_eb_page_index(cur);
 		unsigned int offset = get_eb_offset_in_page(eb, cur);
 		unsigned int cur_len = min(start + len - cur, PAGE_SIZE - offset);
-		struct page *page = eb->pages[index];
+		struct page *page = folio_page(eb->folios[index], 0);
 
 		assert_eb_page_uptodate(eb, page);
 		memset_page(page, offset, c, cur_len);
@@ -4352,7 +4373,7 @@ void copy_extent_buffer_full(const struct extent_buffer *dst,
 		unsigned long index = get_eb_page_index(cur);
 		unsigned long offset = get_eb_offset_in_page(src, cur);
 		unsigned long cur_len = min(src->len, PAGE_SIZE - offset);
-		void *addr = page_address(src->pages[index]) + offset;
+		void *addr = folio_address(src->folios[index]) + offset;
 
 		write_extent_buffer(dst, addr, cur, cur_len);
 
@@ -4381,7 +4402,7 @@ void copy_extent_buffer(const struct extent_buffer *dst,
 	offset = get_eb_offset_in_page(dst, dst_offset);
 
 	while (len > 0) {
-		page = dst->pages[i];
+		page = folio_page(dst->folios[i], 0);
 		assert_eb_page_uptodate(dst, page);
 
 		cur = min(len, (unsigned long)(PAGE_SIZE - offset));
@@ -4444,7 +4465,7 @@ int extent_buffer_test_bit(const struct extent_buffer *eb, unsigned long start,
 	size_t offset;
 
 	eb_bitmap_offset(eb, start, nr, &i, &offset);
-	page = eb->pages[i];
+	page = folio_page(eb->folios[i], 0);
 	assert_eb_page_uptodate(eb, page);
 	kaddr = page_address(page);
 	return 1U & (kaddr[offset] >> (nr & (BITS_PER_BYTE - 1)));
@@ -4456,7 +4477,7 @@ static u8 *extent_buffer_get_byte(const struct extent_buffer *eb, unsigned long
 
 	if (check_eb_range(eb, bytenr, 1))
 		return NULL;
-	return page_address(eb->pages[index]) + get_eb_offset_in_page(eb, bytenr);
+	return folio_address(eb->folios[index]) + get_eb_offset_in_page(eb, bytenr);
 }
 
 /*
@@ -4563,7 +4584,7 @@ void memcpy_extent_buffer(const struct extent_buffer *dst,
 		unsigned long pg_off = get_eb_offset_in_page(dst, cur_src);
 		unsigned long cur_len = min(src_offset + len - cur_src,
 					    PAGE_SIZE - pg_off);
-		void *src_addr = page_address(dst->pages[pg_index]) + pg_off;
+		void *src_addr = folio_address(dst->folios[pg_index]) + pg_off;
 		const bool use_memmove = areas_overlap(src_offset + cur_off,
 						       dst_offset + cur_off, cur_len);
 
@@ -4610,8 +4631,8 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
 		cur = min_t(unsigned long, len, src_off_in_page + 1);
 		cur = min(cur, dst_off_in_page + 1);
 
-		src_addr = page_address(dst->pages[src_i]) + src_off_in_page -
-					cur + 1;
+		src_addr = folio_address(dst->folios[src_i]) + src_off_in_page -
+					 cur + 1;
 		use_memmove = areas_overlap(src_end - cur + 1, dst_end - cur + 1,
 					    cur);
 
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index c73d53c22ec5..66c2e214b141 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -94,7 +94,12 @@ struct extent_buffer {
 
 	struct rw_semaphore lock;
 
-	struct page *pages[INLINE_EXTENT_BUFFER_PAGES];
+	/*
+	 * Pointers to all the folios of the extent buffer.
+	 *
+	 * For now the folio is always order 0 (aka, a single page).
+	 */
+	struct folio *folios[INLINE_EXTENT_BUFFER_PAGES];
 #ifdef CONFIG_BTRFS_DEBUG
 	struct list_head leak_list;
 	pid_t lock_owner;
diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c
index 1cc86af97dc6..25b3349595e0 100644
--- a/fs/btrfs/tests/extent-io-tests.c
+++ b/fs/btrfs/tests/extent-io-tests.c
@@ -652,7 +652,7 @@ static void dump_eb_and_memory_contents(struct extent_buffer *eb, void *memory,
 					const char *test_name)
 {
 	for (int i = 0; i < eb->len; i++) {
-		struct page *page = eb->pages[i >> PAGE_SHIFT];
+		struct page *page = folio_page(eb->folios[i >> PAGE_SHIFT], 0);
 		void *addr = page_address(page) + offset_in_page(i);
 
 		if (memcmp(addr, memory + i, 1) != 0) {
@@ -668,7 +668,7 @@ static int verify_eb_and_memory(struct extent_buffer *eb, void *memory,
 				const char *test_name)
 {
 	for (int i = 0; i < (eb->len >> PAGE_SHIFT); i++) {
-		void *eb_addr = page_address(eb->pages[i]);
+		void *eb_addr = folio_address(eb->folios[i]);
 
 		if (memcmp(memory + (i << PAGE_SHIFT), eb_addr, PAGE_SIZE) != 0) {
 			dump_eb_and_memory_contents(eb, memory, test_name);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 2/2] btrfs: cleanup metadata page pointer usage
  2023-12-01  6:06 [PATCH v2 0/2] btrfs: migrate extent_buffer::pages[] to folio and more cleanups Qu Wenruo
  2023-12-01  6:06 ` [PATCH v2 1/2] btrfs: migrate extent_buffer::pages[] to folio Qu Wenruo
@ 2023-12-01  6:06 ` Qu Wenruo
  2023-12-05 14:00   ` David Sterba
  2023-12-04 16:26 ` [PATCH v2 0/2] btrfs: migrate extent_buffer::pages[] to folio and more cleanups David Sterba
  2 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2023-12-01  6:06 UTC (permalink / raw)
  To: linux-btrfs

Although we have migrated extent_buffer::pages[] to folios[], we're
still mostly using the folio_page() helper to grab the page.

This patch would do the following cleanups for metadata:

- Introduce num_extent_folios() helper
  This is to replace most num_extent_pages() callers.

- Use num_extent_folios() to iterate future large folios
  This allows us to use things like
  bio_add_folio()/bio_add_folio_nofail(), and only set the needed flags
  for the folio (aka the leading/tailing page), which reduces the loop
  iteration to 1 for large folios.

- Change metadata related functions to use folio pointers
  Including their function name, involving:
  * attach_extent_buffer_page()
  * detach_extent_buffer_page()
  * page_range_has_eb()
  * btrfs_release_extent_buffer_pages()
  * btree_clear_page_dirty()
  * btrfs_page_inc_eb_refs()
  * btrfs_page_dec_eb_refs()

- Change btrfs_is_subpage() to accept an address_space pointer
  This is to allow both page->mapping and folio->mapping to be utilized.
  As data is still using the old per-page code, and may stay that way
  for a while.

- Special corner case place holder for future order mismatches between
  extent buffer and inode filemap
  For now it's just a block of comments and a dead ASSERT(), no real
  handling yet.

The subpage code would still use pages, just because subpage and large
folios are conflicting conditions, thus we don't need to bother subpage
with higher order folios at all. Just folio_page(folio, 0) would be
enough.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c   |   6 +
 fs/btrfs/extent_io.c | 319 ++++++++++++++++++++++++-------------------
 fs/btrfs/extent_io.h |  14 ++
 fs/btrfs/inode.c     |   2 +-
 fs/btrfs/subpage.c   |  60 ++++----
 fs/btrfs/subpage.h   |  11 +-
 6 files changed, 239 insertions(+), 173 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 78bb85f775f6..a5ace9f6e790 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -97,6 +97,12 @@ static void csum_tree_block(struct extent_buffer *buf, u8 *result)
 	crypto_shash_update(shash, kaddr + BTRFS_CSUM_SIZE,
 			    first_page_part - BTRFS_CSUM_SIZE);
 
+	/*
+	 * Multiple single-page folios case would reach here.
+	 *
+	 * nodesize <= PAGE_SIZE and large folio all handled by above
+	 * crypto_shash_update() already.
+	 */
 	for (i = 1; i < num_pages && INLINE_EXTENT_BUFFER_PAGES > 1; i++) {
 		kaddr = folio_address(buf->folios[i]);
 		crypto_shash_update(shash, kaddr, PAGE_SIZE);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e93f6a8d1f20..8d762809482b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -441,7 +441,7 @@ static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
 	else
 		btrfs_page_clear_uptodate(fs_info, page, start, len);
 
-	if (!btrfs_is_subpage(fs_info, page))
+	if (!btrfs_is_subpage(fs_info, page->mapping))
 		unlock_page(page);
 	else
 		btrfs_subpage_end_reader(fs_info, page, start, len);
@@ -565,7 +565,7 @@ static void begin_page_read(struct btrfs_fs_info *fs_info, struct page *page)
 	struct folio *folio = page_folio(page);
 
 	ASSERT(PageLocked(page));
-	if (!btrfs_is_subpage(fs_info, page))
+	if (!btrfs_is_subpage(fs_info, page->mapping))
 		return;
 
 	ASSERT(folio_test_private(folio));
@@ -886,11 +886,10 @@ static void submit_extent_page(struct btrfs_bio_ctrl *bio_ctrl,
 	} while (size);
 }
 
-static int attach_extent_buffer_page(struct extent_buffer *eb,
-				     struct page *page,
-				     struct btrfs_subpage *prealloc)
+static int attach_extent_buffer_folio(struct extent_buffer *eb,
+				      struct folio *folio,
+				      struct btrfs_subpage *prealloc)
 {
-	struct folio *folio = page_folio(page);
 	struct btrfs_fs_info *fs_info = eb->fs_info;
 	int ret = 0;
 
@@ -900,8 +899,8 @@ static int attach_extent_buffer_page(struct extent_buffer *eb,
 	 * For cloned or dummy extent buffers, their pages are not mapped and
 	 * will not race with any other ebs.
 	 */
-	if (page->mapping)
-		lockdep_assert_held(&page->mapping->private_lock);
+	if (folio->mapping)
+		lockdep_assert_held(&folio->mapping->private_lock);
 
 	if (fs_info->nodesize >= PAGE_SIZE) {
 		if (!folio_test_private(folio))
@@ -922,7 +921,7 @@ static int attach_extent_buffer_page(struct extent_buffer *eb,
 		folio_attach_private(folio, prealloc);
 	else
 		/* Do new allocation to attach subpage */
-		ret = btrfs_attach_subpage(fs_info, page,
+		ret = btrfs_attach_subpage(fs_info, folio_page(folio, 0),
 					   BTRFS_SUBPAGE_METADATA);
 	return ret;
 }
@@ -939,7 +938,7 @@ int set_page_extent_mapped(struct page *page)
 
 	fs_info = btrfs_sb(page->mapping->host->i_sb);
 
-	if (btrfs_is_subpage(fs_info, page))
+	if (btrfs_is_subpage(fs_info, page->mapping))
 		return btrfs_attach_subpage(fs_info, page, BTRFS_SUBPAGE_DATA);
 
 	folio_attach_private(folio, (void *)EXTENT_FOLIO_PRIVATE);
@@ -957,7 +956,7 @@ void clear_page_extent_mapped(struct page *page)
 		return;
 
 	fs_info = btrfs_sb(page->mapping->host->i_sb);
-	if (btrfs_is_subpage(fs_info, page))
+	if (btrfs_is_subpage(fs_info, page->mapping))
 		return btrfs_detach_subpage(fs_info, page);
 
 	folio_detach_private(folio);
@@ -1281,7 +1280,7 @@ static void find_next_dirty_byte(struct btrfs_fs_info *fs_info,
 	 * For regular sector size == page size case, since one page only
 	 * contains one sector, we return the page offset directly.
 	 */
-	if (!btrfs_is_subpage(fs_info, page)) {
+	if (!btrfs_is_subpage(fs_info, page->mapping)) {
 		*start = page_offset(page);
 		*end = page_offset(page) + PAGE_SIZE;
 		return;
@@ -1722,16 +1721,21 @@ static noinline_for_stack void write_one_eb(struct extent_buffer *eb,
 		wbc_account_cgroup_owner(wbc, p, eb->len);
 		unlock_page(p);
 	} else {
-		for (int i = 0; i < num_extent_pages(eb); i++) {
-			struct page *p = folio_page(eb->folios[i], 0);
+		int num_folios = num_extent_folios(eb);
 
-			lock_page(p);
-			clear_page_dirty_for_io(p);
-			set_page_writeback(p);
-			__bio_add_page(&bbio->bio, p, PAGE_SIZE, 0);
-			wbc_account_cgroup_owner(wbc, p, PAGE_SIZE);
-			wbc->nr_to_write--;
-			unlock_page(p);
+		for (int i = 0; i < num_folios; i++) {
+			struct folio *folio = eb->folios[i];
+			bool ret;
+
+			folio_lock(folio);
+			folio_clear_dirty_for_io(folio);
+			folio_start_writeback(folio);
+			ret = bio_add_folio(&bbio->bio, folio, folio_size(folio), 0);
+			ASSERT(ret);
+			wbc_account_cgroup_owner(wbc, folio_page(folio, 0),
+						 folio_size(folio));
+			wbc->nr_to_write -= folio_nr_pages(folio);
+			folio_unlock(folio);
 		}
 	}
 	btrfs_submit_bio(bbio, 0);
@@ -3088,12 +3092,11 @@ static int extent_buffer_under_io(const struct extent_buffer *eb)
 		test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
 }
 
-static bool page_range_has_eb(struct btrfs_fs_info *fs_info, struct page *page)
+static bool folio_range_has_eb(struct btrfs_fs_info *fs_info, struct folio *folio)
 {
-	struct folio *folio = page_folio(page);
 	struct btrfs_subpage *subpage;
 
-	lockdep_assert_held(&page->mapping->private_lock);
+	lockdep_assert_held(&folio->mapping->private_lock);
 
 	if (folio_test_private(folio)) {
 		subpage = folio_get_private(folio);
@@ -3109,22 +3112,22 @@ static bool page_range_has_eb(struct btrfs_fs_info *fs_info, struct page *page)
 	return false;
 }
 
-static void detach_extent_buffer_page(struct extent_buffer *eb, struct page *page)
+static void detach_extent_buffer_folio(struct extent_buffer *eb,
+				       struct folio *folio)
 {
 	struct btrfs_fs_info *fs_info = eb->fs_info;
 	const bool mapped = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
-	struct folio *folio = page_folio(page);
 
 	/*
 	 * For mapped eb, we're going to change the folio private, which should
 	 * be done under the private_lock.
 	 */
 	if (mapped)
-		spin_lock(&page->mapping->private_lock);
+		spin_lock(&folio->mapping->private_lock);
 
 	if (!folio_test_private(folio)) {
 		if (mapped)
-			spin_unlock(&page->mapping->private_lock);
+			spin_unlock(&folio->mapping->private_lock);
 		return;
 	}
 
@@ -3138,13 +3141,13 @@ static void detach_extent_buffer_page(struct extent_buffer *eb, struct page *pag
 		 */
 		if (folio_test_private(folio) && folio_get_private(folio) == eb) {
 			BUG_ON(test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
-			BUG_ON(PageDirty(page));
-			BUG_ON(PageWriteback(page));
+			BUG_ON(folio_test_dirty(folio));
+			BUG_ON(folio_test_writeback(folio));
 			/* We need to make sure we haven't be attached to a new eb. */
 			folio_detach_private(folio);
 		}
 		if (mapped)
-			spin_unlock(&page->mapping->private_lock);
+			spin_unlock(&folio->mapping->private_lock);
 		return;
 	}
 
@@ -3154,41 +3157,41 @@ static void detach_extent_buffer_page(struct extent_buffer *eb, struct page *pag
 	 * attached to one dummy eb, no sharing.
 	 */
 	if (!mapped) {
-		btrfs_detach_subpage(fs_info, page);
+		btrfs_detach_subpage(fs_info, folio_page(folio, 0));
 		return;
 	}
 
-	btrfs_page_dec_eb_refs(fs_info, page);
+	btrfs_folio_dec_eb_refs(fs_info, folio);
 
 	/*
 	 * We can only detach the folio private if there are no other ebs in the
 	 * page range and no unfinished IO.
 	 */
-	if (!page_range_has_eb(fs_info, page))
-		btrfs_detach_subpage(fs_info, page);
+	if (!folio_range_has_eb(fs_info, folio))
+		btrfs_detach_subpage(fs_info, folio_page(folio, 0));
 
-	spin_unlock(&page->mapping->private_lock);
+	spin_unlock(&folio->mapping->private_lock);
 }
 
 /* Release all pages attached to the extent buffer */
 static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
 {
 	int i;
-	int num_pages;
+	int num_folios;
 
 	ASSERT(!extent_buffer_under_io(eb));
 
-	num_pages = num_extent_pages(eb);
-	for (i = 0; i < num_pages; i++) {
-		struct page *page = folio_page(eb->folios[i], 0);
+	num_folios = num_extent_folios(eb);
+	for (i = 0; i < num_folios; i++) {
+		struct folio *folio = eb->folios[i];
 
-		if (!page)
+		if (!folio)
 			continue;
 
-		detach_extent_buffer_page(eb, page);
+		detach_extent_buffer_folio(eb, folio);
 
-		/* One for when we allocated the page */
-		put_page(page);
+		/* One for when we allocated the folio. */
+		folio_put(folio);
 	}
 }
 
@@ -3228,7 +3231,7 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
 {
 	int i;
 	struct extent_buffer *new;
-	int num_pages = num_extent_pages(src);
+	int num_folios = num_extent_folios(src);
 	int ret;
 
 	new = __alloc_extent_buffer(src->fs_info, src->start, src->len);
@@ -3248,16 +3251,16 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
 		return NULL;
 	}
 
-	for (i = 0; i < num_pages; i++) {
+	for (i = 0; i < num_folios; i++) {
+		struct folio *folio = new->folios[i];
 		int ret;
-		struct page *p = folio_page(new->folios[i], 0);
 
-		ret = attach_extent_buffer_page(new, p, NULL);
+		ret = attach_extent_buffer_folio(new, folio, NULL);
 		if (ret < 0) {
 			btrfs_release_extent_buffer(new);
 			return NULL;
 		}
-		WARN_ON(PageDirty(p));
+		WARN_ON(folio_test_dirty(folio));
 	}
 	copy_extent_buffer_full(new, src);
 	set_extent_buffer_uptodate(new);
@@ -3269,7 +3272,7 @@ struct extent_buffer *__alloc_dummy_extent_buffer(struct btrfs_fs_info *fs_info,
 						  u64 start, unsigned long len)
 {
 	struct extent_buffer *eb;
-	int num_pages;
+	int num_folios = 0;
 	int i;
 	int ret;
 
@@ -3277,15 +3280,13 @@ struct extent_buffer *__alloc_dummy_extent_buffer(struct btrfs_fs_info *fs_info,
 	if (!eb)
 		return NULL;
 
-	num_pages = num_extent_pages(eb);
 	ret = alloc_eb_folio_array(eb, 0);
 	if (ret)
 		goto err;
 
-	for (i = 0; i < num_pages; i++) {
-		struct page *p = folio_page(eb->folios[i], 0);
-
-		ret = attach_extent_buffer_page(eb, p, NULL);
+	num_folios = num_extent_folios(eb);
+	for (i = 0; i < num_folios; i++) {
+		ret = attach_extent_buffer_folio(eb, eb->folios[i], NULL);
 		if (ret < 0)
 			goto err;
 	}
@@ -3296,10 +3297,10 @@ struct extent_buffer *__alloc_dummy_extent_buffer(struct btrfs_fs_info *fs_info,
 
 	return eb;
 err:
-	for (i = 0; i < num_pages; i++) {
+	for (i = 0; i < num_folios; i++) {
 		if (eb->folios[i]) {
-			detach_extent_buffer_page(eb, folio_page(eb->folios[i], 0));
-			__free_page(folio_page(eb->folios[i], 0));
+			detach_extent_buffer_folio(eb, eb->folios[i]);
+			__folio_put(eb->folios[i]);
 		}
 	}
 	__free_extent_buffer(eb);
@@ -3348,20 +3349,15 @@ static void check_buffer_tree_ref(struct extent_buffer *eb)
 	spin_unlock(&eb->refs_lock);
 }
 
-static void mark_extent_buffer_accessed(struct extent_buffer *eb,
-		struct page *accessed)
+static void mark_extent_buffer_accessed(struct extent_buffer *eb)
 {
-	int num_pages, i;
+	int num_folios;
 
 	check_buffer_tree_ref(eb);
 
-	num_pages = num_extent_pages(eb);
-	for (i = 0; i < num_pages; i++) {
-		struct page *p = folio_page(eb->folios[i], 0);
-
-		if (p != accessed)
-			mark_page_accessed(p);
-	}
+	num_folios = num_extent_folios(eb);
+	for (int i = 0; i < num_folios; i++)
+		folio_mark_accessed(eb->folios[i]);
 }
 
 struct extent_buffer *find_extent_buffer(struct btrfs_fs_info *fs_info,
@@ -3389,7 +3385,7 @@ struct extent_buffer *find_extent_buffer(struct btrfs_fs_info *fs_info,
 		spin_lock(&eb->refs_lock);
 		spin_unlock(&eb->refs_lock);
 	}
-	mark_extent_buffer_accessed(eb, NULL);
+	mark_extent_buffer_accessed(eb);
 	return eb;
 }
 
@@ -3503,9 +3499,12 @@ static int check_eb_alignment(struct btrfs_fs_info *fs_info, u64 start)
  * Return 0 if eb->folios[i] is attached to btree inode successfully.
 * Return >0 if there is already another extent buffer for the range,
  * and @found_eb_ret would be updated.
+ * Return -EAGAIN if the filemap has an existing folio whose size differs
+ * from that of @eb.
+ * The caller needs to free the existing folios and retry using the same order.
  */
-static int attach_eb_page_to_filemap(struct extent_buffer *eb, int i,
-				     struct extent_buffer **found_eb_ret)
+static int attach_eb_folio_to_filemap(struct extent_buffer *eb, int i,
+				      struct extent_buffer **found_eb_ret)
 {
 
 	struct btrfs_fs_info *fs_info = eb->fs_info;
@@ -3536,6 +3535,12 @@ static int attach_eb_page_to_filemap(struct extent_buffer *eb, int i,
 	 */
 	ASSERT(folio_nr_pages(existing_folio) == 1);
 
+	if (folio_size(existing_folio) != folio_size(eb->folios[0])) {
+		folio_unlock(existing_folio);
+		folio_put(existing_folio);
+		return -EAGAIN;
+	}
+
 	if (fs_info->nodesize < PAGE_SIZE) {
 		/*
 		 * We're going to reuse the existing page, can
@@ -3569,7 +3574,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 					  u64 start, u64 owner_root, int level)
 {
 	unsigned long len = fs_info->nodesize;
-	int num_pages;
+	int num_folios;
 	int attached = 0;
 	struct extent_buffer *eb;
 	struct extent_buffer *existing_eb = NULL;
@@ -3611,8 +3616,6 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 
 	btrfs_set_buffer_lockdep_class(lockdep_owner, eb, level);
 
-	num_pages = num_extent_pages(eb);
-
 	/*
 	 * Preallocate folio private for subpage case, so that we won't
 	 * allocate memory with private_lock nor page lock hold.
@@ -3628,6 +3631,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		}
 	}
 
+reallocate:
 	/* Allocate all pages first. */
 	ret = alloc_eb_folio_array(eb, __GFP_NOFAIL);
 	if (ret < 0) {
@@ -3635,26 +3639,53 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		goto out;
 	}
 
-	/* Attach all pages to the filemap. */
-	for (int i = 0; i < num_pages; i++) {
-		struct page *p;
+	num_folios = num_extent_folios(eb);
+	/*
+	 * Attach all folios to the filemap.
+	 */
+	for (int i = 0; i < num_folios; i++) {
+		struct folio *folio;
 
-		ret = attach_eb_page_to_filemap(eb, i, &existing_eb);
+		ret = attach_eb_folio_to_filemap(eb, i, &existing_eb);
 		if (ret > 0) {
 			ASSERT(existing_eb);
 			goto out;
 		}
+
+		/*
+		 * TODO: Special handling for a corner case where the folio
+		 * order mismatches between the new eb and the filemap.
+		 *
+		 * This happens when:
+		 *
+		 * - the new eb is using a higher order folio
+		 *
+		 * - the filemap is still using 0-order folios for the range
+		 *   This can happen at the previous eb allocation, when we
+		 *   could not get a higher order folio for the call.
+		 *
+		 * - the existing eb has already been freed
+		 *
+		 * In this case, we have to free the existing folios first, and
+		 * re-allocate using the same order.
+		 * Thankfully this is not going to happen yet, as we're still
+		 * using 0-order folios.
+		 */
+		if (unlikely(ret == -EAGAIN)) {
+			ASSERT(0);
+			goto reallocate;
+		}
 		attached++;
 
 		/*
-		 * Only after attach_eb_page_to_filemap(), eb->folios[] is
+		 * Only after attach_eb_folio_to_filemap(), eb->folios[] is
 		 * reliable, as we may choose to reuse the existing page cache
 		 * and free the allocated page.
 		 */
-		p = folio_page(eb->folios[i], 0);
+		folio = eb->folios[i];
 		spin_lock(&mapping->private_lock);
 		/* Should not fail, as we have preallocated the memory */
-		ret = attach_extent_buffer_page(eb, p, prealloc);
+		ret = attach_extent_buffer_folio(eb, folio, prealloc);
 		ASSERT(!ret);
 		/*
 		 * To inform we have extra eb under allocation, so that
@@ -3665,19 +3696,23 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		 * detach_extent_buffer_page().
 		 * Thus needs no special handling in error path.
 		 */
-		btrfs_page_inc_eb_refs(fs_info, p);
+		btrfs_folio_inc_eb_refs(fs_info, folio);
 		spin_unlock(&mapping->private_lock);
 
-		WARN_ON(btrfs_page_test_dirty(fs_info, p, eb->start, eb->len));
+		WARN_ON(btrfs_page_test_dirty(fs_info, folio_page(folio, 0),
+					      eb->start, eb->len));
 
 		/*
 		 * Check if the current page is physically contiguous with previous eb
 		 * page.
+		 * At this stage, either we allocated a large folio, so @i
+		 * can only be 0, or we have fallen back to per-page allocation.
 		 */
-		if (i && folio_page(eb->folios[i - 1], 0) + 1 != p)
+		if (i && folio_page(eb->folios[i - 1], 0) + 1 != folio_page(folio, 0))
 			page_contig = false;
 
-		if (!btrfs_page_test_uptodate(fs_info, p, eb->start, eb->len))
+		if (!btrfs_page_test_uptodate(fs_info, folio_page(folio, 0),
+					      eb->start, eb->len))
 			uptodate = 0;
 
 		/*
@@ -3719,7 +3754,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 	 * btree_release_folio will correctly detect that a page belongs to a
 	 * live buffer and won't free them prematurely.
 	 */
-	for (int i = 0; i < num_pages; i++)
+	for (int i = 0; i < num_folios; i++)
 		unlock_page(folio_page(eb->folios[i], 0));
 	return eb;
 
@@ -3727,7 +3762,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 	WARN_ON(!atomic_dec_and_test(&eb->refs));
 	for (int i = 0; i < attached; i++) {
 		ASSERT(eb->folios[i]);
-		detach_extent_buffer_page(eb, folio_page(eb->folios[i], 0));
+		detach_extent_buffer_folio(eb, eb->folios[i]);
 		unlock_page(folio_page(eb->folios[i], 0));
 	}
 	/*
@@ -3832,31 +3867,31 @@ void free_extent_buffer_stale(struct extent_buffer *eb)
 	release_extent_buffer(eb);
 }
 
-static void btree_clear_page_dirty(struct page *page)
+static void btree_clear_folio_dirty(struct folio *folio)
 {
-	ASSERT(PageDirty(page));
-	ASSERT(PageLocked(page));
-	clear_page_dirty_for_io(page);
-	xa_lock_irq(&page->mapping->i_pages);
-	if (!PageDirty(page))
-		__xa_clear_mark(&page->mapping->i_pages,
-				page_index(page), PAGECACHE_TAG_DIRTY);
-	xa_unlock_irq(&page->mapping->i_pages);
+	ASSERT(folio_test_dirty(folio));
+	ASSERT(folio_test_locked(folio));
+	folio_clear_dirty_for_io(folio);
+	xa_lock_irq(&folio->mapping->i_pages);
+	if (!folio_test_dirty(folio))
+		__xa_clear_mark(&folio->mapping->i_pages,
+				folio_index(folio), PAGECACHE_TAG_DIRTY);
+	xa_unlock_irq(&folio->mapping->i_pages);
 }
 
 static void clear_subpage_extent_buffer_dirty(const struct extent_buffer *eb)
 {
 	struct btrfs_fs_info *fs_info = eb->fs_info;
-	struct page *page = folio_page(eb->folios[0], 0);
+	struct folio *folio = eb->folios[0];
 	bool last;
 
-	/* btree_clear_page_dirty() needs page locked */
-	lock_page(page);
-	last = btrfs_subpage_clear_and_test_dirty(fs_info, page, eb->start,
-						  eb->len);
+	/* btree_clear_folio_dirty() needs the folio locked */
+	folio_lock(folio);
+	last = btrfs_subpage_clear_and_test_dirty(fs_info, folio_page(folio, 0),
+			eb->start, eb->len);
 	if (last)
-		btree_clear_page_dirty(page);
-	unlock_page(page);
+		btree_clear_folio_dirty(folio);
+	folio_unlock(folio);
 	WARN_ON(atomic_read(&eb->refs) == 0);
 }
 
@@ -3865,8 +3900,7 @@ void btrfs_clear_buffer_dirty(struct btrfs_trans_handle *trans,
 {
 	struct btrfs_fs_info *fs_info = eb->fs_info;
 	int i;
-	int num_pages;
-	struct page *page;
+	int num_folios;
 
 	btrfs_assert_tree_write_locked(eb);
 
@@ -3896,15 +3930,15 @@ void btrfs_clear_buffer_dirty(struct btrfs_trans_handle *trans,
 	if (eb->fs_info->nodesize < PAGE_SIZE)
 		return clear_subpage_extent_buffer_dirty(eb);
 
-	num_pages = num_extent_pages(eb);
+	num_folios = num_extent_folios(eb);
+	for (i = 0; i < num_folios; i++) {
+		struct folio *folio = eb->folios[i];
 
-	for (i = 0; i < num_pages; i++) {
-		page = folio_page(eb->folios[i], 0);
-		if (!PageDirty(page))
+		if (!folio_test_dirty(folio))
 			continue;
-		lock_page(page);
-		btree_clear_page_dirty(page);
-		unlock_page(page);
+		folio_lock(folio);
+		btree_clear_folio_dirty(folio);
+		folio_unlock(folio);
 	}
 	WARN_ON(atomic_read(&eb->refs) == 0);
 }
@@ -3912,14 +3946,14 @@ void btrfs_clear_buffer_dirty(struct btrfs_trans_handle *trans,
 void set_extent_buffer_dirty(struct extent_buffer *eb)
 {
 	int i;
-	int num_pages;
+	int num_folios;
 	bool was_dirty;
 
 	check_buffer_tree_ref(eb);
 
 	was_dirty = test_and_set_bit(EXTENT_BUFFER_DIRTY, &eb->bflags);
 
-	num_pages = num_extent_pages(eb);
+	num_folios = num_extent_folios(eb);
 	WARN_ON(atomic_read(&eb->refs) == 0);
 	WARN_ON(!test_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags));
 
@@ -3939,7 +3973,7 @@ void set_extent_buffer_dirty(struct extent_buffer *eb)
 		 */
 		if (subpage)
 			lock_page(folio_page(eb->folios[0], 0));
-		for (i = 0; i < num_pages; i++)
+		for (i = 0; i < num_folios; i++)
 			btrfs_page_set_dirty(eb->fs_info, folio_page(eb->folios[i], 0),
 					     eb->start, eb->len);
 		if (subpage)
@@ -3949,23 +3983,23 @@ void set_extent_buffer_dirty(struct extent_buffer *eb)
 					 eb->fs_info->dirty_metadata_batch);
 	}
 #ifdef CONFIG_BTRFS_DEBUG
-	for (i = 0; i < num_pages; i++)
-		ASSERT(PageDirty(folio_page(eb->folios[i], 0)));
+	for (i = 0; i < num_folios; i++)
+		ASSERT(folio_test_dirty(eb->folios[i]));
 #endif
 }
 
 void clear_extent_buffer_uptodate(struct extent_buffer *eb)
 {
 	struct btrfs_fs_info *fs_info = eb->fs_info;
-	struct page *page;
-	int num_pages;
+	int num_folios;
 	int i;
 
 	clear_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
-	num_pages = num_extent_pages(eb);
-	for (i = 0; i < num_pages; i++) {
-		page = folio_page(eb->folios[i], 0);
-		if (!page)
+	num_folios = num_extent_folios(eb);
+	for (i = 0; i < num_folios; i++) {
+		struct folio *folio = eb->folios[i];
+
+		if (!folio)
 			continue;
 
 		/*
@@ -3973,34 +4007,33 @@ void clear_extent_buffer_uptodate(struct extent_buffer *eb)
 		 * btrfs_is_subpage() can not handle cloned/dummy metadata.
 		 */
 		if (fs_info->nodesize >= PAGE_SIZE)
-			ClearPageUptodate(page);
+			folio_clear_uptodate(folio);
 		else
-			btrfs_subpage_clear_uptodate(fs_info, page, eb->start,
-						     eb->len);
+			btrfs_subpage_clear_uptodate(fs_info, folio_page(folio, 0),
+						     eb->start, eb->len);
 	}
 }
 
 void set_extent_buffer_uptodate(struct extent_buffer *eb)
 {
 	struct btrfs_fs_info *fs_info = eb->fs_info;
-	struct page *page;
-	int num_pages;
+	int num_folios;
 	int i;
 
 	set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
-	num_pages = num_extent_pages(eb);
-	for (i = 0; i < num_pages; i++) {
-		page = folio_page(eb->folios[i], 0);
+	num_folios = num_extent_folios(eb);
+	for (i = 0; i < num_folios; i++) {
+		struct folio *folio = eb->folios[i];
 
 		/*
 		 * This is special handling for metadata subpage, as regular
 		 * btrfs_is_subpage() can not handle cloned/dummy metadata.
 		 */
 		if (fs_info->nodesize >= PAGE_SIZE)
-			SetPageUptodate(page);
+			folio_mark_uptodate(folio);
 		else
-			btrfs_subpage_set_uptodate(fs_info, page, eb->start,
-						   eb->len);
+			btrfs_subpage_set_uptodate(fs_info, folio_page(folio, 0),
+						   eb->start, eb->len);
 	}
 }
 
@@ -4050,8 +4083,8 @@ static void extent_buffer_read_end_io(struct btrfs_bio *bbio)
 int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
 			     struct btrfs_tree_parent_check *check)
 {
-	int num_pages = num_extent_pages(eb), i;
 	struct btrfs_bio *bbio;
+	bool ret;
 
 	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
 		return 0;
@@ -4081,12 +4114,18 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
 	bbio->file_offset = eb->start;
 	memcpy(&bbio->parent_check, check, sizeof(*check));
 	if (eb->fs_info->nodesize < PAGE_SIZE) {
-		__bio_add_page(&bbio->bio, folio_page(eb->folios[0], 0), eb->len,
-			       eb->start - folio_pos(eb->folios[0]));
+		ret = bio_add_folio(&bbio->bio, eb->folios[0], eb->len,
+				    eb->start - folio_pos(eb->folios[0]));
+		ASSERT(ret);
 	} else {
-		for (i = 0; i < num_pages; i++)
-			__bio_add_page(&bbio->bio, folio_page(eb->folios[i], 0),
-				       PAGE_SIZE, 0);
+		int num_folios = num_extent_folios(eb);
+
+		for (int i = 0; i < num_folios; i++) {
+			struct folio *folio = eb->folios[i];
+
+			ret = bio_add_folio(&bbio->bio, folio, folio_size(folio), 0);
+			ASSERT(ret);
+		}
 	}
 	btrfs_submit_bio(bbio, mirror_num);
 
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 66c2e214b141..a5fd5cb20a3c 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -243,6 +243,20 @@ static inline int num_extent_pages(const struct extent_buffer *eb)
 	return (eb->len >> PAGE_SHIFT) ?: 1;
 }
 
+/*
+ * The folio count can only be determined at runtime by checking eb::folios[0].
+ *
+ * We can have either one large folio covering the whole eb (when
+ * nodesize <= PAGE_SIZE, or with a high order folio), or multiple
+ * single-page folios.
+ */
+static inline int num_extent_folios(const struct extent_buffer *eb)
+{
+	if (folio_order(eb->folios[0]))
+		return 1;
+	return num_extent_pages(eb);
+}
+
 static inline int extent_buffer_uptodate(const struct extent_buffer *eb)
 {
 	return test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f7c0a5ec675f..9ede6aa77fde 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7863,7 +7863,7 @@ static void wait_subpage_spinlock(struct page *page)
 	struct folio *folio = page_folio(page);
 	struct btrfs_subpage *subpage;
 
-	if (!btrfs_is_subpage(fs_info, page))
+	if (!btrfs_is_subpage(fs_info, page->mapping))
 		return;
 
 	ASSERT(folio_test_private(folio) && folio_get_private(folio));
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index caf0013f2545..7fd7671be458 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -64,7 +64,8 @@
  *   This means a slightly higher tree locking latency.
  */
 
-bool btrfs_is_subpage(const struct btrfs_fs_info *fs_info, struct page *page)
+bool btrfs_is_subpage(const struct btrfs_fs_info *fs_info,
+		      struct address_space *mapping)
 {
 	if (fs_info->sectorsize >= PAGE_SIZE)
 		return false;
@@ -74,8 +75,7 @@ bool btrfs_is_subpage(const struct btrfs_fs_info *fs_info, struct page *page)
 	 * mapping. And if page->mapping->host is data inode, it's subpage.
 	 * As we have ruled our sectorsize >= PAGE_SIZE case already.
 	 */
-	if (!page->mapping || !page->mapping->host ||
-	    is_data_inode(page->mapping->host))
+	if (!mapping || !mapping->host || is_data_inode(mapping->host))
 		return true;
 
 	/*
@@ -129,7 +129,8 @@ int btrfs_attach_subpage(const struct btrfs_fs_info *fs_info,
 		ASSERT(PageLocked(page));
 
 	/* Either not subpage, or the folio already has private attached. */
-	if (!btrfs_is_subpage(fs_info, page) || folio_test_private(folio))
+	if (!btrfs_is_subpage(fs_info, page->mapping) ||
+	    folio_test_private(folio))
 		return 0;
 
 	subpage = btrfs_alloc_subpage(fs_info, type);
@@ -147,7 +148,8 @@ void btrfs_detach_subpage(const struct btrfs_fs_info *fs_info,
 	struct btrfs_subpage *subpage;
 
 	/* Either not subpage, or the folio already has private attached. */
-	if (!btrfs_is_subpage(fs_info, page) || !folio_test_private(folio))
+	if (!btrfs_is_subpage(fs_info, page->mapping) ||
+	    !folio_test_private(folio))
 		return;
 
 	subpage = folio_detach_private(folio);
@@ -193,33 +195,31 @@ void btrfs_free_subpage(struct btrfs_subpage *subpage)
  * detach_extent_buffer_page() won't detach the folio private while we're still
  * allocating the extent buffer.
  */
-void btrfs_page_inc_eb_refs(const struct btrfs_fs_info *fs_info,
-			    struct page *page)
+void btrfs_folio_inc_eb_refs(const struct btrfs_fs_info *fs_info,
+			     struct folio *folio)
 {
-	struct folio *folio = page_folio(page);
 	struct btrfs_subpage *subpage;
 
-	if (!btrfs_is_subpage(fs_info, page))
+	if (!btrfs_is_subpage(fs_info, folio->mapping))
 		return;
 
-	ASSERT(folio_test_private(folio) && page->mapping);
-	lockdep_assert_held(&page->mapping->private_lock);
+	ASSERT(folio_test_private(folio) && folio->mapping);
+	lockdep_assert_held(&folio->mapping->private_lock);
 
 	subpage = folio_get_private(folio);
 	atomic_inc(&subpage->eb_refs);
 }
 
-void btrfs_page_dec_eb_refs(const struct btrfs_fs_info *fs_info,
-			    struct page *page)
+void btrfs_folio_dec_eb_refs(const struct btrfs_fs_info *fs_info,
+			     struct folio *folio)
 {
-	struct folio *folio = page_folio(page);
 	struct btrfs_subpage *subpage;
 
-	if (!btrfs_is_subpage(fs_info, page))
+	if (!btrfs_is_subpage(fs_info, folio->mapping))
 		return;
 
-	ASSERT(folio_test_private(folio) && page->mapping);
-	lockdep_assert_held(&page->mapping->private_lock);
+	ASSERT(folio_test_private(folio) && folio->mapping);
+	lockdep_assert_held(&folio->mapping->private_lock);
 
 	subpage = folio_get_private(folio);
 	ASSERT(atomic_read(&subpage->eb_refs));
@@ -352,7 +352,7 @@ int btrfs_page_start_writer_lock(const struct btrfs_fs_info *fs_info,
 {
 	struct folio *folio = page_folio(page);
 
-	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, page)) {
+	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, page->mapping)) {
 		lock_page(page);
 		return 0;
 	}
@@ -369,7 +369,7 @@ int btrfs_page_start_writer_lock(const struct btrfs_fs_info *fs_info,
 void btrfs_page_end_writer_lock(const struct btrfs_fs_info *fs_info,
 		struct page *page, u64 start, u32 len)
 {
-	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, page))
+	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, page->mapping))
 		return unlock_page(page);
 	btrfs_subpage_clamp_range(page, &start, &len);
 	if (btrfs_subpage_end_and_test_writer(fs_info, page, start, len))
@@ -612,7 +612,8 @@ IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(checked);
 void btrfs_page_set_##name(const struct btrfs_fs_info *fs_info,		\
 		struct page *page, u64 start, u32 len)			\
 {									\
-	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, page)) {	\
+	if (unlikely(!fs_info) ||					\
+	    !btrfs_is_subpage(fs_info, page->mapping)) {		\
 		set_page_func(page);					\
 		return;							\
 	}								\
@@ -621,7 +622,8 @@ void btrfs_page_set_##name(const struct btrfs_fs_info *fs_info,		\
 void btrfs_page_clear_##name(const struct btrfs_fs_info *fs_info,	\
 		struct page *page, u64 start, u32 len)			\
 {									\
-	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, page)) {	\
+	if (unlikely(!fs_info) ||					\
+	    !btrfs_is_subpage(fs_info, page->mapping)) {		\
 		clear_page_func(page);					\
 		return;							\
 	}								\
@@ -630,14 +632,16 @@ void btrfs_page_clear_##name(const struct btrfs_fs_info *fs_info,	\
 bool btrfs_page_test_##name(const struct btrfs_fs_info *fs_info,	\
 		struct page *page, u64 start, u32 len)			\
 {									\
-	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, page))	\
+	if (unlikely(!fs_info) ||					\
+	    !btrfs_is_subpage(fs_info, page->mapping))			\
 		return test_page_func(page);				\
 	return btrfs_subpage_test_##name(fs_info, page, start, len);	\
 }									\
 void btrfs_page_clamp_set_##name(const struct btrfs_fs_info *fs_info,	\
 		struct page *page, u64 start, u32 len)			\
 {									\
-	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, page)) {	\
+	if (unlikely(!fs_info) ||					\
+	    !btrfs_is_subpage(fs_info, page->mapping)) {	\
 		set_page_func(page);					\
 		return;							\
 	}								\
@@ -647,7 +651,8 @@ void btrfs_page_clamp_set_##name(const struct btrfs_fs_info *fs_info,	\
 void btrfs_page_clamp_clear_##name(const struct btrfs_fs_info *fs_info, \
 		struct page *page, u64 start, u32 len)			\
 {									\
-	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, page)) {	\
+	if (unlikely(!fs_info) ||					\
+	    !btrfs_is_subpage(fs_info, page->mapping)) {		\
 		clear_page_func(page);					\
 		return;							\
 	}								\
@@ -657,7 +662,8 @@ void btrfs_page_clamp_clear_##name(const struct btrfs_fs_info *fs_info, \
 bool btrfs_page_clamp_test_##name(const struct btrfs_fs_info *fs_info,	\
 		struct page *page, u64 start, u32 len)			\
 {									\
-	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, page))	\
+	if (unlikely(!fs_info) ||					\
+	    !btrfs_is_subpage(fs_info, page->mapping)) \
 		return test_page_func(page);				\
 	btrfs_subpage_clamp_range(page, &start, &len);			\
 	return btrfs_subpage_test_##name(fs_info, page, start, len);	\
@@ -686,7 +692,7 @@ void btrfs_page_assert_not_dirty(const struct btrfs_fs_info *fs_info,
 		return;
 
 	ASSERT(!PageDirty(page));
-	if (!btrfs_is_subpage(fs_info, page))
+	if (!btrfs_is_subpage(fs_info, page->mapping))
 		return;
 
 	ASSERT(folio_test_private(folio) && folio_get_private(folio));
@@ -716,7 +722,7 @@ void btrfs_page_unlock_writer(struct btrfs_fs_info *fs_info, struct page *page,
 
 	ASSERT(PageLocked(page));
 	/* For non-subpage case, we just unlock the page */
-	if (!btrfs_is_subpage(fs_info, page))
+	if (!btrfs_is_subpage(fs_info, page->mapping))
 		return unlock_page(page);
 
 	ASSERT(folio_test_private(folio) && folio_get_private(folio));
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 5cbf67ccbdeb..93d1c5690faf 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -73,7 +73,8 @@ enum btrfs_subpage_type {
 	BTRFS_SUBPAGE_DATA,
 };
 
-bool btrfs_is_subpage(const struct btrfs_fs_info *fs_info, struct page *page);
+bool btrfs_is_subpage(const struct btrfs_fs_info *fs_info,
+		      struct address_space *mapping);
 
 void btrfs_init_subpage_info(struct btrfs_subpage_info *subpage_info, u32 sectorsize);
 int btrfs_attach_subpage(const struct btrfs_fs_info *fs_info,
@@ -86,10 +87,10 @@ struct btrfs_subpage *btrfs_alloc_subpage(const struct btrfs_fs_info *fs_info,
 					  enum btrfs_subpage_type type);
 void btrfs_free_subpage(struct btrfs_subpage *subpage);
 
-void btrfs_page_inc_eb_refs(const struct btrfs_fs_info *fs_info,
-			    struct page *page);
-void btrfs_page_dec_eb_refs(const struct btrfs_fs_info *fs_info,
-			    struct page *page);
+void btrfs_folio_inc_eb_refs(const struct btrfs_fs_info *fs_info,
+			     struct folio *folio);
+void btrfs_folio_dec_eb_refs(const struct btrfs_fs_info *fs_info,
+			     struct folio *folio);
 
 void btrfs_subpage_start_reader(const struct btrfs_fs_info *fs_info,
 		struct page *page, u64 start, u32 len);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/2] btrfs: migrate extent_buffer::pages[] to folio and more cleanups
  2023-12-01  6:06 [PATCH v2 0/2] btrfs: migrate extent_buffer::pages[] to folio and more cleanups Qu Wenruo
  2023-12-01  6:06 ` [PATCH v2 1/2] btrfs: migrate extent_buffer::pages[] to folio Qu Wenruo
  2023-12-01  6:06 ` [PATCH v2 2/2] btrfs: cleanup metadata page pointer usage Qu Wenruo
@ 2023-12-04 16:26 ` David Sterba
  2023-12-04 21:10   ` Qu Wenruo
  2 siblings, 1 reply; 8+ messages in thread
From: David Sterba @ 2023-12-04 16:26 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Dec 01, 2023 at 04:36:53PM +1030, Qu Wenruo wrote:
> [CHANGELOG]
> v2:
> - Add a new patch to do more cleanups for metadata page pointer usage
> 
> This patchset would migrate extent_buffer::pages[] to folios[], then
> cleanup the existing metadata page pointer usage to proper folios ones.
> 
> This cleanup would help future higher order folios usage for metadata in
> the following aspects:
> 
> - No more need to iterate through the remaining pages for page flags
>   We just call folio_set/mark/start_*() helpers, for the single folio.
>   The helper would only set the flag (mostly for the leading page).
> 
> - Single bio_add_folio() call for the whole eb
> 
> - Better folio helper naming
>   PageUptodate() compared to folio_test_uptodate().
> 
> The first patch would do a very simple conversion, then the 2nd patch does
> the preparation for the higher order folio situation.
> 
> There are two locations which won't be converted to folios yet:
> 
> - Subpage code
>   There is no point in supporting higher order folios for subpage.
>   The two conditions are just conflicting with each other.
> 
> - Data page pointers
>   That would be more useful in the future, before we go on to support
>   multi-page sectorsize.
> 
> However the 2nd one would also add a new corner case:
> 
> - Order mismatch in filemap and eb folios
>   Unfortunately I don't have a better plan other than re-allocating the
>   folios to the same order.
>   Maybe in the future we would have better ways to handle it? Like
>   migrating the pages to the higher order one?

As long as it's a no-op for now this is OK, we can do the higher order
allocation for eb pages afterwards.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/2] btrfs: migrate extent_buffer::pages[] to folio and more cleanups
  2023-12-04 16:26 ` [PATCH v2 0/2] btrfs: migrate extent_buffer::pages[] to folio and more cleanups David Sterba
@ 2023-12-04 21:10   ` Qu Wenruo
  2023-12-05 14:04     ` David Sterba
  0 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2023-12-04 21:10 UTC (permalink / raw)
  To: dsterba, Qu Wenruo; +Cc: linux-btrfs



On 2023/12/5 02:56, David Sterba wrote:
> On Fri, Dec 01, 2023 at 04:36:53PM +1030, Qu Wenruo wrote:
>> [CHANGELOG]
>> v2:
>> - Add a new patch to do more cleanups for metadata page pointer usage
>>
>> This patchset would migrate extent_buffer::pages[] to folios[], then
>> cleanup the existing metadata page pointer usage to proper folios ones.
>>
>> This cleanup would help future higher order folios usage for metadata in
>> the following aspects:
>>
>> - No more need to iterate through the remaining pages for page flags
>>    We just call folio_set/mark/start_*() helpers, for the single folio.
>>    The helper would only set the flag (mostly for the leading page).
>>
>> - Single bio_add_folio() call for the whole eb
>>
>> - Better folio helper naming
>>    PageUptodate() compared to folio_test_uptodate().
>>
>> The first patch would do a very simple conversion, then the 2nd patch does
>> the preparation for the higher order folio situation.
>>
>> There are two locations which won't be converted to folios yet:
>>
>> - Subpage code
>>    There is no point in supporting higher order folios for subpage.
>>    The two conditions are just conflicting with each other.
>>
>> - Data page pointers
>>    That would be more useful in the future, before we go on to support
>>    multi-page sectorsize.
>>
>> However the 2nd one would also add a new corner case:
>>
>> - Order mismatch in filemap and eb folios
>>    Unfortunately I don't have a better plan other than re-allocating the
>>    folios to the same order.
>>    Maybe in the future we would have better ways to handle it? Like
>>    migrating the pages to the higher order one?
>
> As long as it's a no-op for now this is OK, we can do the higher order
> allocation for eb pages afterwards.
>
Yep, it won't cause any problem for now.

Although this corner case is making me wonder whether the new
alloc-then-attach approach is really any better than the original
alloc-and-attach solution.

If the mm (filemap) layer could allow us to allocate larger folios, it
may be much simpler.
The current code would only need to set large folio support for the
mapping, then go with high order fgp_flags.
The filemap code is already doing the retry and alignment checks.

But the existing filemap code would also try to reduce the order, which
can lead to other problems, like one extent buffer backed by folios of
multiple different orders.
Meanwhile alloc-then-attach gives us full control over the order, thus
allowing the all-or-none solution (one single large folio, or all
single-page ones) required by the 2nd patch.
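To make the all-or-none idea concrete, here is a tiny userspace sketch;
all the names below are made up for illustration and are not the real
btrfs/mm API:

```c
#include <assert.h>

/* Userspace model only: a "folio" reduced to its allocation order. */
struct fake_folio { int order; };

/*
 * All-or-none policy: either one order-N folio backs the whole extent
 * buffer, or we fall back to nr_pages order-0 folios.  Orders are
 * never mixed inside one eb.  Returns the number of folios used.
 */
static int alloc_eb_folios(struct fake_folio *folios, int nr_pages,
			   int order, int large_alloc_ok)
{
	if (large_alloc_ok && (1 << order) == nr_pages) {
		/* One large folio covers the whole eb. */
		folios[0].order = order;
		return 1;
	}
	/* Fallback: all single-page (order-0) folios. */
	for (int i = 0; i < nr_pages; i++)
		folios[i].order = 0;
	return nr_pages;
}
```

E.g. with 16K nodesize on 4K pages (nr_pages = 4, order = 2) that is
one order-2 folio on success, or four order-0 folios on fallback.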

Anyway, I would continue with the current alloc-then-attach method to
experiment with higher order folio allocation and find out all the
pitfalls first.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/2] btrfs: cleanup metadata page pointer usage
  2023-12-01  6:06 ` [PATCH v2 2/2] btrfs: cleanup metadata page pointer usage Qu Wenruo
@ 2023-12-05 14:00   ` David Sterba
  2023-12-05 20:13     ` Qu Wenruo
  0 siblings, 1 reply; 8+ messages in thread
From: David Sterba @ 2023-12-05 14:00 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Dec 01, 2023 at 04:36:55PM +1030, Qu Wenruo wrote:
> Although we have migrated extent_buffer::pages[] to folios[], we're
> still mostly using the folio_page() helper to grab the page.
> 
> This patch would do the following cleanups for metadata:
> 
> - Introduce num_extent_folios() helper
>   This is to replace most num_extent_pages() callers.
> 
> - Use num_extent_folios() to iterate future large folios
>   This allows us to use things like
>   bio_add_folio()/bio_add_folio_nofail(), and only set the needed flags
>   for the folio (aka the leading/trailing page), which reduces the loop
>   iteration count to 1 for large folios.
> 
> - Change metadata related functions to use folio pointers
>   Including their function name, involving:
>   * attach_extent_buffer_page()
>   * detach_extent_buffer_page()
>   * page_range_has_eb()
>   * btrfs_release_extent_buffer_pages()
>   * btree_clear_page_dirty()
>   * btrfs_page_inc_eb_refs()
>   * btrfs_page_dec_eb_refs()
> 
> - Change btrfs_is_subpage() to accept an address_space pointer
>   This is to allow both page->mapping and folio->mapping to be utilized,
>   as data is still using the old per-page code and may remain so for a
>   while.
> 
> - Special corner case placeholder for future order mismatches between
>   extent buffer and inode filemap
>   For now it's just a block of comments and a dead ASSERT(), no real
>   handling yet.
> 
> The subpage code would still use pages, just because subpage and large
> folios are conflicting conditions, so we don't need to bother subpage
> with higher order folios at all. Just folio_page(folio, 0) would be
> enough.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

KASAN reports some problems:

[  228.984474] Btrfs loaded, debug=on, assert=on, ref-verify=on, zoned=yes, fsverity=yes
[  228.986241] BTRFS: selftest: sectorsize: 4096  nodesize: 4096
[  228.987192] BTRFS: selftest: running btrfs free space cache tests
[  228.988141] BTRFS: selftest: running extent only tests
[  228.989096] BTRFS: selftest: running bitmap only tests
[  228.990108] BTRFS: selftest: running bitmap and extent tests
[  228.991396] BTRFS: selftest: running space stealing from bitmap to extent tests
[  228.993137] BTRFS: selftest: running bytes index tests
[  228.994741] BTRFS: selftest: running extent buffer operation tests
[  228.995875] BTRFS: selftest: running btrfs_split_item tests
[  228.997062] ==================================================================
[  228.998388] BUG: KASAN: global-out-of-bounds in alloc_eb_folio_array+0xd6/0x180 [btrfs]
[  229.000005] Read of size 8 at addr ffffffffc07f32e8 by task modprobe/13543
[  229.000973] 
[  229.001294] CPU: 2 PID: 13543 Comm: modprobe Tainted: G           O       6.7.0-rc4-default+ #2250
[  229.002556] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
[  229.004054] Call Trace:
[  229.004489]  <TASK>
[  229.004913]  dump_stack_lvl+0x46/0x70
[  229.005546]  print_address_description.constprop.0+0x30/0x420
[  229.006431]  print_report+0xb0/0x260
[  229.007023]  ? kasan_addr_to_slab+0x9/0xc0
[  229.007685]  kasan_report+0xbe/0xf0
[  229.008275]  ? alloc_eb_folio_array+0xd6/0x180 [btrfs]
[  229.009212]  ? alloc_eb_folio_array+0xd6/0x180 [btrfs]
[  229.010113]  alloc_eb_folio_array+0xd6/0x180 [btrfs]
[  229.010932]  ? btrfs_alloc_page_array+0x100/0x100 [btrfs]
[  229.011816]  ? is_module_address+0x11/0x30
[  229.012488]  ? static_obj+0x73/0xa0
[  229.013054]  ? lockdep_init_map_type+0xe8/0x360
[  229.013751]  ? __raw_spin_lock_init+0x73/0x90
[  229.014426]  ? __alloc_extent_buffer+0x14a/0x190 [btrfs]
[  229.015225]  __alloc_dummy_extent_buffer+0x2e/0x2b0 [btrfs]
[  229.016121]  test_btrfs_split_item+0xcf/0x7d0 [btrfs]
[  229.016995]  ? btrfs_test_free_space_cache+0x1e0/0x1e0 [btrfs]
[  229.017872]  ? info_print_prefix+0x100/0x100
[  229.018523]  ? btrfs_test_free_space_cache+0xda/0x1e0 [btrfs]
[  229.019461]  ? __kmem_cache_free+0xfa/0x200
[  229.020064]  btrfs_run_sanity_tests+0x78/0x140 [btrfs]
[  229.020976]  init_btrfs_fs+0x38/0x220 [btrfs]
[  229.021867]  ? btrfs_interface_init+0x20/0x20 [btrfs]
[  229.022818]  do_one_initcall+0xc3/0x3b0
[  229.023473]  ? trace_event_raw_event_initcall_level+0x150/0x150
[  229.024436]  ? __kmem_cache_alloc_node+0x1b5/0x2e0
[  229.025279]  ? do_init_module+0x38/0x3b0
[  229.025806]  ? kasan_unpoison+0x40/0x60
[  229.026477]  do_init_module+0x135/0x3b0
[  229.027134]  load_module+0x11c1/0x13c0
[  229.027766]  ? layout_and_allocate.isra.0+0x280/0x280
[  229.028435]  ? kernel_read_file+0x252/0x3f0
[  229.029060]  ? __ia32_sys_fsconfig+0x70/0x70
[  229.029755]  ? init_module_from_file+0xd1/0x130
[  229.030489]  init_module_from_file+0xd1/0x130
[  229.031184]  ? __do_sys_init_module+0x1a0/0x1a0
[  229.031954]  ? idempotent_init_module+0x3b9/0x3d0
[  229.032727]  ? do_raw_spin_unlock+0x84/0xf0
[  229.033393]  idempotent_init_module+0x1ac/0x3d0
[  229.034101]  ? init_module_from_file+0x130/0x130
[  229.034837]  ? __fget_files+0xfd/0x1e0
[  229.035449]  __x64_sys_finit_module+0x72/0xb0
[  229.037902]  do_syscall_64+0x41/0xe0
[  229.038495]  entry_SYSCALL_64_after_hwframe+0x46/0x4e
[  229.039314] RIP: 0033:0x7f07839f53dd
[  229.042725] RSP: 002b:00007fff7eb94c18 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[  229.043642] RAX: ffffffffffffffda RBX: 00005574ab37ba20 RCX: 00007f07839f53dd
[  229.044833] RDX: 0000000000000000 RSI: 00005574ab372c5a RDI: 000000000000000e
[  229.046002] RBP: 00005574ab372c5a R08: 0000000000000000 R09: 00005574ab37baa0
[  229.047195] R10: 00005574ab37f850 R11: 0000000000000246 R12: 0000000000040000
[  229.048275] R13: 0000000000000000 R14: 00005574ab3841b0 R15: 00005574ab37b480
[  229.049320]  </TASK>
[  229.049767] 
[  229.050124] The buggy address belongs to the variable:
[  229.050961]  __func__.1+0x288/0xffffffffffeeafa0 [btrfs]
[  229.051928] 
[  229.052205] Memory state around the buggy address:
[  229.052971]  ffffffffc07f3180: f9 f9 f9 f9 00 04 f9 f9 f9 f9 f9 f9 00 00 00 00
[  229.054075]  ffffffffc07f3200: 00 00 00 00 03 f9 f9 f9 f9 f9 f9 f9 00 00 00 03
[  229.055173] >ffffffffc07f3280: f9 f9 f9 f9 00 02 f9 f9 f9 f9 f9 f9 00 07 f9 f9
[  229.056210]                                                           ^
[  229.057066]  ffffffffc07f3300: f9 f9 f9 f9 00 00 00 00 04 f9 f9 f9 f9 f9 f9 f9
[  229.058152]  ffffffffc07f3380: 00 00 00 02 f9 f9 f9 f9 00 00 03 f9 f9 f9 f9 f9
[  229.059119] ==================================================================
[  229.060341] Disabling lock debugging due to kernel taint
[  229.061254] BUG: unable to handle page fault for address: 00006b636f6c5f7a
[  229.062383] #PF: supervisor read access in kernel mode
[  229.063232] #PF: error_code(0x0000) - not-present page
[  229.064079] PGD 0 P4D 0 
[  229.064544] Oops: 0000 [#1] PREEMPT SMP KASAN
[  229.065296] CPU: 2 PID: 13543 Comm: modprobe Tainted: G    B      O       6.7.0-rc4-default+ #2250
[  229.066783] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
[  229.068574] RIP: 0010:folio_flags.constprop.0+0x12/0x30 [btrfs]
[  229.072394] RSP: 0018:ffff88800294f840 EFLAGS: 00010282
[  229.073281] RAX: 0000000000000000 RBX: 00006b636f6c5f72 RCX: ffffffffc05c5c12
[  229.074409] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 00006b636f6c5f7a
[  229.075488] RBP: 00006b636f6c5f72 R08: 0000000000000001 R09: fffffbfff37183f8
[  229.076600] R10: ffffffff9b8c1fc7 R11: 0000000000000001 R12: 0000000000000000
[  229.077723] R13: ffff88801d1e4008 R14: 0000000000001000 R15: ffffffffc08b0228
[  229.078905] FS:  00007f07838e5740(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000
[  229.080262] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  229.081228] CR2: 00006b636f6c5f7a CR3: 000000000598a000 CR4: 00000000000006b0
[  229.082417] Call Trace:
[  229.082849]  <TASK>
[  229.083289]  ? __die+0x1f/0x60
[  229.083896]  ? page_fault_oops+0x71/0xa0
[  229.084629]  ? exc_page_fault+0x69/0xd0
[  229.085354]  ? asm_exc_page_fault+0x22/0x30
[  229.086130]  ? folio_flags.constprop.0+0x12/0x30 [btrfs]
[  229.087170]  ? folio_flags.constprop.0+0x12/0x30 [btrfs]
[  229.088173]  __alloc_dummy_extent_buffer+0x5b/0x2b0 [btrfs]
[  229.089206]  test_btrfs_split_item+0xcf/0x7d0 [btrfs]
[  229.090139]  ? btrfs_test_free_space_cache+0x1e0/0x1e0 [btrfs]
[  229.091192]  ? info_print_prefix+0x100/0x100
[  229.091900]  ? btrfs_test_free_space_cache+0xda/0x1e0 [btrfs]
[  229.092924]  ? __kmem_cache_free+0xfa/0x200
[  229.093604]  btrfs_run_sanity_tests+0x78/0x140 [btrfs]
[  229.094544]  init_btrfs_fs+0x38/0x220 [btrfs]
[  229.095401]  ? btrfs_interface_init+0x20/0x20 [btrfs]
[  229.096325]  do_one_initcall+0xc3/0x3b0
[  229.096948]  ? trace_event_raw_event_initcall_level+0x150/0x150
[  229.097765]  ? __kmem_cache_alloc_node+0x1b5/0x2e0
[  229.098594]  ? do_init_module+0x38/0x3b0
[  229.099189]  ? kasan_unpoison+0x40/0x60
[  229.099794]  do_init_module+0x135/0x3b0
[  229.100479]  load_module+0x11c1/0x13c0
[  229.101095]  ? layout_and_allocate.isra.0+0x280/0x280
[  229.101874]  ? kernel_read_file+0x252/0x3f0
[  229.102481]  ? __ia32_sys_fsconfig+0x70/0x70
[  229.103189]  ? init_module_from_file+0xd1/0x130
[  229.103918]  init_module_from_file+0xd1/0x130
[  229.104615]  ? __do_sys_init_module+0x1a0/0x1a0
[  229.105352]  ? idempotent_init_module+0x3b9/0x3d0
[  229.106118]  ? do_raw_spin_unlock+0x84/0xf0
[  229.106787]  idempotent_init_module+0x1ac/0x3d0
[  229.107592]  ? init_module_from_file+0x130/0x130
[  229.108413]  ? __fget_files+0xfd/0x1e0
[  229.108979]  __x64_sys_finit_module+0x72/0xb0
[  229.109686]  do_syscall_64+0x41/0xe0
[  229.110286]  entry_SYSCALL_64_after_hwframe+0x46/0x4e
[  229.111074] RIP: 0033:0x7f07839f53dd
[  229.114183] RSP: 002b:00007fff7eb94c18 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[  229.115057] RAX: ffffffffffffffda RBX: 00005574ab37ba20 RCX: 00007f07839f53dd
[  229.116099] RDX: 0000000000000000 RSI: 00005574ab372c5a RDI: 000000000000000e
[  229.117162] RBP: 00005574ab372c5a R08: 0000000000000000 R09: 00005574ab37baa0
[  229.118178] R10: 00005574ab37f850 R11: 0000000000000246 R12: 0000000000040000
[  229.119254] R13: 0000000000000000 R14: 00005574ab3841b0 R15: 00005574ab37b480
[  229.120183]  </TASK>
[  229.120582] Modules linked in: btrfs(O+) blake2b_generic libcrc32c xor lzo_compress lzo_decompress raid6_pq zstd_decompress zstd_compress xxhash zstd_common loop
[  229.122735] CR2: 00006b636f6c5f7a
[  229.123266] ---[ end trace 0000000000000000 ]---
[  229.123916] RIP: 0010:folio_flags.constprop.0+0x12/0x30 [btrfs]
[  229.127469] RSP: 0018:ffff88800294f840 EFLAGS: 00010282
[  229.128273] RAX: 0000000000000000 RBX: 00006b636f6c5f72 RCX: ffffffffc05c5c12
[  229.129305] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 00006b636f6c5f7a
[  229.130327] RBP: 00006b636f6c5f72 R08: 0000000000000001 R09: fffffbfff37183f8
[  229.131362] R10: ffffffff9b8c1fc7 R11: 0000000000000001 R12: 0000000000000000
[  229.132442] R13: ffff88801d1e4008 R14: 0000000000001000 R15: ffffffffc08b0228
[  229.133657] FS:  00007f07838e5740(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000
[  229.134887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  229.135776] CR2: 00006b636f6c5f7a CR3: 000000000598a000 CR4: 00000000000006b0

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/2] btrfs: migrate extent_buffer::pages[] to folio and more cleanups
  2023-12-04 21:10   ` Qu Wenruo
@ 2023-12-05 14:04     ` David Sterba
  0 siblings, 0 replies; 8+ messages in thread
From: David Sterba @ 2023-12-05 14:04 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Tue, Dec 05, 2023 at 07:40:55AM +1030, Qu Wenruo wrote:
> On 2023/12/5 02:56, David Sterba wrote:
> > On Fri, Dec 01, 2023 at 04:36:53PM +1030, Qu Wenruo wrote:
> >> [CHANGELOG]
> >> v2:
> >> - Add a new patch to do more cleanups for metadata page pointer usage
> >>
> >> This patchset would migrate extent_buffer::pages[] to folios[], then
> >> cleanup the existing metadata page pointer usage to proper folios ones.
> >>
> >> This cleanup would help future higher order folios usage for metadata in
> >> the following aspects:
> >>
> >> - No more need to iterate through the remaining pages for page flags
> >>    We just call folio_set/mark/start_*() helpers, for the single folio.
> >>    The helper would only set the flag (mostly for the leading page).
> >>
> >> - Single bio_add_folio() call for the whole eb
> >>
> >> - Better folio helper naming
> >>    PageUptodate() compared to folio_test_uptodate().
> >>
> >> The first patch would do a very simple conversion, then the 2nd patch does
> >> the preparation for the higher order folio situation.
> >>
> >> There are two locations which won't be converted to folios yet:
> >>
> >> - Subpage code
> >>    There is no point in supporting higher order folios for subpage.
> >>    The two conditions are just conflicting with each other.
> >>
> >> - Data page pointers
> >>    That would be more useful in the future, before we go on to support
> >>    multi-page sectorsize.
> >>
> >> However the 2nd one would also add a new corner case:
> >>
> >> - Order mismatch in filemap and eb folios
> >>    Unfortunately I don't have a better plan other than re-allocating the
> >>    folios to the same order.
> >>    Maybe in the future we would have better ways to handle it? Like
> >>    migrating the pages to the higher order one?
> >
> > As long as it's a no-op for now this is OK, we can do the higher order
> > allocation for eb pages afterwards.
> >
> Yep, it won't cause any problem for now.
> 
> Although this corner case is making me wonder whether the new
> alloc-then-attach approach is really any better than the original
> alloc-and-attach solution.
> 
> If the mm (filemap) layer could allow us to allocate larger folios, it may
> be much simpler.
> The current code would only need to set large folio support for the mapping,
> then go with high order fgp_flags.
> The filemap code is already doing the retry and alignment checks.
> 
> But the existing filemap code would also try to reduce the order, which
> can lead to other problems, like one extent buffer backed by folios of
> multiple different orders.
> Meanwhile alloc-then-attach gives us full control over the order, thus
> allowing the all-or-none solution (one single large folio, or all
> single-page ones) required by the 2nd patch.
> 
> Anyway, I would continue with the current alloc-then-attach method to
> experiment with higher order folio allocation and find out all the
> pitfalls first.

Agreed, we'll eventually converge to the full folio API, but what you
found so far suggests we can expect surprises.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/2] btrfs: cleanup metadata page pointer usage
  2023-12-05 14:00   ` David Sterba
@ 2023-12-05 20:13     ` Qu Wenruo
  0 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2023-12-05 20:13 UTC (permalink / raw)
  To: dsterba, Qu Wenruo; +Cc: linux-btrfs



On 2023/12/6 00:30, David Sterba wrote:
> On Fri, Dec 01, 2023 at 04:36:55PM +1030, Qu Wenruo wrote:
>> Although we have migrated extent_buffer::pages[] to folios[], we're
>> still mostly using the folio_page() helper to grab the page.
>>
>> This patch would do the following cleanups for metadata:
>>
>> - Introduce num_extent_folios() helper
>>    This is to replace most num_extent_pages() callers.
>>
>> - Use num_extent_folios() to iterate future large folios
>>    This allows us to use things like
>>    bio_add_folio()/bio_add_folio_nofail(), and only set the needed flags
>>    for the folio (aka the leading/trailing page), which reduces the loop
>>    iteration count to 1 for large folios.
>>
>> - Change metadata related functions to use folio pointers
>>    Including their function name, involving:
>>    * attach_extent_buffer_page()
>>    * detach_extent_buffer_page()
>>    * page_range_has_eb()
>>    * btrfs_release_extent_buffer_pages()
>>    * btree_clear_page_dirty()
>>    * btrfs_page_inc_eb_refs()
>>    * btrfs_page_dec_eb_refs()
>>
>> - Change btrfs_is_subpage() to accept an address_space pointer
>>    This is to allow both page->mapping and folio->mapping to be utilized,
>>    as data is still using the old per-page code and may remain so for a
>>    while.
>>
>> - Special corner case placeholder for future order mismatches between
>>    extent buffer and inode filemap
>>    For now it's just a block of comments and a dead ASSERT(), no real
>>    handling yet.
>>
>> The subpage code would still use pages, just because subpage and large
>> folios are conflicting conditions, so we don't need to bother subpage
>> with higher order folios at all. Just folio_page(folio, 0) would be
>> enough.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> KASAN reports some problems:
>
> [  228.984474] Btrfs loaded, debug=on, assert=on, ref-verify=on, zoned=yes, fsverity=yes
> [  228.986241] BTRFS: selftest: sectorsize: 4096  nodesize: 4096
> [  228.987192] BTRFS: selftest: running btrfs free space cache tests
> [  228.988141] BTRFS: selftest: running extent only tests
> [  228.989096] BTRFS: selftest: running bitmap only tests
> [  228.990108] BTRFS: selftest: running bitmap and extent tests
> [  228.991396] BTRFS: selftest: running space stealing from bitmap to extent tests
> [  228.993137] BTRFS: selftest: running bytes index tests
> [  228.994741] BTRFS: selftest: running extent buffer operation tests
> [  228.995875] BTRFS: selftest: running btrfs_split_item tests
> [  228.997062] ==================================================================
> [  228.998388] BUG: KASAN: global-out-of-bounds in alloc_eb_folio_array+0xd6/0x180 [btrfs]

Unfortunately I failed to reproduce it here, even though I have KASAN
enabled and always have the selftests enabled.

> [  229.000005] Read of size 8 at addr ffffffffc07f32e8 by task modprobe/13543

The address looks like a stack address, and since it's not reported from
btrfs_alloc_page_array(), it must be the
   "eb->folios[i] = page_folio(page_array[i]);" line.

But I failed to see any obvious problem here.
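For reference, that loop reads roughly like this minimal userspace
model (the struct names are stand-ins, not the kernel types); if the
loop bound ever disagrees with the number of entries actually filled in
the source array, the read past the end is exactly the kind of thing an
out-of-bounds report would flag:

```c
#include <assert.h>

/* Stand-ins for the kernel types, for illustration only. */
struct fake_page  { int id; };
struct fake_folio { int id; };

/* Model of page_folio(): an order-0 folio is just its head page. */
static struct fake_folio fake_page_folio(struct fake_page *p)
{
	struct fake_folio f = { .id = p->id };
	return f;
}

/*
 * The suspected loop: copy each allocated page into the eb folio
 * array.  num_pages must not exceed the number of entries actually
 * filled in page_array[], or page_array[i] is read out of bounds.
 */
static void convert_pages_to_folios(struct fake_folio *folios,
				    struct fake_page **page_array,
				    int num_pages)
{
	for (int i = 0; i < num_pages; i++)
		folios[i] = fake_page_folio(page_array[i]);
}
```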

Any extra clue for debugging?

Thanks,
Qu
> [  229.000973]
> [  229.001294] CPU: 2 PID: 13543 Comm: modprobe Tainted: G           O       6.7.0-rc4-default+ #2250
> [  229.002556] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
> [  229.004054] Call Trace:
> [  229.004489]  <TASK>
> [  229.004913]  dump_stack_lvl+0x46/0x70
> [  229.005546]  print_address_description.constprop.0+0x30/0x420
> [  229.006431]  print_report+0xb0/0x260
> [  229.007023]  ? kasan_addr_to_slab+0x9/0xc0
> [  229.007685]  kasan_report+0xbe/0xf0
> [  229.008275]  ? alloc_eb_folio_array+0xd6/0x180 [btrfs]
> [  229.009212]  ? alloc_eb_folio_array+0xd6/0x180 [btrfs]
> [  229.010113]  alloc_eb_folio_array+0xd6/0x180 [btrfs]
> [  229.010932]  ? btrfs_alloc_page_array+0x100/0x100 [btrfs]
> [  229.011816]  ? is_module_address+0x11/0x30
> [  229.012488]  ? static_obj+0x73/0xa0
> [  229.013054]  ? lockdep_init_map_type+0xe8/0x360
> [  229.013751]  ? __raw_spin_lock_init+0x73/0x90
> [  229.014426]  ? __alloc_extent_buffer+0x14a/0x190 [btrfs]
> [  229.015225]  __alloc_dummy_extent_buffer+0x2e/0x2b0 [btrfs]
> [  229.016121]  test_btrfs_split_item+0xcf/0x7d0 [btrfs]
> [  229.016995]  ? btrfs_test_free_space_cache+0x1e0/0x1e0 [btrfs]
> [  229.017872]  ? info_print_prefix+0x100/0x100
> [  229.018523]  ? btrfs_test_free_space_cache+0xda/0x1e0 [btrfs]
> [  229.019461]  ? __kmem_cache_free+0xfa/0x200
> [  229.020064]  btrfs_run_sanity_tests+0x78/0x140 [btrfs]
> [  229.020976]  init_btrfs_fs+0x38/0x220 [btrfs]
> [  229.021867]  ? btrfs_interface_init+0x20/0x20 [btrfs]
> [  229.022818]  do_one_initcall+0xc3/0x3b0
> [  229.023473]  ? trace_event_raw_event_initcall_level+0x150/0x150
> [  229.024436]  ? __kmem_cache_alloc_node+0x1b5/0x2e0
> [  229.025279]  ? do_init_module+0x38/0x3b0
> [  229.025806]  ? kasan_unpoison+0x40/0x60
> [  229.026477]  do_init_module+0x135/0x3b0
> [  229.027134]  load_module+0x11c1/0x13c0
> [  229.027766]  ? layout_and_allocate.isra.0+0x280/0x280
> [  229.028435]  ? kernel_read_file+0x252/0x3f0
> [  229.029060]  ? __ia32_sys_fsconfig+0x70/0x70
> [  229.029755]  ? init_module_from_file+0xd1/0x130
> [  229.030489]  init_module_from_file+0xd1/0x130
> [  229.031184]  ? __do_sys_init_module+0x1a0/0x1a0
> [  229.031954]  ? idempotent_init_module+0x3b9/0x3d0
> [  229.032727]  ? do_raw_spin_unlock+0x84/0xf0
> [  229.033393]  idempotent_init_module+0x1ac/0x3d0
> [  229.034101]  ? init_module_from_file+0x130/0x130
> [  229.034837]  ? __fget_files+0xfd/0x1e0
> [  229.035449]  __x64_sys_finit_module+0x72/0xb0
> [  229.037902]  do_syscall_64+0x41/0xe0
> [  229.038495]  entry_SYSCALL_64_after_hwframe+0x46/0x4e
> [  229.039314] RIP: 0033:0x7f07839f53dd
> [  229.042725] RSP: 002b:00007fff7eb94c18 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> [  229.043642] RAX: ffffffffffffffda RBX: 00005574ab37ba20 RCX: 00007f07839f53dd
> [  229.044833] RDX: 0000000000000000 RSI: 00005574ab372c5a RDI: 000000000000000e
> [  229.046002] RBP: 00005574ab372c5a R08: 0000000000000000 R09: 00005574ab37baa0
> [  229.047195] R10: 00005574ab37f850 R11: 0000000000000246 R12: 0000000000040000
> [  229.048275] R13: 0000000000000000 R14: 00005574ab3841b0 R15: 00005574ab37b480
> [  229.049320]  </TASK>
> [  229.049767]
> [  229.050124] The buggy address belongs to the variable:
> [  229.050961]  __func__.1+0x288/0xffffffffffeeafa0 [btrfs]
> [  229.051928]
> [  229.052205] Memory state around the buggy address:
> [  229.052971]  ffffffffc07f3180: f9 f9 f9 f9 00 04 f9 f9 f9 f9 f9 f9 00 00 00 00
> [  229.054075]  ffffffffc07f3200: 00 00 00 00 03 f9 f9 f9 f9 f9 f9 f9 00 00 00 03
> [  229.055173] >ffffffffc07f3280: f9 f9 f9 f9 00 02 f9 f9 f9 f9 f9 f9 00 07 f9 f9
> [  229.056210]                                                           ^
> [  229.057066]  ffffffffc07f3300: f9 f9 f9 f9 00 00 00 00 04 f9 f9 f9 f9 f9 f9 f9
> [  229.058152]  ffffffffc07f3380: 00 00 00 02 f9 f9 f9 f9 00 00 03 f9 f9 f9 f9 f9
> [  229.059119] ==================================================================
> [  229.060341] Disabling lock debugging due to kernel taint
> [  229.061254] BUG: unable to handle page fault for address: 00006b636f6c5f7a
> [  229.062383] #PF: supervisor read access in kernel mode
> [  229.063232] #PF: error_code(0x0000) - not-present page
> [  229.064079] PGD 0 P4D 0
> [  229.064544] Oops: 0000 [#1] PREEMPT SMP KASAN
> [  229.065296] CPU: 2 PID: 13543 Comm: modprobe Tainted: G    B      O       6.7.0-rc4-default+ #2250
> [  229.066783] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
> [  229.068574] RIP: 0010:folio_flags.constprop.0+0x12/0x30 [btrfs]
> [  229.072394] RSP: 0018:ffff88800294f840 EFLAGS: 00010282
> [  229.073281] RAX: 0000000000000000 RBX: 00006b636f6c5f72 RCX: ffffffffc05c5c12
> [  229.074409] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 00006b636f6c5f7a
> [  229.075488] RBP: 00006b636f6c5f72 R08: 0000000000000001 R09: fffffbfff37183f8
> [  229.076600] R10: ffffffff9b8c1fc7 R11: 0000000000000001 R12: 0000000000000000
> [  229.077723] R13: ffff88801d1e4008 R14: 0000000000001000 R15: ffffffffc08b0228
> [  229.078905] FS:  00007f07838e5740(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000
> [  229.080262] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  229.081228] CR2: 00006b636f6c5f7a CR3: 000000000598a000 CR4: 00000000000006b0
> [  229.082417] Call Trace:
> [  229.082849]  <TASK>
> [  229.083289]  ? __die+0x1f/0x60
> [  229.083896]  ? page_fault_oops+0x71/0xa0
> [  229.084629]  ? exc_page_fault+0x69/0xd0
> [  229.085354]  ? asm_exc_page_fault+0x22/0x30
> [  229.086130]  ? folio_flags.constprop.0+0x12/0x30 [btrfs]
> [  229.087170]  ? folio_flags.constprop.0+0x12/0x30 [btrfs]
> [  229.088173]  __alloc_dummy_extent_buffer+0x5b/0x2b0 [btrfs]
> [  229.089206]  test_btrfs_split_item+0xcf/0x7d0 [btrfs]
> [  229.090139]  ? btrfs_test_free_space_cache+0x1e0/0x1e0 [btrfs]
> [  229.091192]  ? info_print_prefix+0x100/0x100
> [  229.091900]  ? btrfs_test_free_space_cache+0xda/0x1e0 [btrfs]
> [  229.092924]  ? __kmem_cache_free+0xfa/0x200
> [  229.093604]  btrfs_run_sanity_tests+0x78/0x140 [btrfs]
> [  229.094544]  init_btrfs_fs+0x38/0x220 [btrfs]
> [  229.095401]  ? btrfs_interface_init+0x20/0x20 [btrfs]
> [  229.096325]  do_one_initcall+0xc3/0x3b0
> [  229.096948]  ? trace_event_raw_event_initcall_level+0x150/0x150
> [  229.097765]  ? __kmem_cache_alloc_node+0x1b5/0x2e0
> [  229.098594]  ? do_init_module+0x38/0x3b0
> [  229.099189]  ? kasan_unpoison+0x40/0x60
> [  229.099794]  do_init_module+0x135/0x3b0
> [  229.100479]  load_module+0x11c1/0x13c0
> [  229.101095]  ? layout_and_allocate.isra.0+0x280/0x280
> [  229.101874]  ? kernel_read_file+0x252/0x3f0
> [  229.102481]  ? __ia32_sys_fsconfig+0x70/0x70
> [  229.103189]  ? init_module_from_file+0xd1/0x130
> [  229.103918]  init_module_from_file+0xd1/0x130
> [  229.104615]  ? __do_sys_init_module+0x1a0/0x1a0
> [  229.105352]  ? idempotent_init_module+0x3b9/0x3d0
> [  229.106118]  ? do_raw_spin_unlock+0x84/0xf0
> [  229.106787]  idempotent_init_module+0x1ac/0x3d0
> [  229.107592]  ? init_module_from_file+0x130/0x130
> [  229.108413]  ? __fget_files+0xfd/0x1e0
> [  229.108979]  __x64_sys_finit_module+0x72/0xb0
> [  229.109686]  do_syscall_64+0x41/0xe0
> [  229.110286]  entry_SYSCALL_64_after_hwframe+0x46/0x4e
> [  229.111074] RIP: 0033:0x7f07839f53dd
> [  229.114183] RSP: 002b:00007fff7eb94c18 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> [  229.115057] RAX: ffffffffffffffda RBX: 00005574ab37ba20 RCX: 00007f07839f53dd
> [  229.116099] RDX: 0000000000000000 RSI: 00005574ab372c5a RDI: 000000000000000e
> [  229.117162] RBP: 00005574ab372c5a R08: 0000000000000000 R09: 00005574ab37baa0
> [  229.118178] R10: 00005574ab37f850 R11: 0000000000000246 R12: 0000000000040000
> [  229.119254] R13: 0000000000000000 R14: 00005574ab3841b0 R15: 00005574ab37b480
> [  229.120183]  </TASK>
> [  229.120582] Modules linked in: btrfs(O+) blake2b_generic libcrc32c xor lzo_compress lzo_decompress raid6_pq zstd_decompress zstd_compress xxhash zstd_common loop
> [  229.122735] CR2: 00006b636f6c5f7a
> [  229.123266] ---[ end trace 0000000000000000 ]---
> [  229.123916] RIP: 0010:folio_flags.constprop.0+0x12/0x30 [btrfs]
> [  229.127469] RSP: 0018:ffff88800294f840 EFLAGS: 00010282
> [  229.128273] RAX: 0000000000000000 RBX: 00006b636f6c5f72 RCX: ffffffffc05c5c12
> [  229.129305] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 00006b636f6c5f7a
> [  229.130327] RBP: 00006b636f6c5f72 R08: 0000000000000001 R09: fffffbfff37183f8
> [  229.131362] R10: ffffffff9b8c1fc7 R11: 0000000000000001 R12: 0000000000000000
> [  229.132442] R13: ffff88801d1e4008 R14: 0000000000001000 R15: ffffffffc08b0228
> [  229.133657] FS:  00007f07838e5740(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000
> [  229.134887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  229.135776] CR2: 00006b636f6c5f7a CR3: 000000000598a000 CR4: 00000000000006b0
>
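
[Editorial aside, not part of the original report: the faulting address in CR2 (0x00006b636f6c5f7a, also seen in RDI/RBX/RBP) is not a plausible kernel pointer — read as little-endian bytes it spells ASCII text, which hints that the folio pointer dereferenced by folio_flags() was overwritten by, or pointed into, string data. A minimal sketch of the decoding:]

```python
# Decode the faulting pointer value from the oops above.
# 0x00006b636f6c5f7a, taken as a little-endian 64-bit value, has low
# bytes 7a 5f 6c 6f 63 6b -- printable ASCII, not a valid kernel address.
addr = 0x00006B636F6C5F7A
decoded = addr.to_bytes(8, "little").rstrip(b"\x00").decode("ascii")
print(decoded)  # -> "z_lock"
```

The decoded fragment looks like the tail of an identifier ending in "z_lock", consistent with a stale or corrupted pointer rather than a freshly allocated folio.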

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-12-05 20:13 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-01  6:06 [PATCH v2 0/2] btrfs: migrate extent_buffer::pages[] to folio and more cleanups Qu Wenruo
2023-12-01  6:06 ` [PATCH v2 1/2] btrfs: migrate extent_buffer::pages[] to folio Qu Wenruo
2023-12-01  6:06 ` [PATCH v2 2/2] btrfs: cleanup metadata page pointer usage Qu Wenruo
2023-12-05 14:00   ` David Sterba
2023-12-05 20:13     ` Qu Wenruo
2023-12-04 16:26 ` [PATCH v2 0/2] btrfs: migrate extent_buffer::pages[] to folio and more cleanups David Sterba
2023-12-04 21:10   ` Qu Wenruo
2023-12-05 14:04     ` David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox